22 Search Results for "Leser, Ulf"


Document
Integrating HPC, AI, and Workflows for Scientific Data Analysis (Dagstuhl Seminar 23352)

Authors: Rosa M. Badia, Laure Berti-Equille, Rafael Ferreira da Silva, and Ulf Leser

Published in: Dagstuhl Reports, Volume 13, Issue 8 (2024)


Abstract
The Dagstuhl Seminar 23352, titled "Integrating HPC, AI, and Workflows for Scientific Data Analysis," held from August 27 to September 1, 2023, was a significant event focusing on the synergy between High-Performance Computing (HPC), Artificial Intelligence (AI), and scientific workflow technologies. The seminar recognized that modern Big Data analysis in science rests on three pillars: workflow technologies for reproducibility and steering, AI and Machine Learning (ML) for versatile analysis, and HPC for handling large data sets. These elements, while crucial, have traditionally been researched separately, leading to gaps in their integration. The seminar aimed to bridge these gaps, acknowledging the challenges and opportunities at the intersection of these technologies. The event highlighted the complex interplay between HPC, workflows, and ML, noting how ML has increasingly been integrated into scientific workflows, thereby increasing resource demands and bringing new requirements to HPC architectures, like support for GPUs and iterative computations. The seminar also addressed the challenges in adapting HPC for large-scale ML tasks, including in areas like deep learning, and the need for workflow systems to evolve to fully leverage ML in data analysis. Moreover, the seminar explored how ML could optimize scientific workflow systems and HPC operations, such as through improved scheduling and fault tolerance. A key focus was on identifying prestigious use cases of ML in HPC and understanding their unique, unmet requirements. The stochastic nature of ML and its impact on the reproducibility of data analysis on HPC systems was also a topic of discussion.

Cite as

Rosa M. Badia, Laure Berti-Equille, Rafael Ferreira da Silva, and Ulf Leser. Integrating HPC, AI, and Workflows for Scientific Data Analysis (Dagstuhl Seminar 23352). In Dagstuhl Reports, Volume 13, Issue 8, pp. 129-164, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@Article{badia_et_al:DagRep.13.8.129,
  author =	{Badia, Rosa M. and Berti-Equille, Laure and Ferreira da Silva, Rafael and Leser, Ulf},
  title =	{{Integrating HPC, AI, and Workflows for Scientific Data Analysis (Dagstuhl Seminar 23352)}},
  pages =	{129--164},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2024},
  volume =	{13},
  number =	{8},
  editor =	{Badia, Rosa M. and Berti-Equille, Laure and Ferreira da Silva, Rafael and Leser, Ulf},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.13.8.129},
  URN =		{urn:nbn:de:0030-drops-198162},
  doi =		{10.4230/DagRep.13.8.129},
  annote =	{Keywords: Large scale data presentation and analysis, Exascale class machine optimization, Performance data analysis and root cause detection, High dimensional data representation}
}
Document
Facilitating the development of controlled vocabularies for metabolomics technologies with text mining

Authors: Irena Spasic, Daniel Schober, Susanna-Assunta Sansone, Dietrich Rebholz-Schuhmann, Douglas B. Kell, and Norman W. Paton

Published in: Dagstuhl Seminar Proceedings, Volume 8131, Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives (2008)


Abstract
Background. Many bioinformatics applications rely on controlled vocabularies or ontologies to consistently interpret and seamlessly integrate information scattered across public resources. Experimental data sets from metabolomics studies need to be integrated with one another, but also with data produced by other types of omics studies in the spirit of systems biology, hence the pressing need for vocabularies and ontologies in metabolomics. However, it is time-consuming and non-trivial to construct these resources manually. Results. We describe a methodology for rapid development of controlled vocabularies, a study originally motivated by the need for vocabularies describing metabolomics technologies. We present case studies involving two controlled vocabularies (for nuclear magnetic resonance spectroscopy and gas chromatography) whose development is currently underway as part of the Metabolomics Standards Initiative. The initial vocabularies were compiled manually, providing a total of 243 and 152 terms. A total of 5,699 and 2,612 new terms were acquired automatically from the literature. The analysis of the results showed that full-text articles (especially the Materials and Methods sections) are the major source of technology-specific terms as opposed to paper abstracts. Conclusions. We suggest a text mining method for efficient corpus-based term acquisition as a way of rapidly expanding a set of controlled vocabularies with the terms used in the scientific literature. We adopted an integrative approach, combining relatively generic software and data resources for time- and cost-effective development of a text mining tool for expansion of controlled vocabularies across various domains, as a practical alternative to both manual term collection and tailor-made named entity recognition methods.

Cite as

Irena Spasic, Daniel Schober, Susanna-Assunta Sansone, Dietrich Rebholz-Schuhmann, Douglas B. Kell, and Norman W. Paton. Facilitating the development of controlled vocabularies for metabolomics technologies with text mining. In Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, Volume 8131, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)


@InProceedings{spasic_et_al:DagSemProc.08131.5,
  author =	{Spasic, Irena and Schober, Daniel and Sansone, Susanna-Assunta and Rebholz-Schuhmann, Dietrich and Kell, Douglas B. and Paton, Norman W.},
  title =	{{Facilitating the development of controlled vocabularies for metabolomics technologies with text mining}},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8131},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08131.5},
  URN =		{urn:nbn:de:0030-drops-15503},
  doi =		{10.4230/DagSemProc.08131.5},
  annote =	{Keywords: Text mining, ontology, controlled vocabulary, metabolomics}
}
Document
Mining associations and roles: role of feature extraction

Authors: Goran Nenadic

Published in: Dagstuhl Seminar Proceedings, Volume 8131, Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives (2008)


Abstract
One of the ultimate aims of biomedical text mining would be to extract both explicit and implicit associations between different types of entities. In addition, assigning roles that entities have or may have in biological processes is also of interest. In this talk I will be discussing our experience in selecting and engineering textual features that can help in mining associations and roles from literature. Depending on tasks and entities involved, we have used four types of features: from simple words and terms, to words and semantic classes, to textual contexts, to contexts augmented with additional background attributes. The main conclusion is that both NLP- and domain-knowledge-driven feature engineering are needed for successful mining of associations and roles.

Cite as

Goran Nenadic. Mining associations and roles: role of feature extraction. In Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, Volume 8131, p. 1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)


@InProceedings{nenadic:DagSemProc.08131.7,
  author =	{Nenadic, Goran},
  title =	{{Mining associations and roles: role of feature extraction}},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  pages =	{1--1},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8131},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08131.7},
  URN =		{urn:nbn:de:0030-drops-15497},
  doi =		{10.4230/DagSemProc.08131.7},
  annote =	{Keywords: Text mining, associations, roles, feature engineering, feature extraction}
}
Document
08131 Executive Summary – Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives

Authors: Michael Ashburner, Ulf Leser, and Dietrich Rebholz-Schuhmann

Published in: Dagstuhl Seminar Proceedings, Volume 8131, Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives (2008)


Abstract
Researchers in Text Mining and researchers active in developing ontological resources provide solutions to preserve semantic information properly, i.e. in ontologies and/or fact databases. Researchers from both fields tend to work independently from each other, but there is a shared interest in profiting from ongoing research in the complementary domain. The relatedness of both domains has led to the idea of organizing a workshop that brings together members of both research domains.

Cite as

Michael Ashburner, Ulf Leser, and Dietrich Rebholz-Schuhmann. 08131 Executive Summary – Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. In Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, Volume 8131, pp. 1-5, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)


@InProceedings{ashburner_et_al:DagSemProc.08131.1,
  author =	{Ashburner, Michael and Leser, Ulf and Rebholz-Schuhmann, Dietrich},
  title =	{{08131 Executive Summary -- Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives}},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  pages =	{1--5},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8131},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08131.1},
  URN =		{urn:nbn:de:0030-drops-15234},
  doi =		{10.4230/DagSemProc.08131.1},
  annote =	{Keywords: Text Mining, natural language processing, ontologies, ontology design, machine learning, bioinformatics, medical informatics, knowledge management}
}
Document
Applications of semantic similarity measures

Authors: Andreas Schlicker, Fidel Ramírez, Jörg Rahnenführer, Carola Huthmacher, Alejandro Pironti, Francisco S. Domingues, Thomas Lengauer, and Mario Albrecht

Published in: Dagstuhl Seminar Proceedings, Volume 8131, Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives (2008)


Abstract
There has been much interest in uncovering protein-protein interactions and their underlying domain-domain interactions. Many experimental techniques have been developed, for example yeast-two-hybrid screening and tandem affinity purification. Since it is time-consuming and expensive to perform exhaustive experimental screens, in silico methods are used for predicting interactions. However, all experimental and computational methods have considerable false positive and false negative rates. Therefore, it is necessary to validate experimentally determined and predicted interactions. One possibility for the validation of interactions is the comparison of the functions of the proteins or domains. Gene Ontology (GO) is widely accepted as a standard vocabulary for functional terms, and is used for annotating proteins and protein families with biological processes and their molecular functions. This annotation can be used for a functional comparison of interacting proteins or domains using semantic similarity measures. Another application of semantic similarity measures is the prioritization of disease genes. It is known that functionally similar proteins are often involved in the same or similar diseases. Therefore, functional similarity is used for predicting disease associations of proteins. In the first part of my talk, I will introduce some semantic and functional similarity measures that can be used for comparison of GO terms and proteins or protein families. Then, I will show their application for determining a confidence threshold for domain-domain interaction predictions. Additionally, I will present FunSimMat (http://www.funsimmat.de/), a comprehensive resource of functional similarity values available on the web. In the last part, I will introduce the problem of comparing diseases, and a first attempt to apply functional similarity measures based on GO to this problem.

Cite as

Andreas Schlicker, Fidel Ramírez, Jörg Rahnenführer, Carola Huthmacher, Alejandro Pironti, Francisco S. Domingues, Thomas Lengauer, and Mario Albrecht. Applications of semantic similarity measures. In Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, Volume 8131, p. 1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)


@InProceedings{schlicker_et_al:DagSemProc.08131.2,
  author =	{Schlicker, Andreas and Ram{\'\i}rez, Fidel and Rahnenf\"{u}hrer, J\"{o}rg and Huthmacher, Carola and Pironti, Alejandro and Domingues, Francisco S. and Lengauer, Thomas and Albrecht, Mario},
  title =	{{Applications of semantic similarity measures}},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  pages =	{1--1},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8131},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08131.2},
  URN =		{urn:nbn:de:0030-drops-15198},
  doi =		{10.4230/DagSemProc.08131.2},
  annote =	{Keywords: Semantic similarity, functional similarity, Gene Ontology, domain-domain interactions}
}
Document
Bootstrapping an interactive information extraction system for FlyBase curation

Authors: Ted Briscoe, Caroline Gasperin, Ian Lewin, and Andreas Vlachos

Published in: Dagstuhl Seminar Proceedings, Volume 8131, Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives (2008)


Abstract
We describe an adaptive information extraction (IE) system designed to aid the curation of papers about fruit fly genomics for incorporation into FlyBase. FlyBase employs a team of about eight curators who fill in prespecified IE templates (called proformas) for each gene and allele discussed in a given paper with curatable information associated with it. The normal approach to curation is to load the PDF of the paper into a tool such as Acroread and to use the 'Find' function to search for repeated mentions of an entity of interest. The relevant information is then typed into the appropriate template fields. Templates are then checked for consistency and automatically integrated into the database. We have developed PaperBrowser, a tool designed to make it easier for curators to locate relevant information. The tool takes the PDF version of the paper as input and rerenders it as SciXML, a standard developed at Cambridge for representing the logical structure of scientific articles in a fashion amenable to text mining. The basic SciXML is augmented by a gene name recogniser and anaphora resolution module so that PaperBrowser is able to highlight gene names in the paper and to provide a navigation bar which allows the curator to jump to specific mentions of a given gene in the various sections of the paper. Alternatively, the curator can select a specific gene mention and the browser will highlight all the noun phrases which are anaphorically linked to that gene mention. These anaphoric links can either be coreferential, or associative to the gene's products or components, such as proteins or RNA. User-based evaluation of PaperBrowser in comparison to the use of Acroread, with FlyBase curators undertaking the task of finding the set of genes and alleles for which templates should be constructed, has demonstrated that curation is 20% faster at no cost to accuracy when using PaperBrowser.
PaperBrowser uses a conditional random field model to perform gene name recognition bootstrapped from training data derived automatically via information in FlyBase. The anaphora resolution algorithm is unsupervised but uses information from the Sequence Ontology augmented with lexemes from UMLS to identify noun phrases referring to gene products and components. The PDF extraction tool uses a commercial OCR package augmented with a seed-based machine learning technique to learn the mapping from font and format information to the logical structure of the paper. Papers describing the complete processing pipeline, intrinsic evaluation of the individual components and user-based experiments, along with test datasets, are available from the FlySlip Project website.

Cite as

Ted Briscoe, Caroline Gasperin, Ian Lewin, and Andreas Vlachos. Bootstrapping an interactive information extraction system for FlyBase curation. In Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, Volume 8131, p. 1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)


@InProceedings{briscoe_et_al:DagSemProc.08131.3,
  author =	{Briscoe, Ted and Gasperin, Caroline and Lewin, Ian and Vlachos, Andreas},
  title =	{{Bootstrapping an interactive information extraction system for FlyBase curation}},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  pages =	{1--1},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8131},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08131.3},
  URN =		{urn:nbn:de:0030-drops-15086},
  doi =		{10.4230/DagSemProc.08131.3},
  annote =	{Keywords: Biomedical Text Mining, Interactive Information Extraction, Natural Language Processing}
}
Document
Coreference Resolution in Biomedical Texts: a Machine Learning Approach

Authors: Jian Su, Xiaofeng Yang, Huaqing Hong, Yuka Tateisi, and Jun'ichi Tsujii

Published in: Dagstuhl Seminar Proceedings, Volume 8131, Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives (2008)


Abstract
Motivation: Coreference resolution, the process of identifying different mentions of an entity, is a very important component in a text-mining system. Compared with the work in news articles, the existing study of coreference resolution in biomedical texts is quite preliminary, focusing only on specific types of anaphors like pronouns or definite noun phrases, using heuristic methods, and running on small data sets. Therefore, there is a need for an in-depth exploration of this task in the biomedical domain. Results: In this article, we presented a learning-based approach to coreference resolution in the biomedical domain. We made three contributions in our study. Firstly, we annotated a large-scale coreference corpus, MedCo, which consists of 1,999 MEDLINE abstracts in the GENIA data set. Secondly, we proposed a detailed framework for the coreference resolution task, in which we augmented the traditional learning model by incorporating non-anaphors into training. Lastly, we explored various sources of knowledge for coreference resolution, particularly those that can deal with the complexity of biomedical texts. The evaluation on the MedCo corpus showed promising results. Our coreference resolution system achieved a high precision of 85.2% with a reasonable recall of 65.3%, obtaining an F-measure of 73.9%. The results also suggested that our augmented learning model significantly boosted precision (up to 24.0%) without much loss in recall (less than 5%), and brought a gain of over 8% in F-measure.
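
The F-measure quoted in the abstract above can be checked against its precision and recall. The following is a minimal sketch, assuming the balanced F1 definition (the harmonic mean of precision and recall); the abstract does not say which F-measure variant the authors used, and the function name `f_measure` is purely illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (85.2% and 65.3%).
reported_precision = 0.852
reported_recall = 0.653

f1 = f_measure(reported_precision, reported_recall)
print(f"F-measure: {100 * f1:.1f}%")
```

Under that assumption, the computed value agrees with the reported 73.9% to one decimal place.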

Cite as

Jian Su, Xiaofeng Yang, Huaqing Hong, Yuka Tateisi, and Jun'ichi Tsujii. Coreference Resolution in Biomedical Texts: a Machine Learning Approach. In Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, Volume 8131, p. 1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)


@InProceedings{su_et_al:DagSemProc.08131.4,
  author =	{Su, Jian and Yang, Xiaofeng and Hong, Huaqing and Tateisi, Yuka and Tsujii, Jun'ichi},
  title =	{{Coreference Resolution in Biomedical Texts: a Machine Learning Approach}},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  pages =	{1--1},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8131},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08131.4},
  URN =		{urn:nbn:de:0030-drops-15220},
  doi =		{10.4230/DagSemProc.08131.4},
  annote =	{Keywords: Coreference resolution, biomedical text}
}
Document
GoPubMed: Exploring Pubmed with Ontological Background Knowledge

Authors: Heiko Dietze, Dimitra Alexopoulou, Michael R. Alvers, Bill Barrio-Alvers, Andreas Doms, Jörg Hakenberg, Jan Mönnich, Conrad Plake, Andreas Reischuck, Loic Royer, Thomas Wächter, Matthias Zschunke, and Michael Schroeder

Published in: Dagstuhl Seminar Proceedings, Volume 8131, Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives (2008)


Abstract
With the ever-increasing size of scientific literature, finding relevant documents and answering questions has become even more of a challenge. Recently, ontologies - hierarchical, controlled vocabularies - have been introduced to annotate genomic data. They can also improve question answering and the selection of relevant documents in literature search. Search engines such as GoPubMed.org use ontological background knowledge to give an overview of large query results and to help answer questions. We review the problems and solutions underlying these next-generation intelligent search engines and give examples of the power of this new search paradigm.

Cite as

Heiko Dietze, Dimitra Alexopoulou, Michael R. Alvers, Bill Barrio-Alvers, Andreas Doms, Jörg Hakenberg, Jan Mönnich, Conrad Plake, Andreas Reischuck, Loic Royer, Thomas Wächter, Matthias Zschunke, and Michael Schroeder. GoPubMed: Exploring Pubmed with Ontological Background Knowledge. In Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, Volume 8131, p. 1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)


@InProceedings{dietze_et_al:DagSemProc.08131.6,
  author =	{Dietze, Heiko and Alexopoulou, Dimitra and Alvers, Michael R. and Barrio-Alvers, Bill and Doms, Andreas and Hakenberg, J\"{o}rg and M\"{o}nnich, Jan and Plake, Conrad and Reischuck, Andreas and Royer, Loic and W\"{a}chter, Thomas and Zschunke, Matthias and Schroeder, Michael},
  title =	{{GoPubMed: Exploring Pubmed with Ontological Background Knowledge}},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  pages =	{1--1},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8131},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08131.6},
  URN =		{urn:nbn:de:0030-drops-15204},
  doi =		{10.4230/DagSemProc.08131.6},
  annote =	{Keywords: Text mining, literature search, Gene Ontology, NLP, ontology, thesaurus, PubMed}
}
Document
Mining Phenotypes for Protein Function Prediction

Authors: Ulf Leser, Philip Groth, Bertram Weiss, and Hans-Dieter Pohlenz

Published in: Dagstuhl Seminar Proceedings, Volume 8131, Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives (2008)


Abstract
Until very recently, phenotypes were only very rarely studied in a systematic manner. While ontologies for describing gene functions now have a 10-year-long tradition, similar vocabularies for describing the phenotype of genes are only emerging now; similarly, the techniques for determining phenotypes on a large scale (especially RNAi) have been available for only a few years, while genomic sequencing and gene expression studies have been established for much longer. In this talk, we describe results from a study for exploiting phenotype descriptions for protein function prediction. We used the data from PhenomicsDB, a phenotype database integrated from several publicly available data sources. Due to the lack of standardization, phenotypes in PhenomicsDB can only be viewed as text (short statements, abstracts, singular terms, ...). We clustered these texts and analyzed the corresponding gene clusters in terms of their coherence in functional annotation and their interconnectedness by protein-protein interactions. We also devised a method for using the close similarity in their phenotype descriptions to predict the function of proteins. We show that this method yields a very good precision at acceptable coverage.

Cite as

Ulf Leser, Philip Groth, Bertram Weiss, and Hans-Dieter Pohlenz. Mining Phenotypes for Protein Function Prediction. In Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, Volume 8131, p. 1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)


@InProceedings{leser_et_al:DagSemProc.08131.8,
  author =	{Leser, Ulf and Groth, Philip and Weiss, Bertram and Pohlenz, Hans-Dieter},
  title =	{{Mining Phenotypes for Protein Function Prediction}},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  pages =	{1--1},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8131},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08131.8},
  URN =		{urn:nbn:de:0030-drops-15133},
  doi =		{10.4230/DagSemProc.08131.8},
  annote =	{Keywords: Data mining, function prediction, bioinformatics, phenotypes, text mining}
}
Document
Named Entity or Entity Name?

Authors: Stefan Schulz

Published in: Dagstuhl Seminar Proceedings, Volume 8131, Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives (2008)


Abstract
The expression "named entity" is very fuzzy and its definitions are partly contradictory. Semantic subtleties involving the words "entity", "name" and "term" are largely ignored. Based on formal ontology, a more principled typology is introduced.

Cite as

Stefan Schulz. Named Entity or Entity Name?. In Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, Volume 8131, p. 1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)


@InProceedings{schulz:DagSemProc.08131.9,
  author =	{Schulz, Stefan},
  title =	{{Named Entity or Entity Name?}},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  pages =	{1--1},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8131},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08131.9},
  URN =		{urn:nbn:de:0030-drops-15214},
  doi =		{10.4230/DagSemProc.08131.9},
  annote =	{Keywords: Ontology, Named Entity Recognition}
}
Document
NLP and Phenotypes: using Ontologies to link Human Diseases to Animal Models

Authors: N. Washington, M. Gibson, C.J. Mungall, Michael Ashburner, G. Gkoutos, M. Westerfield, M. Haendel, and S. E. Lewis

Published in: Dagstuhl Seminar Proceedings, Volume 8131, Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives (2008)


Abstract
The path to disease gene discovery in humans is often a lengthy one, but can be significantly shortened if links between human and model organism phenotypes are readily available. Collecting and storing these descriptions in a common resource, recorded with ontologies, as well as developing the tools for annotation, access, and analysis are among the goals of the National Center for Biomedical Ontology. The use of well-structured, expert-reviewed ontologies during curation allows biological data to be understandable by both humans and computers, and thereby increases the capacity for meaningful analysis. We have developed the EQ annotation model, which uses ontology terms to label and link together entities, such as anatomical structures, with the qualities describing them. Phenotypes are represented in our model using any combination of entity (such as anatomy) ontologies together with an ontology of qualities (PATO). Together with the model organism databases ZFIN and FlyBase, we are evaluating this model, using the Phenote Annotation Tool to capture the mutant phenotypes of 200 genes known to cause human disease (from OMIM records) that have corresponding fly and zebrafish mutant phenotypes. The phenotypic data modeled in this way is available from the NCBO Open Biomedical Database (OBD), which has the same underlying annotation data model, and can currently be accessed via a computational (REST) interface for use by other external applications or databases. This work is funded by the NIH.

Cite as

N. Washington, M. Gibson, C.J. Mungall, Michael Ashburner, G. Gkoutos, M. Westerfield, M. Haendel, and S. E. Lewis. NLP and Phenotypes: using Ontologies to link Human Diseases to Animal Models. In Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, Volume 8131, p. 1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)


@InProceedings{washington_et_al:DagSemProc.08131.10,
  author =	{Washington, N. and Gibson, M. and Mungall, C.J. and Ashburner, Michael and Gkoutos, G. and Westerfield, M. and Haendel, M. and Lewis, S. E.},
  title =	{{NLP and Phenotypes: using Ontologies to link Human Diseases to Animal Models}},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  pages =	{1--1},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8131},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08131.10},
  URN =		{urn:nbn:de:0030-drops-15143},
  doi =		{10.4230/DagSemProc.08131.10},
  annote =	{Keywords: Phenotypes, ontologies, annotation}
}
Document
Ontologies & Text Mining (for Life Sciences)

Authors: Paul Buitelaar

Published in: Dagstuhl Seminar Proceedings, Volume 8131, Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives (2008)


Abstract
The talk will address several issues in the application and development of ontologies: the selection of appropriate ontologies for a task; the population of a selected ontology through information extraction from text; the semi-automatic development or extension of an ontology; the lexicalisation of ontologies for the purpose of ontology-based information extraction from text. Each of these issues will be addressed through a particular application: the OntoSelect ontology library and search engine (http://olp.dfki.de/ontoselect/); the OntoLT Protege PlugIn for ontology learning from text (http://olp.dfki.de/OntoLT/OntoLT.htm); the SOBA system for ontology-based information extraction from text; the LingInfo lexicon model for the integration of lexical/linguistic information in ontologies (http://olp.dfki.de/LingInfo/).

Cite as

Paul Buitelaar. Ontologies & Text Mining (for Life Sciences). In Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, Volume 8131, p. 1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)



@InProceedings{buitelaar:DagSemProc.08131.11,
  author =	{Buitelaar, Paul},
  title =	{{Ontologies \& Text Mining (for Life Sciences)}},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  pages =	{1--1},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8131},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08131.11},
  URN =		{urn:nbn:de:0030-drops-15095},
  doi =		{10.4230/DagSemProc.08131.11},
  annote =	{Keywords: Ontology Search; Ontology Population; Ontology Learning; Lexical Enrichment of Ontologies}
}
Document
Ontology learning with text mining: Two use cases in lipoprotein metabolism and toxicology

Authors: Dimitra Alexopoulou, Thomas Wächter, Laura Pickersgill, Cecilia Eyre, and Michael Schroeder

Published in: Dagstuhl Seminar Proceedings, Volume 8131, Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives (2008)


Abstract
Background: The engineering of ontologies, especially with a view to a text-mining use, is still a new research field. There does not yet exist a well-defined theory and technology for ontology construction. Many of the ontology design steps remain manual and are based on personal experience and intuition. However, there exist a few efforts on automatic construction of ontologies in the form of extracted lists of terms and relations between them. Results: We share experience acquired during the manual development of a lipoprotein metabolism ontology (LMO) to be used for text-mining. We compare the manually created ontology terms with the automatically derived terminology from four different automatic term recognition methods. The top 50 predicted terms contain up to 89% relevant terms. For the top 1000 terms the best method still generates 51% relevant terms. In a corpus of 3066 documents 53% of LMO terms are contained and 38% can be generated with one of the methods. Second, we present a use case for ontology-based search for toxicological methods. Conclusions: Given high precision, automatic methods can help decrease development time and provide significant support for the identification of domain-specific vocabulary. The coverage of the domain vocabulary depends strongly on the underlying documents. Ontology development for text mining should be performed in a semi-automatic way, taking automatic term recognition results as input. Availability: The automatic term recognition method is available as a web service, described at http://gopubmed4.biotec.tu-dresden.de/IdavollWebService/services/CandidateTermGeneratorService?wsdl

Cite as

Dimitra Alexopoulou, Thomas Wächter, Laura Pickersgill, Cecilia Eyre, and Michael Schroeder. Ontology learning with text mining: Two use cases in lipoprotein metabolism and toxicology. In Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, Volume 8131, p. 1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)



@InProceedings{alexopoulou_et_al:DagSemProc.08131.12,
  author =	{Alexopoulou, Dimitra and W\"{a}chter, Thomas and Pickersgill, Laura and Eyre, Cecilia and Schroeder, Michael},
  title =	{{Ontology learning with text mining: Two use cases in lipoprotein metabolism and toxicology}},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  pages =	{1--1},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8131},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08131.12},
  URN =		{urn:nbn:de:0030-drops-15063},
  doi =		{10.4230/DagSemProc.08131.12},
  annote =	{Keywords: Automatic Term Recognition, Ontology Learning, Lipoprotein Metabolism}
}
Document
Ontology-based Extraction of Transcription Regulation Events

Authors: Jung-Jae Kim

Published in: Dagstuhl Seminar Proceedings, Volume 8131, Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives (2008)


Abstract
I present ongoing work on the extraction of transcription regulation events from text by using an ontology which plays a central role in integrating information from different sources. The events of transcription regulation are expressed in the literature with a high degree of compositeness. They have elements such as event types, participants, and attributes. These elements are associated with different keywords, which should be merged into a shared structure. I use the Gene Regulation Ontology (GRO) for the integration purpose. It contains not only biological concepts related to transcription regulation, but also inference rules for deduction of specific event types and attributes from semantics of sentences. It is also used to represent the semantics of linguistic patterns that are used to identify the semantics of sentences. The ontology provides the formality which is required for the extraction of specific and well-defined events such as those of transcription regulation.

Cite as

Jung-Jae Kim. Ontology-based Extraction of Transcription Regulation Events. In Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, Volume 8131, p. 1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)



@InProceedings{kim:DagSemProc.08131.13,
  author =	{Kim, Jung-Jae},
  title =	{{Ontology-based Extraction of Transcription Regulation Events}},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  pages =	{1--1},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8131},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08131.13},
  URN =		{urn:nbn:de:0030-drops-15112},
  doi =		{10.4230/DagSemProc.08131.13},
  annote =	{Keywords: Information extraction, ontology, transcription regulation, inference, ontology semantics}
}
Document
Ontology-Based Interactive Information Extraction

Authors: David Milward

Published in: Dagstuhl Seminar Proceedings, Volume 8131, Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives (2008)


Abstract
Interactive Information Extraction brings together search and information extraction to provide fast, interactive text mining over large volumes of text such as Medline abstracts, full text scientific articles, patents etc. As well as covering the two ends of the spectrum: keyword search over documents, and detailed linguistic patterns within sentences, the Interactive Information Extraction System, I2E, also covers the points in between such as keywords within the same sentence, or co-occurrence of biological entities within sentences or documents. This talk briefly introduces the idea of Interactive Information Extraction, and describes how terminologies/ontologies are incorporated. We also show how I2E can be used to augment ontologies by finding potential synonyms or members of classes from the literature using linguistic patterns. Finally we discuss issues concerning how best to use ontologies for text mining.

Cite as

David Milward. Ontology-Based Interactive Information Extraction. In Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, Volume 8131, p. 1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008)



@InProceedings{milward:DagSemProc.08131.14,
  author =	{Milward, David},
  title =	{{Ontology-Based Interactive Information Extraction}},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  pages =	{1--1},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2008},
  volume =	{8131},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.08131.14},
  URN =		{urn:nbn:de:0030-drops-15150},
  doi =		{10.4230/DagSemProc.08131.14},
  annote =	{Keywords: Information extraction, ontologies, text mining}
}