Textpresso - an Information Retrieval and Extraction System for Biological Literature

Mueller, Hans-Michael; Rangarajan, Arun; Teal, Tracy K.; van Auken, Kimberly; Chan, Juancarlos; Sternberg, Paul W.

doi:10.4230/DagSemProc.08131.19

Keywords

Information retrieval
literature search engine
information extraction
automated literature curation
semantic search
ontology

Abstract

We developed an information retrieval and extraction system that processes the full text of biological papers. The system, called Textpresso, separates text into sentences, labels words and phrases according to an ontology (an organized lexicon), and allows queries to be performed on a database of labeled sentences. The current ontology comprises approximately one hundred categories of terms, such as "gene", "regulation", "human disease", and "brain area", and also contains the main Gene Ontology (GO) categories. Ontologies can significantly accelerate the extraction of particular biological facts, such as gene-gene interactions, and the curation of GO cellular components; in identifying relevant sentences, Textpresso's automated performance is nearly as good as that of expert curators. We have established search engines for four literatures (C. elegans, Drosophila, Arabidopsis, and Neuroscience), and other groups around the world have developed thirteen systems for other literatures. Currently, our four systems contain 112,000 papers with 40 million sentences; all systems worldwide contain 190,000 papers with approximately 65 million sentences.
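
The three steps named in the abstract (sentence splitting, ontology-based labeling, and querying labeled sentences) can be illustrated with a minimal sketch. This is not Textpresso's implementation: the two-category `ONTOLOGY`, the naive regex splitter, and the function names `split_sentences`, `label_sentence`, `build_index`, and `query` are all assumptions made for illustration only.

```python
import re

# Hypothetical mini-ontology (category -> terms); the real ontology has
# roughly one hundred categories and is not reproduced here.
ONTOLOGY = {
    "gene": ["lin-12", "glp-1", "daf-16"],
    "regulation": ["regulates", "represses", "activates"],
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    Abbreviations such as "C. elegans" would be split incorrectly; a real
    system needs a more careful sentence tokenizer.
    """
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def label_sentence(sentence, ontology=ONTOLOGY):
    """Return {category: [matched terms]} for one sentence."""
    lowered = sentence.lower()
    labels = {}
    for category, terms in ontology.items():
        # Match whole terms only, so "lin-12" does not match inside "lin-123".
        hits = [
            t for t in terms
            if re.search(r"(?<!\w)" + re.escape(t.lower()) + r"(?!\w)", lowered)
        ]
        if hits:
            labels[category] = hits
    return labels


def build_index(text):
    """Build a list of (sentence, labels) pairs, standing in for the database."""
    return [(s, label_sentence(s)) for s in split_sentences(text)]


def query(index, required):
    """Return sentences with at least `n` distinct terms for each category."""
    return [
        sentence
        for sentence, labels in index
        if all(len(labels.get(cat, [])) >= n for cat, n in required.items())
    ]
```

Under these assumptions, a candidate gene-gene interaction sentence is one that mentions at least two genes and one regulation term, for example `query(build_index(text), {"gene": 2, "regulation": 1})`. Querying by category rather than by literal keyword is what lets a single query cover every term the ontology assigns to that category.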

Cite As

Hans-Michael Mueller, Arun Rangarajan, Tracy K. Teal, Kimberly van Auken, Juancarlos Chan, and Paul W. Sternberg. Textpresso - an Information Retrieval and Extraction System for Biological Literature. In Ontologies and Text Mining for Life Sciences: Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, Volume 8131, p. 1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008) https://doi.org/10.4230/DagSemProc.08131.19
