Textpresso - an Information Retrieval and Extraction System for Biological Literature

Authors Hans-Michael Mueller, Arun Rangarajan, Tracy K. Teal, Kimberly van Auken, Juancarlos Chan, Paul W. Sternberg




File

DagSemProc.08131.19.pdf
  • Filesize: 157 kB
  • 1 page



Cite As

Hans-Michael Mueller, Arun Rangarajan, Tracy K. Teal, Kimberly van Auken, Juancarlos Chan, and Paul W. Sternberg. Textpresso - an Information Retrieval and Extraction System for Biological Literature. In Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives. Dagstuhl Seminar Proceedings, Volume 8131, p. 1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2008) https://doi.org/10.4230/DagSemProc.08131.19

Abstract

We developed an information retrieval and extraction system that processes the full
text of biological papers. The system, called Textpresso, separates text into
sentences, labels words and phrases according to an ontology (an organized lexicon),
and allows queries to be performed on a database of labeled sentences. The current
ontology comprises approximately one hundred categories of terms, such as "gene",
"regulation", "human disease", and "brain area", and also contains the main Gene
Ontology (GO) categories. Extraction of particular biological facts, such as gene-gene
interactions, or the curation of GO cellular components, can be accelerated
significantly by ontologies, with Textpresso automatically performing nearly as well as
expert curators at identifying sentences. We have established search engines for four
literatures (C. elegans, Drosophila, Arabidopsis, and Neuroscience), and other groups
around the world have developed thirteen systems for other literatures.
Currently, our four systems contain 112,000 papers with 40 million sentences; all
systems worldwide contain 190,000 papers with approximately 65 million sentences.
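
The pipeline the abstract describes (split text into sentences, label them with ontology categories, query the labeled sentences) can be illustrated with a minimal sketch. This is not Textpresso's actual code: the toy lexicon, the naive sentence splitter, and the function names `split_sentences`, `label_sentence`, `build_index`, and `query` are all assumptions made for illustration.

```python
import re

# Hypothetical toy lexicon (category -> terms); NOT Textpresso's real ontology.
TOY_ONTOLOGY = {
    "gene": {"lin-12", "glp-1"},
    "regulation": {"regulates", "represses", "activates"},
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def label_sentence(sentence, ontology):
    """Return the set of categories that have at least one term in the sentence."""
    tokens = set(re.findall(r"[\w-]+", sentence.lower()))
    return {category for category, terms in ontology.items() if tokens & terms}


def build_index(text, ontology):
    """Pair every sentence with its category labels (an in-memory 'database')."""
    return [(s, label_sentence(s, ontology)) for s in split_sentences(text)]


def query(index, required_categories):
    """Return the sentences labeled with all of the requested categories."""
    required = set(required_categories)
    return [sentence for sentence, labels in index if required <= labels]


if __name__ == "__main__":
    sample = (
        "lin-12 regulates glp-1 in the germline. "
        "The worms were grown at 20 degrees. "
        "glp-1 is expressed early."
    )
    index = build_index(sample, TOY_ONTOLOGY)
    for hit in query(index, ["gene", "regulation"]):
        print(hit)
```

Querying by category rather than by keyword is what lets a single request such as "gene" plus "regulation" match sentences that mention any gene name and any regulatory verb in the lexicon.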

Subject Classification

Keywords
  • Information retrieval
  • literature search engine
  • information extraction
  • automated literature curation
  • semantic search
  • ontology
