When quoting this document, please refer to the following
URN: urn:nbn:de:0030-drops-15101
URL: http://drops.dagstuhl.de/opus/volltexte/2008/1510/


Hakenberg, Jörg

Services for annotation of biomedical text


Abstract

Motivation: In recent years, text mining in the biomedical domain has focused on developing tools for recognizing named entities and extracting relations. This research was driven by the need for such tools as basic components of more advanced solutions. Named entity recognition, entity mention normalization, and relationship extraction have now reached a stage where they perform comparably to human annotators (considering inter-annotator agreement, measured in many studies to be around 90%). Many tools have been made available, through web interfaces or as downloadable software, using non-standardized formats for input and output. To advance progress in text mining, solutions are needed to both provide and combine the results of 'basic' information retrieval and extraction tools.
Results: Our groups at Technical University Dresden, Humboldt-Universität zu Berlin, and Arizona State University developed systems for named entity recognition, normalization, and relationship extraction. As evaluated during and after the BioCreative 2 challenge, recognition of proteins achieves 86% f-measure, normalization of gene mentions 85%, and extraction of protein-protein interactions including mapping to UniProt 25%.
Conclusions: We consider the BioCreative meta-service (BCMS) an ideal framework for making information extraction tools available to a variety of users: researchers from the biomedical domain, database curators, and researchers in text mining who can use the services as input for subsequent analyses. At the time of writing this abstract, twelve groups provide their tools as services to the BCMS server. We currently participate with tools for recognizing names of genes/proteins and species; normalizing gene mentions to EntrezGene, protein mentions to UniProt, and species mentions to NCBI Taxonomy; and classifying abstracts for protein-protein interactions.
Availability: For more information, please refer to http://alibaba.informatik.hu-berlin.de/bcms/. BCMS is available at http://bcms.bioinfo.cnio.es/.

BibTeX - Entry

@InProceedings{hakenberg:DSP:2008:1510,
  author =	{J{\"o}rg Hakenberg},
  title =	{Services for annotation of biomedical text},
  booktitle =	{Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives},
  year =	{2008},
  editor =	{Michael Ashburner and Ulf Leser and Dietrich Rebholz-Schuhmann},
  number =	{08131},
  series =	{Dagstuhl Seminar Proceedings},
  ISSN =	{1862-4405},
  publisher =	{Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2008/1510},
  annote =	{Keywords: BioCreative, NER, EMN, GN, information extraction, web-service, AliBaba}
}

Keywords: BioCreative, NER, EMN, GN, information extraction, web-service, AliBaba
Seminar: 08131 - Ontologies and Text Mining for Life Sciences : Current Status and Future Perspectives
Issue Date: 2008
Date of publication: 03.06.2008

