Search Results

Documents authored by Matos, Emanuel


Document
Towards Automatic Creation of Annotations to Foster Development of Named Entity Recognizers

Authors: Emanuel Matos, Mário Rodrigues, Pedro Miguel, and António Teixeira

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)


Abstract
Named Entity Recognition (NER) is an essential step for many natural language processing tasks, including Information Extraction. Despite recent advances, particularly using deep learning techniques, the creation of accurate named entity recognizers continues a complex task, highly dependent on annotated data availability. To foster existence of NER systems for new domains it is crucial to obtain the required large volumes of annotated data with low or no manual labor. In this paper it is proposed a system to create the annotated data automatically, by resorting to a set of existing NERs and information sources (DBpedia). The approach was tested with documents of the Tourism domain. Distinct methods were applied for deciding the final named entities and respective tags. The results show that this approach can increase the confidence on annotations and/or augment the number of categories possible to annotate. This paper also presents examples of new NERs that can be rapidly created with the obtained annotated data. The annotated data, combined with the possibility to apply both the ensemble of NER systems and the new Gazetteer-based NERs to large corpora, create the necessary conditions to explore the recent neural deep learning state-of-art approaches to NER (ex: BERT) in domains with scarce or nonexistent data for training.

Cite as

Emanuel Matos, Mário Rodrigues, Pedro Miguel, and António Teixeira. Towards Automatic Creation of Annotations to Foster Development of Named Entity Recognizers. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 11:1-11:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{matos_et_al:OASIcs.SLATE.2021.11,
  author =	{Matos, Emanuel and Rodrigues, M\'{a}rio and Miguel, Pedro and Teixeira, Ant\'{o}nio},
  title =	{{Towards Automatic Creation of Annotations to Foster Development of Named Entity Recognizers}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{11:1--11:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.11},
  URN =		{urn:nbn:de:0030-drops-144286},
  doi =		{10.4230/OASIcs.SLATE.2021.11},
  annote =	{Keywords: Named Entity Recognition (NER), Automatic Annotation, Gazetteers, Tourism, Portuguese}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail