Inflection-Tolerant Ontology-Based Named Entity Recognition for Real-Time Applications

Jilek, Christian; Schröder, Markus; Novik, Rudolf; Schwarz, Sven; Maus, Heiko; Dengel, Andreas

doi:10.4230/OASIcs.LDK.2019.11

File

OASIcs.LDK.2019.11.pdf

Filesize: 1.42 MB
14 pages

Document Identifiers

DOI: 10.4230/OASIcs.LDK.2019.11
URN: urn:nbn:de:0030-drops-103759

Author Details

Christian Jilek

German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
Department of Computer Science, TU Kaiserslautern, Germany

Markus Schröder

German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
Department of Computer Science, TU Kaiserslautern, Germany

Rudolf Novik

German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany

Sven Schwarz

German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany

Heiko Maus

German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany

Andreas Dengel

German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
Department of Computer Science, TU Kaiserslautern, Germany

Acknowledgements

We thank Sven Hertling, Jörn Hees, Erfan Shamabadi, Oleksii Kotvytskyi and Tim Sprengart for their contributions in this project’s early and late phases, respectively.

Cite As

Christian Jilek, Markus Schröder, Rudolf Novik, Sven Schwarz, Heiko Maus, and Andreas Dengel. Inflection-Tolerant Ontology-Based Named Entity Recognition for Real-Time Applications. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 11:1-11:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/OASIcs.LDK.2019.11

@InProceedings{jilek_et_al:OASIcs.LDK.2019.11,
  author =	{Jilek, Christian and Schr\"{o}der, Markus and Novik, Rudolf and Schwarz, Sven and Maus, Heiko and Dengel, Andreas},
  title =	{{Inflection-Tolerant Ontology-Based Named Entity Recognition for Real-Time Applications}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{11:1--11:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.11},
  URN =		{urn:nbn:de:0030-drops-103759},
  doi =		{10.4230/OASIcs.LDK.2019.11},
  annote =	{Keywords: Ontology-based information extraction, Named entity recognition, Inflectional languages, Real-time systems}
}

Abstract

A growing number of applications that users interact with daily have to operate in (near) real time: chatbots, digital companions, and knowledge work support systems, to name just a few. To provide the services users expect, these systems have to analyze user activity logs or explicit user input extremely fast. In particular, text content (e.g., in the form of text snippets) needs to be processed in an information extraction task. Given these temporal requirements, this has to be accomplished within a few milliseconds, which limits the set of applicable methods. In practice, only very fast methods remain, which, however, deliver worse results than slower but more sophisticated Natural Language Processing (NLP) pipelines. In this paper, we investigate and propose methods for real-time capable Named Entity Recognition (NER). As a first improvement step, we address word variations induced by inflection, as found, for example, in German. Our approach is ontology-based and makes use of several language information sources, such as Wiktionary. We evaluated it on the German Wikipedia (about 9.4 billion characters), for which the whole NER process took considerably less than an hour. Since precision and recall are higher than with comparably fast methods, we conclude that the quality gap between high-speed methods and sophisticated NLP pipelines can be narrowed somewhat further without sacrificing real-time runtime performance.
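To illustrate the general idea of inflection-tolerant, dictionary-based NER, the following is a minimal sketch, not the authors' implementation. It assumes a toy ontology (`ONTOLOGY`, a hypothetical mapping from entity IDs to labels) and a hypothetical hand-written table of inflected forms (`INFLECTIONS`) standing in for data that a real system might derive from a source like Wiktionary. Every label is expanded into all combinations of its words' inflected forms and stored in a hash index; text is then scanned with a greedy longest-match over tokens, so each lookup is a constant-time dictionary access.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy ontology (assumption): entity ID -> preferred label.
ONTOLOGY: Dict[str, str] = {
    "ex:Kaiserslautern": "Kaiserslautern",
    "ex:TechnicalUniversity": "Technische Universität",
}

# Hypothetical inflection table (assumption, stands in for Wiktionary-derived
# data): base word -> additional inflected surface forms.
INFLECTIONS: Dict[str, List[str]] = {
    "Technische": ["Technischen", "Technischer"],
    "Universität": ["Universitäten"],
}

Index = Dict[Tuple[str, ...], str]


def build_index(ontology: Dict[str, str],
                inflections: Dict[str, List[str]]) -> Index:
    """Map every inflected token sequence of every label to its entity ID."""
    index: Index = {}
    for entity, label in ontology.items():
        sequences: List[List[str]] = [[]]
        for word in label.split():
            forms = [word] + inflections.get(word, [])
            # Cartesian product: extend each partial sequence with each form.
            sequences = [seq + [form] for seq in sequences for form in forms]
        for seq in sequences:
            index[tuple(token.lower() for token in seq)] = entity
    return index


def recognize(text: str, index: Index) -> List[Tuple[str, str]]:
    """Greedy longest-match scan; returns (surface form, entity ID) pairs."""
    tokens = re.findall(r"\w+", text)  # \w is Unicode-aware for str patterns
    lowered = [token.lower() for token in tokens]
    max_len = max((len(key) for key in index), default=0)
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            entity = index.get(tuple(lowered[i:i + n]))
            if entity is not None:
                matches.append((" ".join(tokens[i:i + n]), entity))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return matches


index = build_index(ONTOLOGY, INFLECTIONS)
print(recognize("Er studierte an der Technischen Universität Kaiserslautern.", index))
# [('Technischen Universität', 'ex:TechnicalUniversity'), ('Kaiserslautern', 'ex:Kaiserslautern')]
```

Precomputing the inflected variants moves the linguistic work to index construction time, which keeps per-text matching cheap. Expanding every combination of forms can grow quickly for long labels, so a larger system would plausibly use a trie or a finite-state automaton instead of enumerating tuples; that choice is an assumption here, not something stated above.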

Subject Classification

ACM Subject Classification

Computing methodologies → Information extraction
Computing methodologies → Semantic networks

Keywords

Ontology-based information extraction
Named entity recognition
Inflectional languages
Real-time systems
