Inflection-Tolerant Ontology-Based Named Entity Recognition for Real-Time Applications

Authors: Christian Jilek, Markus Schröder, Rudolf Novik, Sven Schwarz, Heiko Maus, Andreas Dengel



Author Details

Christian Jilek
  • German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
  • Department of Computer Science, TU Kaiserslautern, Germany
Markus Schröder
  • German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
  • Department of Computer Science, TU Kaiserslautern, Germany
Rudolf Novik
  • German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
Sven Schwarz
  • German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
Heiko Maus
  • German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
Andreas Dengel
  • German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
  • Department of Computer Science, TU Kaiserslautern, Germany

Acknowledgements

We thank Sven Hertling, Jörn Hees, Erfan Shamabadi, Oleksii Kotvytskyi, and Tim Sprengart for their contributions in this project’s early and late phases, respectively.

Cite As

Christian Jilek, Markus Schröder, Rudolf Novik, Sven Schwarz, Heiko Maus, and Andreas Dengel. Inflection-Tolerant Ontology-Based Named Entity Recognition for Real-Time Applications. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 11:1-11:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/OASIcs.LDK.2019.11

Abstract

A growing number of applications that users interact with daily have to operate in (near) real time: chatbots, digital companions, and knowledge work support systems, to name just a few. To perform the services the user wants, these systems have to analyze user activity logs or explicit user input extremely fast. In particular, text content (e.g., in the form of text snippets) needs to be processed in an information extraction task. Given these timing requirements, this has to be accomplished in just a few milliseconds, which limits the number of methods that can be applied. In practice, only very fast methods remain, and these deliver worse results than slower but more sophisticated Natural Language Processing (NLP) pipelines.
In this paper, we investigate and propose methods for real-time capable Named Entity Recognition (NER). As a first improvement step, we address word variations induced by inflection, as found, for example, in the German language. Our approach is ontology-based and makes use of several language information sources such as Wiktionary. We evaluated it using the German Wikipedia (about 9.4B characters), for which the whole NER process took considerably less than an hour. Since precision and recall are higher than with comparably fast methods, we conclude that the quality gap between high-speed methods and sophisticated NLP pipelines can be narrowed a bit more without losing real-time capable runtime performance.
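
To make the idea of inflection-tolerant, ontology-based lookup more concrete, the following minimal sketch matches inflected German tokens against a small dictionary of ontology labels by stripping common suffixes. It is an illustration only and not the method of the paper: the toy ontology, the concept identifiers, the suffix list, and all function names are assumptions made for this example, and the sketch does not handle umlaut changes (e.g. "Häuser") or multi-word labels.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy ontology: label -> concept identifier (not from the paper).
ONTOLOGY: Dict[str, str] = {
    "Haus": "ex:House",
    "Projekt": "ex:Project",
}

# Hypothetical, deliberately tiny list of German inflectional suffixes.
SUFFIXES: Tuple[str, ...] = ("es", "en", "er", "e", "s", "n")

# Assumed minimum stem length, to avoid reducing short words like "des".
MIN_STEM_LENGTH = 3


def build_index(ontology: Dict[str, str]) -> Dict[str, str]:
    """Map lowercased base forms to concept identifiers for O(1) lookup."""
    return {label.lower(): concept for label, concept in ontology.items()}


def candidates(token: str) -> Iterator[str]:
    """Yield the lowercased token, then variants with one suffix stripped."""
    lowered = token.lower()
    yield lowered
    for suffix in SUFFIXES:
        stem_length = len(lowered) - len(suffix)
        if lowered.endswith(suffix) and stem_length >= MIN_STEM_LENGTH:
            yield lowered[:stem_length]


def recognize(text: str, index: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (surface form, concept) pairs for tokens found in the index."""
    hits: List[Tuple[str, str]] = []
    for raw in text.split():
        token = raw.strip(".,;:!?()\"'")
        for candidate in candidates(token):
            if candidate in index:
                hits.append((token, index[candidate]))
                break  # first matching candidate wins
    return hits


INDEX = build_index(ONTOLOGY)
```

For example, `recognize("Die Farbe des Hauses und die Projekte.", INDEX)` returns `[("Hauses", "ex:House"), ("Projekte", "ex:Project")]`. Each token costs only a handful of hash lookups, which is the kind of budget a millisecond-scale setting allows. Blind suffix stripping can also produce false matches, which is one reason a real system might draw on curated inflection data instead.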

Subject Classification

ACM Subject Classification
  • Computing methodologies → Information extraction
  • Computing methodologies → Semantic networks
Keywords
  • Ontology-based information extraction
  • Named entity recognition
  • Inflectional languages
  • Real-time systems
