Name Variants for Improving Entity Discovery and Linking

Weichselbraun, Albert; Kuntschik, Philipp; Braşoveanu, Adrian M. P.

doi:10.4230/OASIcs.LDK.2019.14

Abstract

Identifying all names that refer to a particular set of named entities is a challenging task, since it requires handling many sources of variation such as abbreviations, aliases, hypocorisms, multilingualism and partial matches. Each entity type may also follow specific rules for name variants: person names can include titles, country and branch names are sometimes removed from organization names, and locations are often affected by the issue of nested entities. The lack of a clear strategy for collecting, processing and computing name variants significantly lowers the recall of tasks such as Named Entity Linking and Knowledge Base Population, since name variants occur frequently in all kinds of textual content.
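The entity-type-specific rules described above can be illustrated with a small rule-based generator. This is a minimal sketch under assumed rules, not the method used in the paper: the title list, legal suffixes, country qualifiers, the `PER`/`ORG`/`LOC` labels and the function `generate_variants` are all hypothetical. It strips titles from person names and adds surname and initials forms, removes legal suffixes and country qualifiers from organization names and derives a naive acronym, and keeps the innermost part of comma-separated nested locations.

```python
from __future__ import annotations

# Hypothetical rule tables for illustration only; they are not taken from the paper.
PERSON_TITLES = {"dr", "prof", "mr", "mrs", "ms", "sir"}
ORG_SUFFIXES = {"inc", "ltd", "ag", "gmbh", "corp", "plc"}
COUNTRY_QUALIFIERS = {"switzerland", "germany", "austria", "usa", "uk"}
ORG_STOPWORDS = ORG_SUFFIXES | COUNTRY_QUALIFIERS


def _tokens(name: str) -> list[str]:
    """Split on whitespace and drop surrounding '.' and ',' from each token."""
    return [t.strip(".,") for t in name.split() if t.strip(".,")]


def person_variants(name: str) -> set[str]:
    """Remove titles, then add surname-only and initials forms."""
    tokens = [t for t in _tokens(name) if t.lower() not in PERSON_TITLES]
    variants = {" ".join(tokens)} if tokens else set()
    if len(tokens) >= 2:
        variants.add(tokens[-1])  # partial match on the surname
        initials = " ".join(t[0] + "." for t in tokens[:-1])
        variants.add(f"{initials} {tokens[-1]}")  # e.g. "A. Weichselbraun"
    return variants


def organization_variants(name: str) -> set[str]:
    """Keep the full name, drop legal suffixes and country qualifiers, add a naive acronym."""
    core = [t for t in _tokens(name) if t.lower() not in ORG_STOPWORDS]
    variants = {name.strip()}
    if core:
        variants.add(" ".join(core))
    if len(core) >= 2:
        variants.add("".join(t[0].upper() for t in core))  # e.g. "IBM"
    return variants


def location_variants(name: str) -> set[str]:
    """For nested locations like "Chur, Graubünden", also keep the innermost name."""
    parts = [p.strip() for p in name.split(",") if p.strip()]
    return {name.strip(), parts[0]} if parts else set()


# Hypothetical entity-type labels mapped to their rule sets.
RULES = {"PER": person_variants, "ORG": organization_variants, "LOC": location_variants}


def generate_variants(name: str, entity_type: str) -> set[str]:
    """Dispatch to the rule set for the given (hypothetical) entity type label."""
    return RULES[entity_type](name)
```

For example, `generate_variants("Siemens Switzerland Ltd", "ORG")` yields both the full name and `Siemens`. Real rule sets would need much larger, language-specific lists, and hypocorisms or multilingual aliases are hard to derive with simple rules like these, so they would typically come from knowledge repositories instead.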
This paper proposes several strategies to address these issues. Recall can be improved by combining knowledge repositories and by computing additional name variants with algorithmic approaches. Heuristics and machine learning methods then analyze the generated name variants and mark ambiguous names to increase precision. An extensive evaluation demonstrates the effects of integrating these methods into a new Named Entity Linking framework and confirms that systematically considering name variants yields significant performance improvements.
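The two steps in the paragraph above, merging name variants from several knowledge repositories and then marking ambiguous names, can be sketched as follows. This assumes a simple dictionary representation (a hypothetical entity ID mapped to a set of names) and replaces the paper's heuristics and machine learning methods with a single illustrative heuristic: a name is flagged as ambiguous if it refers to more than one entity or is shorter than `min_length` characters. `merge_sources`, `mark_ambiguous` and `min_length` are hypothetical names, not part of the authors' framework.

```python
from __future__ import annotations

from collections import defaultdict


def merge_sources(*sources: dict[str, set[str]]) -> dict[str, set[str]]:
    """Union the name variants that each knowledge source provides per entity ID."""
    merged: dict[str, set[str]] = defaultdict(set)
    for source in sources:
        for entity_id, names in source.items():
            merged[entity_id] |= names  # fresh set per entity; inputs are not mutated
    return dict(merged)


def mark_ambiguous(aliases: dict[str, set[str]], min_length: int = 3) -> dict[str, bool]:
    """Map each case-folded name to True if it looks ambiguous.

    Illustrative heuristic only: a name is ambiguous if it refers to more
    than one entity or is shorter than min_length characters.
    """
    name_to_entities: dict[str, set[str]] = defaultdict(set)
    for entity_id, names in aliases.items():
        for name in names:
            name_to_entities[name.casefold()].add(entity_id)
    return {
        name: len(entities) > 1 or len(name) < min_length
        for name, entities in name_to_entities.items()
    }
```

Given one source mapping `paris_city` to {"Paris", "City of Light"} and another mapping `paris_hilton` to {"Paris"}, the merged index flags `paris` as ambiguous while `city of light` stays unambiguous. A linker could then require stronger contextual evidence before resolving names that carry the flag.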
