Bridging the Gap Between Ontology and Lexicon via Class-Specific Association Rules Mined from a Loosely-Parallel Text-Data Corpus

Ell, Basil; Elahi, Mohammad Fazleh; Cimiano, Philipp

doi:10.4230/OASIcs.LDK.2021.33

Abstract

There is a well-known lexical gap between content expressed in the form of natural language (NL) texts and content stored in an RDF knowledge base (KB). For tasks such as Information Extraction (IE), this gap needs to be bridged from NL to KB, so that facts extracted from text can be represented in RDF and then added to an RDF KB. For tasks such as Natural Language Generation, this gap needs to be bridged from KB to NL, so that facts stored in an RDF KB can be verbalized and read by humans. In this paper, we propose LexExMachina, a new methodology that induces correspondences between lexical elements and KB elements by mining class-specific association rules. As an example of such an association rule, consider the rule that predicts that if the text about a person contains the token "Greek", then this person has the relation nationality to the entity Greece. Another rule predicts that if the text about a settlement contains the token "Greek", then this settlement has the relation country to the entity Greece. Such a rule can help in question answering, as it maps an adjective to the relevant KB terms, and it can help in information extraction from text. We propose and empirically investigate a set of 20 types of class-specific association rules together with different interestingness measures to rank them. We apply our method to a loosely-parallel text-data corpus that consists of data from DBpedia and texts from Wikipedia, and we evaluate the rules, providing empirical evidence of their utility for question answering.
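
To make the "Greek" example concrete, the following is a minimal sketch of how one kind of class-specific rule (token in text implies fact in KB) could be counted and scored. It is not the LexExMachina implementation: the toy corpus, the function name `mine_rules`, and the choice of support and confidence as the ranking measures are all assumptions made for illustration, and the abstract does not say which interestingness measures the authors use.

```python
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
# Each entry: (class of the entity, tokens in its text, KB facts about it).
CORPUS = [
    ("Person", {"greek", "philosopher"}, {("nationality", "Greece")}),
    ("Person", {"greek", "poet"}, {("nationality", "Greece")}),
    ("Person", {"german", "poet"}, {("nationality", "Germany")}),
    ("Settlement", {"greek", "island", "town"}, {("country", "Greece")}),
    ("Settlement", {"german", "town"}, {("country", "Germany")}),
]


def mine_rules(corpus, min_support=1):
    """Map each rule (class, token, fact) to its (support, confidence).

    support    = number of entities of the class whose text contains the
                 token and whose KB description contains the fact
    confidence = support / number of entities of the class whose text
                 contains the token
    """
    token_counts = Counter()  # (class, token) -> entity count
    pair_counts = Counter()   # (class, token, fact) -> entity count
    for cls, tokens, facts in corpus:
        for token in tokens:
            token_counts[(cls, token)] += 1
            for fact in facts:
                pair_counts[(cls, token, fact)] += 1

    rules = {}
    for (cls, token, fact), support in pair_counts.items():
        if support >= min_support:
            confidence = support / token_counts[(cls, token)]
            rules[(cls, token, fact)] = (support, confidence)
    return rules


rules = mine_rules(CORPUS)
for (cls, token, (prop, obj)), (support, confidence) in sorted(rules.items()):
    print(f"{cls}: '{token}' => ({prop}, {obj})  "
          f"support={support} confidence={confidence:.2f}")
```

Because counts are kept per class, the same token "greek" yields the rule (nationality, Greece) for persons and (country, Greece) for settlements, which is the distinction the abstract describes.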

Cite As

Basil Ell, Mohammad Fazleh Elahi, and Philipp Cimiano. Bridging the Gap Between Ontology and Lexicon via Class-Specific Association Rules Mined from a Loosely-Parallel Text-Data Corpus. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 33:1-33:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021) https://doi.org/10.4230/OASIcs.LDK.2021.33

Author Details

Basil Ell

CIT-EC, University of Bielefeld, Germany
Department of Informatics, University of Oslo, Norway

Mohammad Fazleh Elahi

CIT-EC, University of Bielefeld, Germany

Philipp Cimiano

CIT-EC, University of Bielefeld, Germany

Funding

This work has been supported by the EU’s Horizon 2020 project Prêt-à-LLOD (grant agreement No 825182) and by the SIRIUS centre: Norwegian Research Council project No 237898.

Supplementary Materials

Collection (Dataset and Source Code) http://www.LexExMachina.xyz

