Bridging the Gap Between Ontology and Lexicon via Class-Specific Association Rules Mined from a Loosely-Parallel Text-Data Corpus

Authors Basil Ell, Mohammad Fazleh Elahi, Philipp Cimiano




Author Details

Basil Ell
  • CIT-EC, University of Bielefeld, Germany
  • Department of Informatics, University of Oslo, Norway
Mohammad Fazleh Elahi
  • CIT-EC, University of Bielefeld, Germany
Philipp Cimiano
  • CIT-EC, University of Bielefeld, Germany

Cite As

Basil Ell, Mohammad Fazleh Elahi, and Philipp Cimiano. Bridging the Gap Between Ontology and Lexicon via Class-Specific Association Rules Mined from a Loosely-Parallel Text-Data Corpus. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 33:1-33:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021) https://doi.org/10.4230/OASIcs.LDK.2021.33

Abstract

There is a well-known lexical gap between content expressed in the form of natural language (NL) texts and content stored in an RDF knowledge base (KB). For tasks such as Information Extraction (IE), this gap needs to be bridged from NL to KB, so that facts extracted from text can be represented in RDF and then added to an RDF KB. For tasks such as Natural Language Generation, this gap needs to be bridged from KB to NL, so that facts stored in an RDF KB can be verbalized and read by humans. In this paper we propose LexExMachina, a new methodology that induces correspondences between lexical elements and KB elements by mining class-specific association rules. As an example of such an association rule, consider the rule that predicts that if the text about a person contains the token "Greek", then this person has the relation nationality to the entity Greece. Another rule predicts that if the text about a settlement contains the token "Greek", then this settlement has the relation country to the entity Greece. Such rules can help in question answering, as they map an adjective to the relevant KB terms, and they can help in information extraction from text. We propose and empirically investigate a set of 20 types of class-specific association rules together with different interestingness measures to rank them. We apply our method to a loosely-parallel text-data corpus that consists of data from DBpedia and texts from Wikipedia, and provide empirical evidence for the utility of the rules for Question Answering.
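
The "Greek" example in the abstract can be illustrated with a minimal sketch. This is not the LexExMachina implementation: the toy corpus, the function name `mine_rules`, and the thresholds are all invented for illustration, and support and confidence are used only because they are the standard baseline interestingness measures for association rules. The abstract does not state which measures or which of the 20 rule types the authors use.

```python
from collections import Counter

# Toy loosely-parallel corpus (invented data): each entity has a class,
# the tokens of its text, and its KB facts as (predicate, object) pairs.
CORPUS = [
    {"cls": "Person", "tokens": {"greek", "philosopher"},
     "facts": {("nationality", "Greece")}},
    {"cls": "Person", "tokens": {"greek", "poet"},
     "facts": {("nationality", "Greece")}},
    {"cls": "Person", "tokens": {"german", "poet"},
     "facts": {("nationality", "Germany")}},
    {"cls": "Settlement", "tokens": {"greek", "town"},
     "facts": {("country", "Greece")}},
    {"cls": "Settlement", "tokens": {"greek", "island", "town"},
     "facts": {("country", "Greece")}},
    {"cls": "Settlement", "tokens": {"german", "town"},
     "facts": {("country", "Germany")}},
]


def mine_rules(corpus, min_support=2, min_confidence=0.8):
    """Mine rules of the form (class, token) => (predicate, object).

    Hypothetical helper: support is the number of entities of the class
    that have both the token and the fact; confidence is that number
    divided by the number of entities of the class that have the token.
    """
    token_counts = Counter()  # (class, token) -> number of entities
    joint_counts = Counter()  # (class, token, fact) -> number of entities
    for entity in corpus:
        for token in entity["tokens"]:
            token_counts[(entity["cls"], token)] += 1
            for fact in entity["facts"]:
                joint_counts[(entity["cls"], token, fact)] += 1

    rules = []
    for (cls, token, fact), support in joint_counts.items():
        confidence = support / token_counts[(cls, token)]
        if support >= min_support and confidence >= min_confidence:
            rules.append((cls, token, fact, support, confidence))
    # Rank by confidence, then support; class and token break ties.
    return sorted(rules, key=lambda r: (-r[4], -r[3], r[0], r[1]))


rules = mine_rules(CORPUS)
for cls, token, fact, support, confidence in rules:
    print(f"{cls}: '{token}' => {fact[0]}({fact[1]}) "
          f"support={support} confidence={confidence:.2f}")
```

Because counts are kept per class, the same token "greek" yields a nationality rule for persons and a country rule for settlements, which is the class-specific behaviour the abstract describes.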

Subject Classification

ACM Subject Classification
  • Computing methodologies → Information extraction
  • Computing methodologies → Natural language generation
Keywords
  • Ontology
  • Lexicon
  • Association Rules
  • Pattern Mining

