Achieving High Quality Knowledge Acquisition using Controlled Natural Language

Gao, Tiantian

doi:10.4230/OASIcs.ICLP.2017.13

Abstract

Controlled Natural Languages (CNLs) are efficient languages for knowledge acquisition and reasoning. They are designed as subsets of natural languages with a restricted grammar that nevertheless remain highly expressive. CNLs are designed to be automatically translated into logical representations, which can be fed into rule engines for querying and reasoning. In this work, we build a knowledge acquisition machine, called KAM, that extends Attempto Controlled English (ACE) and achieves three goals. First, KAM can identify CNL sentences that correspond to the same logical representation but are expressed in different syntactic forms. Second, KAM provides a graphical user interface (GUI) that allows users to disambiguate the knowledge acquired from text and incorporates user feedback to improve the quality of knowledge acquisition. Third, KAM uses a paraconsistent logical framework to encode CNL sentences so that it can reason in the presence of inconsistent knowledge.

