Automatic Construction of Knowledge Graphs from Text and Structured Data: A Preliminary Literature Review

Masoud, Maraim; Pereira, Bianca; McCrae, John; Buitelaar, Paul

doi:10.4230/OASIcs.LDK.2021.19

Abstract

Knowledge graphs have been shown to be an important data structure for many applications, including chatbot development, data integration, and semantic search. In the enterprise domain, such graphs need to be constructed from both structured (e.g. databases) and unstructured (e.g. textual) internal data sources, preferably using automatic approaches because of the costs associated with manual construction of knowledge graphs. However, despite the growing body of research that leverages both structured and textual data sources in the context of automatic knowledge graph construction, the research community has centered on either one type of source or the other. In this paper, we conduct a preliminary literature review to investigate approaches that can be used for the integration of textual and structured data sources in the process of automatic knowledge graph construction. We highlight the solutions currently available for use within enterprises and point out areas that would benefit from further research.

