Automatic Construction of Knowledge Graphs from Text and Structured Data: A Preliminary Literature Review

Authors: Maraim Masoud, Bianca Pereira, John McCrae, Paul Buitelaar


Author Details

Maraim Masoud
  • Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, Ireland
Bianca Pereira
  • Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, Ireland
John McCrae
  • Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, Ireland
Paul Buitelaar
  • Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, Ireland

Cite As

Maraim Masoud, Bianca Pereira, John McCrae, and Paul Buitelaar. Automatic Construction of Knowledge Graphs from Text and Structured Data: A Preliminary Literature Review. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 19:1-19:9, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Knowledge graphs have been shown to be an important data structure for many applications, including chatbot development, data integration, and semantic search. In the enterprise domain, such graphs need to be constructed from both structured (e.g., databases) and unstructured (e.g., textual) internal data sources, preferably using automatic approaches because of the costs associated with manual construction of knowledge graphs. However, despite the growing body of research that leverages both structured and textual data sources for automatic knowledge graph construction, the research community has largely centered on one type of source or the other. In this paper, we conduct a preliminary literature review to investigate approaches that can be used to integrate textual and structured data sources in the process of automatic knowledge graph construction. We highlight the solutions currently available for use within enterprises and point out areas that would benefit from further research.

Subject Classification

ACM Subject Classification
  • Information systems → Information extraction

Keywords
  • Knowledge Graph Construction
  • Enterprise Knowledge Graph



