
HISTORIAE, History of Socio-Cultural Transformation as Linguistic Data Science. A Humanities Use Case

Authors: Florentina Armaselu, Elena-Simona Apostol, Anas Fahad Khan, Chaya Liebeskind, Barbara McGillivray, Ciprian-Octavian Truică, Giedrė Valūnaitė Oleškevičienė


Author Details

Florentina Armaselu
  • Centre for Contemporary and Digital History (C²DH), University of Luxembourg, Luxembourg
Elena-Simona Apostol
  • Department of Computer Science and Engineering, Faculty of Automatic Control and Computer, University Politehnica of Bucharest, Romania
Anas Fahad Khan
  • Institute for Computational Linguistics «A. Zampolli», National Research Council of Italy, Pisa, Italy
Chaya Liebeskind
  • Department of Computer Science, Jerusalem College of Technology, Israel
Barbara McGillivray
  • Theoretical and Applied Linguistics, Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, UK
  • The Alan Turing Institute, London, UK
Ciprian-Octavian Truică
  • Department of Computer Science and Engineering, Faculty of Automatic Control and Computer, University Politehnica of Bucharest, Romania
Giedrė Valūnaitė Oleškevičienė
  • Institute of Humanities, Mykolas Romeris University, Vilnius, Lithuania

Cite As

Florentina Armaselu, Elena-Simona Apostol, Anas Fahad Khan, Chaya Liebeskind, Barbara McGillivray, Ciprian-Octavian Truică, and Giedrė Valūnaitė Oleškevičienė. HISTORIAE, History of Socio-Cultural Transformation as Linguistic Data Science. A Humanities Use Case. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 34:1-34:13, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Abstract

The paper proposes an interdisciplinary approach that combines methods from the history of concepts, linguistics, natural language processing (NLP) and the Semantic Web to create a comparative framework for detecting semantic change in multilingual historical corpora and for generating diachronic ontologies as linguistic linked open data (LLOD). Initiated as a use case (UC4.2.1) within the COST Action Nexus Linguarum, European network for Web-centred linguistic data science, the study will explore emerging trends in knowledge extraction, analysis and representation in linguistic data science, and will apply the devised methodology to humanities datasets in order to trace the evolution of concepts from the domain of socio-cultural transformation. The paper describes the main elements of the methodological framework and the preliminary planning of the intended workflow.
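As a rough illustration of what detecting semantic change between time slices of a corpus can look like, the sketch below compares the contexts in which a word occurs in two periods. It is not the authors' method, which the abstract does not specify: it assumes a simple count-based context vector and cosine distance as a stand-in for the embedding models usually used for this task. The function names (`context_vector`, `cosine_distance`, `semantic_change`), the window size, and the toy sentences are all hypothetical and chosen only for the example.

```python
from collections import Counter
from math import sqrt


def context_vector(sentences, target, window=2):
    """Count the words occurring within `window` tokens of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Add every neighbour in the window except the target itself.
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return counts


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # No usable context in at least one period: treat as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def semantic_change(target, earlier, later, window=2):
    """Distance between the target's contexts in two time slices."""
    return cosine_distance(
        context_vector(earlier, target, window),
        context_vector(later, target, window),
    )


# Invented toy sentences standing in for two time slices of a corpus.
period_1 = [
    "the citizen swore loyalty to the city",
    "a citizen of the city paid the tax",
]
period_2 = [
    "every citizen holds rights in the nation",
    "the citizen votes in the nation",
]

score = semantic_change("citizen", period_1, period_2)
print(f"context distance for 'citizen': {score:.3f}")
```

A score near 0 means the word appears in similar contexts in both periods, while a score near 1 means the contexts barely overlap. On real historical corpora such a score would need normalisation for corpus size and spelling variation before it could be read as evidence of conceptual change.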

Subject Classification

ACM Subject Classification
  • Computing methodologies → Semantic networks
  • Computing methodologies → Ontology engineering
  • Computing methodologies → Temporal reasoning
  • Computing methodologies → Lexical semantics
  • Computing methodologies → Language resources
  • Computing methodologies → Information extraction
Keywords
  • linguistic linked open data
  • natural language processing
  • semantic change
  • diachronic ontologies
  • digital humanities



