HISTORIAE, History of Socio-Cultural Transformation as Linguistic Data Science. A Humanities Use Case

Armaselu, Florentina; Apostol, Elena-Simona; Khan, Anas Fahad; Liebeskind, Chaya; McGillivray, Barbara; Truică, Ciprian-Octavian; Valūnaitė Oleškevičienė, Giedrė

doi:10.4230/OASIcs.LDK.2021.34

File

OASIcs.LDK.2021.34.pdf

Filesize: 0.92 MB
13 pages

Document Identifiers

DOI: 10.4230/OASIcs.LDK.2021.34
URN: urn:nbn:de:0030-drops-145704

Author Details

Florentina Armaselu

Centre for Contemporary and Digital History (C²DH), University of Luxembourg, Luxembourg

Elena-Simona Apostol

Department of Computer Science and Engineering, Faculty of Automatic Control and Computer, University Politehnica of Bucharest, Romania

Anas Fahad Khan

Institute for Computational Linguistics «A. Zampolli», National Research Council of Italy, Pisa, Italy

Chaya Liebeskind

Department of Computer Science, Jerusalem College of Technology, Israel

Barbara McGillivray

Theoretical and Applied Linguistics, Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, UK
The Alan Turing Institute, London, UK

Ciprian-Octavian Truică

Department of Computer Science and Engineering, Faculty of Automatic Control and Computer, University Politehnica of Bucharest, Romania

Giedrė Valūnaitė Oleškevičienė

Institute of Humanities, Mykolas Romeris University, Vilnius, Lithuania

Cite As

Florentina Armaselu, Elena-Simona Apostol, Anas Fahad Khan, Chaya Liebeskind, Barbara McGillivray, Ciprian-Octavian Truică, and Giedrė Valūnaitė Oleškevičienė. HISTORIAE, History of Socio-Cultural Transformation as Linguistic Data Science. A Humanities Use Case. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 34:1-34:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/OASIcs.LDK.2021.34

@InProceedings{armaselu_et_al:OASIcs.LDK.2021.34,
  author =	{Armaselu, Florentina and Apostol, Elena-Simona and Khan, Anas Fahad and Liebeskind, Chaya and McGillivray, Barbara and Truic\u{a}, Ciprian-Octavian and Val\={u}nait\.{e} Ole\v{s}kevi\v{c}ien\.{e}, Giedr\.{e}},
  title =	{{HISTORIAE, History of Socio-Cultural Transformation as Linguistic Data Science. A Humanities Use Case}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{34:1--34:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.34},
  URN =		{urn:nbn:de:0030-drops-145704},
  doi =		{10.4230/OASIcs.LDK.2021.34},
  annote =	{Keywords: linguistic linked open data, natural language processing, semantic change, diachronic ontologies, digital humanities}
}

Abstract

The paper proposes an interdisciplinary approach including methods from disciplines such as history of concepts, linguistics, natural language processing (NLP) and Semantic Web, to create a comparative framework for detecting semantic change in multilingual historical corpora and generating diachronic ontologies as linguistic linked open data (LLOD). Initiated as a use case (UC4.2.1) within the COST Action Nexus Linguarum, European network for Web-centred linguistic data science, the study will explore emerging trends in knowledge extraction, analysis and representation from linguistic data science, and apply the devised methodology to datasets in the humanities to trace the evolution of concepts from the domain of socio-cultural transformation. The paper will describe the main elements of the methodological framework and preliminary planning of the intended workflow.

Subject Classification

ACM Subject Classification

Computing methodologies → Semantic networks
Computing methodologies → Ontology engineering
Computing methodologies → Temporal reasoning
Computing methodologies → Lexical semantics
Computing methodologies → Language resources
Computing methodologies → Information extraction

Keywords

linguistic linked open data
natural language processing
semantic change
diachronic ontologies
digital humanities
