The Shortcomings of Language Tags for Linked Data When Modeling Lesser-Known Languages

Authors: Frances Gillis-Webber, Sabine Tittel


Author Details

Frances Gillis-Webber
  • Library and Information Studies Centre, University of Cape Town, South Africa
Sabine Tittel
  • Heidelberg Academy of Sciences and Humanities, Heidelberg, Germany


We would like to thank the reviewers for helpful comments and insightful feedback.

Cite As

Frances Gillis-Webber and Sabine Tittel. The Shortcomings of Language Tags for Linked Data When Modeling Lesser-Known Languages. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 4:1-4:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


In recent years, modeling data from linguistic resources with the Resource Description Framework (RDF), following the Linked Data paradigm and using the OntoLex-Lemon vocabulary, has become a prevalent method for creating datasets for a multilingual web of data. An important aspect of data modeling is the use of language tags to mark the lexicons, lexemes, word senses, etc. of a linguistic dataset. However, attempts to model data from lesser-known languages reveal significant shortcomings in the authoritative list of language codes, ISO 639: for many lesser-known languages spoken by minorities, and also for historical stages of languages, language codes, the basis of language tags, are simply not available. This paper discusses these shortcomings using three such languages as examples, namely two varieties of click languages of Southern Africa and Old French, and suggests solutions for the issues identified.
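
To make the dependency of language tags on language codes concrete, the following is a minimal sketch in Python and not the solution the paper itself proposes. It builds a language-tagged RDF literal in Turtle syntax and, when no code is known, falls back to the BCP 47 private-use mechanism (`x-` subtags) attached to `mis`, the ISO 639 code for uncoded languages. The lookup table `KNOWN_CODES`, the function names, the language name "Uncoded Variety", and the subtag `exvar` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, deliberately tiny stand-in for the ISO 639 code list.
# "en" (English) and "fro" (Old French) are real codes; the table is not complete.
KNOWN_CODES = {"english": "en", "old french": "fro"}

# A BCP 47 private-use subtag is 1 to 8 ASCII letters or digits.
_SUBTAG = re.compile(r"[a-z0-9]{1,8}")


def language_tag(language, private_subtag=None):
    """Return a language tag, falling back to private use when no code is known."""
    code = KNOWN_CODES.get(language.lower())
    if code is not None:
        return code
    if private_subtag is None or not _SUBTAG.fullmatch(private_subtag):
        raise ValueError("no code known and no valid private-use subtag given")
    # "mis" marks an uncoded language; the singleton "x" opens the private-use part.
    return f"mis-x-{private_subtag}"


def turtle_literal(text, tag):
    """Format a language-tagged RDF literal in Turtle syntax."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{tag}'


# A language with a code gets that code as its tag.
print(turtle_literal("chevalier", language_tag("Old French")))
# prints "chevalier"@fro

# A language without a code needs an ad hoc, privately agreed subtag.
print(language_tag("Uncoded Variety", "exvar"))
# prints mis-x-exvar
```

The fallback tag is well-formed, but its meaning is known only to parties who have agreed on the private subtag, which illustrates why missing language codes limit interoperability in a multilingual web of data.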

Subject Classification

ACM Subject Classification
  • Computing methodologies → Language resources
  • Information systems → Dictionaries
  • Information systems → Semantic web description languages
  • Information systems → Graph-based database models
  • Information systems → Resource Description Framework (RDF)
  • Software and its engineering → Interoperability
  • Information systems → Multilingual and cross-lingual retrieval
  • Computing methodologies → Information extraction
  • Computing methodologies → Artificial intelligence
Keywords
  • language codes
  • language tags
  • Resource Description Framework
  • Linked Data
  • Linguistic Linked Data
  • Khoisan languages
  • click languages
  • N|uu
  • ||'Au
  • Old French



