APiCS-Ligt: Towards Semantic Enrichment of Interlinear Glossed Text

Author Maxim Ionov



Author Details

Maxim Ionov
  • Applied Computational Linguistics Lab, Goethe University Frankfurt, Germany

Cite As

Maxim Ionov. APiCS-Ligt: Towards Semantic Enrichment of Interlinear Glossed Text. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 27:1-27:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/OASIcs.LDK.2021.27

Abstract

This paper presents APiCS-Ligt, a Linguistic Linked Open Data (LLOD) version of a collection of interlinear glossed linguistic examples from APiCS, the Atlas of Pidgin and Creole Language Structures. Interlinear glossed text (IGT) plays an important role in typological and theoretical linguistic research, especially for understudied and endangered languages: it provides a way to understand linguistic phenomena without necessarily knowing the source language, which is crucial for these languages because native speakers are not always easily accessible. Previously, we presented Ligt, an RDF vocabulary created for representing interlinear glosses in text segments. In this paper, we present our conversion of the APiCS IGT dataset into this model and describe our efforts to link linguistic annotations to an external ontology in order to add a semantic representation.
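
To make the idea of representing IGT as graph data more concrete, the following is a minimal sketch, not the conversion described in the paper. It aligns a morpheme-segmented line with its gloss line (words separated by spaces, morphemes by hyphens, as in the Leipzig Glossing Rules) and emits subject-predicate-object triples. The namespace `http://example.org/igt#`, the property names `hasWord`, `hasMorph`, `form`, and `gloss`, and the helper functions are hypothetical placeholders, not terms from the Ligt vocabulary, and the Turkish sentence is an illustrative example that does not come from APiCS.

```python
from typing import List, Tuple

# Hypothetical namespace and property names, used for illustration only;
# they are NOT taken from the Ligt vocabulary or the APiCS-Ligt dataset.
EX = "http://example.org/igt#"

Triple = Tuple[str, str, str]
AlignedWord = List[Tuple[str, str]]


def align_igt(text_line: str, gloss_line: str) -> List[AlignedWord]:
    """Pair each morpheme with its gloss, word by word."""
    words = text_line.split()
    glosses = gloss_line.split()
    if len(words) != len(glosses):
        raise ValueError("text and gloss lines have different numbers of words")
    aligned = []
    for word, gloss in zip(words, glosses):
        morphs = word.split("-")
        tags = gloss.split("-")
        if len(morphs) != len(tags):
            raise ValueError(f"morpheme/gloss mismatch: {word!r} vs {gloss!r}")
        aligned.append(list(zip(morphs, tags)))
    return aligned


def to_triples(aligned: List[AlignedWord], utterance_id: str = "u1") -> List[Triple]:
    """Turn aligned words and morphemes into (subject, predicate, object) triples."""
    triples: List[Triple] = []
    utterance = f"{EX}{utterance_id}"
    for w_idx, word in enumerate(aligned, start=1):
        word_id = f"{utterance}_w{w_idx}"
        triples.append((utterance, f"{EX}hasWord", word_id))
        for m_idx, (form, gloss) in enumerate(word, start=1):
            morph_id = f"{word_id}_m{m_idx}"
            triples.append((word_id, f"{EX}hasMorph", morph_id))
            triples.append((morph_id, f"{EX}form", form))
            triples.append((morph_id, f"{EX}gloss", gloss))
    return triples


# Illustrative Turkish example ("I stayed in the houses"), not from APiCS.
aligned = align_igt("ev-ler-de kal-dı-m", "house-PL-LOC stay-PST-1SG")
triples = to_triples(aligned)
for s, p, o in triples:
    print(s, p, o)
```

In a sketch like this, linking to an external ontology would amount to adding further triples that point from a gloss label such as `PL` to a concept URI in that ontology; which ontology and which properties are used is left open here.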

Subject Classification

ACM Subject Classification
  • Information systems → Graph-based database models
  • Computing methodologies → Language resources
  • Computing methodologies → Knowledge representation and reasoning
Keywords
  • Linguistic Linked Open Data (LLOD)
  • less-resourced languages in the (multilingual) Semantic Web
  • interlinear glossed text (IGT)
  • data modeling

