Ligt: An LLOD-Native Vocabulary for Representing Interlinear Glossed Text as RDF

Authors: Christian Chiarcos, Maxim Ionov



Author Details

Christian Chiarcos
  • Applied Computational Linguistics Lab, Goethe University Frankfurt, Germany
Maxim Ionov
  • Applied Computational Linguistics Lab, Goethe University Frankfurt, Germany

Cite As

Christian Chiarcos and Maxim Ionov. Ligt: An LLOD-Native Vocabulary for Representing Interlinear Glossed Text as RDF. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 3:1-3:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/OASIcs.LDK.2019.3

Abstract

The paper introduces Ligt, a native RDF vocabulary for representing linguistic examples as text with interlinear glosses (IGT) in a linked data formalism. Interlinear glossing is a notation used in various fields of linguistics to provide readers with a way to understand linguistic phenomena and to provide corpus data when documenting endangered languages. This data is usually provided with morpheme-by-morpheme correspondence, which is not supported by any established vocabulary for representing linguistic corpora or automated annotations. Interlinear Glossed Text can be stored and exchanged in several formats specifically designed for the purpose, but these differ in their designs and concepts, and they are tied to particular tools, so the reusability of the annotated data is limited. To improve interoperability and reusability, we propose to convert such glosses to a tool-independent representation well-suited for the Web of Data, i.e., a representation in RDF. Beyond establishing structural (format) interoperability by means of a common data representation, our approach also allows using shared vocabularies and terminology repositories available from the (Linguistic) Linked Open Data cloud. We describe the core vocabulary and the converters that use it to transform IGT from the formats of several widely used tools into RDF. Ultimately, a Linked Data representation will facilitate the accessibility of language data from less-resourced language varieties within the (Linguistic) Linked Open Data cloud, as well as enable novel ways to access and integrate this information with (L)LOD dictionary data and other types of lexical-semantic resources. In the longer term, data currently only available through these formats will become more visible and reusable and contribute to the development of a truly multilingual (semantic) web.
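
To make the idea of morpheme-by-morpheme correspondence in RDF concrete, the following is a minimal sketch, not the Ligt vocabulary or the authors' converters. The namespace `http://example.org/ligt-sketch#` and the terms `Utterance`, `Word`, `Morph`, `hasWord`, `hasMorph`, `form`, and `gloss` are hypothetical placeholders chosen for illustration, and the Turkish word is only a toy input. The sketch assumes hyphen-separated morphs in the style of the Leipzig Glossing Rules and emits N-Triples using only the Python standard library.

```python
# Hypothetical namespace and term names for illustration only;
# they are NOT the published Ligt vocabulary.
EX = "http://example.org/ligt-sketch#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text):
    """Quote a string as an N-Triples literal, escaping backslash and quote."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def align(words, glosses):
    """Pair each morph with its gloss, word by word (hyphen-separated tiers)."""
    if len(words) != len(glosses):
        raise ValueError("word tier and gloss tier differ in length")
    aligned = []
    for word, gloss in zip(words, glosses):
        morphs, labels = word.split("-"), gloss.split("-")
        if len(morphs) != len(labels):
            raise ValueError(f"morph/gloss count mismatch in {word!r}")
        aligned.append(list(zip(morphs, labels)))
    return aligned


def to_ntriples(utt_id, aligned):
    """Serialize an aligned utterance as a list of N-Triples lines."""
    utt = f"<{EX}{utt_id}>"
    lines = [f"{utt} <{RDF_TYPE}> <{EX}Utterance> ."]
    for w, pairs in enumerate(aligned, start=1):
        word = f"<{EX}{utt_id}_w{w}>"
        lines.append(f"{word} <{RDF_TYPE}> <{EX}Word> .")
        lines.append(f"{utt} <{EX}hasWord> {word} .")
        for m, (form, gloss) in enumerate(pairs, start=1):
            morph = f"<{EX}{utt_id}_w{w}_m{m}>"
            lines.append(f"{morph} <{RDF_TYPE}> <{EX}Morph> .")
            lines.append(f"{word} <{EX}hasMorph> {morph} .")
            lines.append(f"{morph} <{EX}form> {literal(form)} .")
            lines.append(f"{morph} <{EX}gloss> {literal(gloss)} .")
    return lines


# Toy input: one Turkish word with its morpheme-level glosses.
aligned = align(["ev-ler-im-de"], ["house-PL-1SG.POSS-LOC"])
triples = to_ntriples("u1", aligned)
print("\n".join(triples))
```

Because every morph receives its own URI, a gloss label such as `PL` could in principle be replaced by a link to a shared terminology repository, which is the kind of interoperability the abstract describes.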

Subject Classification

ACM Subject Classification
  • Information systems → Graph-based database models
  • Computing methodologies → Language resources
  • Computing methodologies → Knowledge representation and reasoning
Keywords
  • Linguistic Linked Open Data (LLOD)
  • less-resourced languages in the (multilingual) Semantic Web
  • interlinear glossed text (IGT)
  • data modeling

