Search Results

Documents authored by Ionov, Maxim


Document
An Ontology for CoNLL-RDF: Formal Data Structures for TSV Formats in Language Technology

Authors: Christian Chiarcos, Maxim Ionov, Luis Glaser, and Christian Fäth

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
In language technology and language sciences, tab-separated values (TSV) are a frequently used formalism for representing linguistically annotated natural language, often referred to as "CoNLL formats". Many such formats exist, but although they share a number of common features, they are not interoperable, as different pieces of information are encoded differently in these dialects. CoNLL-RDF refers to a programming library and the associated data model that has been introduced to facilitate processing and transforming such TSV formats in a serialization-independent way. CoNLL-RDF represents CoNLL data by means of RDF graphs and SPARQL update operations, but so far without machine-readable semantics, with annotation properties created dynamically on the basis of a user-defined mapping from columns to labels. Current applications of CoNLL-RDF include linking between corpora and dictionaries [Mambrini and Passarotti, 2019] and knowledge graphs [Tamper et al., 2018], syntactic parsing of historical languages [Chiarcos et al., 2018; Chiarcos et al., 2018], the consolidation of syntactic and semantic annotations [Chiarcos and Fäth, 2019], a bridge between RDF corpora and a traditional corpus query language [Ionov et al., 2020], and language contact studies [Chiarcos et al., 2018]. We describe a novel extension of CoNLL-RDF, introducing a formal data model, formalized as an ontology. The ontology is a basis for linking RDF corpora with other Semantic Web resources, but more importantly, its application for transformation between different TSV formats is a major step towards providing interoperability between CoNLL formats.
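
The idea of creating annotation properties from a user-defined mapping of columns to labels can be illustrated with a minimal Python sketch. This is not the CoNLL-RDF library or its API: the function name `tsv_to_triples`, the `conll:` prefix string, the `ex:s1_` subject scheme, and the sample sentence are all assumptions made for illustration, and the triples are plain tuples rather than a real RDF graph.

```python
from typing import Iterator

# Hypothetical prefix, used here only for illustration.
CONLL = "conll:"


def tsv_to_triples(
    tsv: str, columns: list[str], base: str = "ex:s1_"
) -> Iterator[tuple[str, str, str]]:
    """Yield (subject, predicate, object) triples for the rows of one sentence.

    `columns` is the user-defined mapping from column position to label;
    each label becomes a property name, mirroring the dynamic creation of
    annotation properties described in the abstract.
    """
    row_no = 0
    for line in tsv.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        row_no += 1
        subject = f"{base}{row_no}"
        for label, value in zip(columns, line.split("\t")):
            # "_" conventionally marks an empty cell in CoNLL-style TSV.
            if value != "_":
                yield (subject, CONLL + label, value)


sample = "1\tDogs\tdog\tNOUN\n2\tbark\tbark\tVERB\n"
triples = list(tsv_to_triples(sample, ["ID", "WORD", "LEMMA", "POS"]))
```

Because the property names come only from the caller's column labels, two formats that place the same information in different columns yield different properties unless the labels are aligned, which is the interoperability gap a shared ontology is meant to close.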

Cite as

Christian Chiarcos, Maxim Ionov, Luis Glaser, and Christian Fäth. An Ontology for CoNLL-RDF: Formal Data Structures for TSV Formats in Language Technology. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 20:1-20:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{chiarcos_et_al:OASIcs.LDK.2021.20,
  author =	{Chiarcos, Christian and Ionov, Maxim and Glaser, Luis and F\"{a}th, Christian},
  title =	{{An Ontology for CoNLL-RDF: Formal Data Structures for TSV Formats in Language Technology}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{20:1--20:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.20},
  URN =		{urn:nbn:de:0030-drops-145566},
  doi =		{10.4230/OASIcs.LDK.2021.20},
  annote =	{Keywords: language technology, data models, CoNLL-RDF, ontology}
}
Document
APiCS-Ligt: Towards Semantic Enrichment of Interlinear Glossed Text

Authors: Maxim Ionov

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
This paper presents APiCS-Ligt, an LLOD version of a collection of interlinear glossed linguistic examples from APiCS, the Atlas of Pidgin and Creole Language Structures. Interlinear glossed text (IGT) plays an important role in typological and theoretical linguistic research, especially with understudied and endangered languages: it provides a way to understand linguistic phenomena without necessarily knowing the source language, which is crucial for these languages since native speakers are not always easily accessible. Previously, we presented Ligt, an RDF vocabulary created for representing interlinear glosses in text segments. In this paper, we present our conversion of the APiCS IGT dataset into this model and describe our efforts in linking linguistic annotations to an external ontology to add semantic representation.

Cite as

Maxim Ionov. APiCS-Ligt: Towards Semantic Enrichment of Interlinear Glossed Text. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 27:1-27:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{ionov:OASIcs.LDK.2021.27,
  author =	{Ionov, Maxim},
  title =	{{APiCS-Ligt: Towards Semantic Enrichment of Interlinear Glossed Text}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{27:1--27:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.27},
  URN =		{urn:nbn:de:0030-drops-145633},
  doi =		{10.4230/OASIcs.LDK.2021.27},
  annote =	{Keywords: Linguistic Linked Open Data (LLOD), less-resourced languages in the (multilingual) Semantic Web, interlinear glossed text (IGT), data modeling}
}
Document
Linking Discourse Marker Inventories

Authors: Christian Chiarcos and Maxim Ionov

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
The paper describes the first comprehensive edition of machine-readable discourse marker lexicons. Discourse markers such as and, because, but, though or thereafter are essential communicative signals in human conversation, as they indicate how an utterance relates to its communicative context. As much of this information is implicit or expressed differently in different languages, discourse parsing, context-adequate natural language generation and machine translation are considered particularly challenging aspects of Natural Language Processing. Providing this data in machine-readable, standard-compliant form will thus facilitate such technical tasks and, moreover, allow techniques for translation inference to be explored for this particular group of lexical resources, which was previously largely neglected in the context of Linguistic Linked (Open) Data.

Cite as

Christian Chiarcos and Maxim Ionov. Linking Discourse Marker Inventories. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 40:1-40:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{chiarcos_et_al:OASIcs.LDK.2021.40,
  author =	{Chiarcos, Christian and Ionov, Maxim},
  title =	{{Linking Discourse Marker Inventories}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{40:1--40:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.40},
  URN =		{urn:nbn:de:0030-drops-145769},
  doi =		{10.4230/OASIcs.LDK.2021.40},
  annote =	{Keywords: discourse processing, discourse markers, linked data, OntoLex, OLiA}
}
Document
Ligt: An LLOD-Native Vocabulary for Representing Interlinear Glossed Text as RDF

Authors: Christian Chiarcos and Maxim Ionov

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
The paper introduces Ligt, a native RDF vocabulary for representing linguistic examples as text with interlinear glosses (IGT) in a linked data formalism. Interlinear glossing is a notation used in various fields of linguistics to provide readers with a way to understand linguistic phenomena and to provide corpus data when documenting endangered languages. This data is usually provided with morpheme-by-morpheme correspondence, which is not supported by any established vocabularies for representing linguistic corpora or automated annotations. Interlinear Glossed Text can be stored and exchanged in several formats specifically designed for the purpose, but these differ in their designs and concepts, and they are tied to particular tools, so the reusability of the annotated data is limited. To improve interoperability and reusability, we propose to convert such glosses to a tool-independent representation well-suited for the Web of Data, i.e., a representation in RDF. Beyond establishing structural (format) interoperability by means of a common data representation, our approach also allows using shared vocabularies and terminology repositories available from the (Linguistic) Linked Open Data cloud. We describe the core vocabulary and the converters that use this vocabulary to convert IGT from the formats of various widely used tools into RDF. Ultimately, a Linked Data representation will facilitate the accessibility of language data from less-resourced language varieties within the (Linguistic) Linked Open Data cloud, as well as enable novel ways to access and integrate this information with (L)LOD dictionary data and other types of lexical-semantic resources. In the longer term, data currently only available through these formats will become more visible and reusable and contribute to the development of a truly multilingual (semantic) web.

Cite as

Christian Chiarcos and Maxim Ionov. Ligt: An LLOD-Native Vocabulary for Representing Interlinear Glossed Text as RDF. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 3:1-3:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


@InProceedings{chiarcos_et_al:OASIcs.LDK.2019.3,
  author =	{Chiarcos, Christian and Ionov, Maxim},
  title =	{{Ligt: An LLOD-Native Vocabulary for Representing Interlinear Glossed Text as RDF}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{3:1--3:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.3},
  URN =		{urn:nbn:de:0030-drops-103672},
  doi =		{10.4230/OASIcs.LDK.2019.3},
  annote =	{Keywords: Linguistic Linked Open Data (LLOD), less-resourced languages in the (multilingual) Semantic Web, interlinear glossed text (IGT), data modeling}
}