OASIcs, Volume 70

2nd Conference on Language, Data and Knowledge (LDK 2019)




Event

LDK 2019, May 20-23, 2019, Leipzig, Germany

Editors

Maria Eskevich
  • CLARIN ERIC, Utrecht, The Netherlands
Gerard de Melo
  • Department of Computer Science, Rutgers University - New Brunswick, NJ, USA
Christian Fäth
  • Applied Computational Linguistics, Goethe University Frankfurt, Germany
John P. McCrae
  • Insight Centre for Data Analytics, National University of Ireland Galway, Ireland
Paul Buitelaar
  • Insight Centre for Data Analytics, National University of Ireland Galway, Ireland
Christian Chiarcos
  • Applied Computational Linguistics, Goethe University Frankfurt, Germany
Bettina Klimek
  • Agile Knowledge Engineering and Semantic Web, University of Leipzig, Germany
Milan Dojchinovski
  • Agile Knowledge Engineering and Semantic Web, University of Leipzig, Germany

Publication Details

  • Published: 2019-05-16
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • ISBN: 978-3-95977-105-4
  • DBLP: db/conf/ldk/ldk2019

Documents
Document
Complete Volume
OASIcs, Volume 70, LDK'19, Complete Volume

Authors: Maria Eskevich, Gerard de Melo, Christian Fäth, John P. McCrae, Paul Buitelaar, Christian Chiarcos, Bettina Klimek, and Milan Dojchinovski


Abstract
OASIcs, Volume 70, LDK'19, Complete Volume

Cite as

2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@Proceedings{eskevich_et_al:OASIcs.LDK.2019,
  title =	{{OASIcs, Volume 70, LDK'19, Complete Volume}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019},
  URN =		{urn:nbn:de:0030-drops-105045},
  doi =		{10.4230/OASIcs.LDK.2019},
  annote =	{Keywords: Computing methodologies, Natural language processing, Knowledge representation and reasoning}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Maria Eskevich, Gerard de Melo, Christian Fäth, John P. McCrae, Paul Buitelaar, Christian Chiarcos, Bettina Klimek, and Milan Dojchinovski


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{eskevich_et_al:OASIcs.LDK.2019.0,
  author =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{0:i--0:xvi},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.0},
  URN =		{urn:nbn:de:0030-drops-103641},
  doi =		{10.4230/OASIcs.LDK.2019.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Short Paper
SPARQL Query Recommendation by Example: Assessing the Impact of Structural Analysis on Star-Shaped Queries

Authors: Alessandro Adamou, Carlo Allocca, Mathieu d'Aquin, and Enrico Motta


Abstract
One of the existing query recommendation strategies for unknown datasets is "by example", i.e. based on a query that the user already knows how to formulate on another dataset within a similar domain. In this paper we measure what contribution a structural analysis of the query and the datasets can bring to a recommendation strategy, to go alongside approaches that provide a semantic analysis. Here we concentrate on the case of star-shaped SPARQL queries over RDF datasets. The illustrated strategy performs a least general generalization on the given query, computes the specializations of it that are satisfiable by the target dataset, and organizes them into a graph. It then visits the graph to recommend first the reformulated queries that reflect the original query as closely as possible. This approach does not rely upon a semantic mapping between the two datasets. An implementation as part of the SQUIRE query recommendation library is discussed.
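
A star-shaped query, the shape the paper targets, is one whose triple patterns all share a single subject variable. The sketch below (the `star_query` and `generalize` helpers and the FOAF predicate are illustrative, not part of SQUIRE) shows such a query and the kind of least general generalization the strategy starts from:

```python
def star_query(pairs):
    """Build a star-shaped SPARQL query: every triple pattern
    shares the single subject variable ?s."""
    patterns = " ".join(f"?s <{p}> {o} ." for p, o in pairs)
    return f"SELECT ?s WHERE {{ {patterns} }}"

def generalize(pairs):
    """Least-general-generalization sketch: lift each concrete
    object to a fresh variable, so the generalized query can then
    be specialized against the target dataset."""
    return [(p, f"?o{i}") for i, (p, _) in enumerate(pairs)]

# example: one pattern on a known predicate, then its generalization
query = star_query([("http://xmlns.com/foaf/0.1/name", '"Ada"')])
lifted = generalize([("http://xmlns.com/foaf/0.1/name", '"Ada"')])
```

Specializing the generalized patterns against the target dataset and ranking the satisfiable reformulations is where the paper's structural analysis comes in.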

Cite as

Alessandro Adamou, Carlo Allocca, Mathieu d'Aquin, and Enrico Motta. SPARQL Query Recommendation by Example: Assessing the Impact of Structural Analysis on Star-Shaped Queries. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 1:1-1:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{adamou_et_al:OASIcs.LDK.2019.1,
  author =	{Adamou, Alessandro and Allocca, Carlo and d'Aquin, Mathieu and Motta, Enrico},
  title =	{{SPARQL Query Recommendation by Example: Assessing the Impact of Structural Analysis on Star-Shaped Queries}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{1:1--1:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.1},
  URN =		{urn:nbn:de:0030-drops-103651},
  doi =		{10.4230/OASIcs.LDK.2019.1},
  annote =	{Keywords: SPARQL, query recommendation, query structure, dataset profiling}
}
Document
OWL^C: A Contextual Two-Dimensional Web Ontology Language

Authors: Sahar Aljalbout, Didier Buchs, and Gilles Falquet


Abstract
Representing and reasoning on contexts is an open problem in the semantic web. Although context representation has long been handled locally by semantic web practitioners, no widely accepted consensus has yet been reached on how to encode, and in particular how to reason over, contextual knowledge. In this paper, we present OWL^C: a contextual two-dimensional web ontology language. Using the first dimension, we can reason on context-dependent classes, properties, and axioms, and using the second dimension, we can reason on knowledge about contexts, which we treat as formal objects, as proposed by McCarthy [McCarthy, 1987]. We demonstrate the modeling strength and reasoning capabilities of OWL^C with a practical scenario from the digital humanities domain. We chose the Ferdinand de Saussure [Joseph, 2012] use case by virtue of its inherently contextual nature, as well as its notable complexity, which allows us to highlight many issues connected with contextual knowledge representation and reasoning.

Cite as

Sahar Aljalbout, Didier Buchs, and Gilles Falquet. OWL^C: A Contextual Two-Dimensional Web Ontology Language. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 2:1-2:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{aljalbout_et_al:OASIcs.LDK.2019.2,
  author =	{Aljalbout, Sahar and Buchs, Didier and Falquet, Gilles},
  title =	{{OWL^C: A Contextual Two-Dimensional Web Ontology Language}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{2:1--2:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.2},
  URN =		{urn:nbn:de:0030-drops-103666},
  doi =		{10.4230/OASIcs.LDK.2019.2},
  annote =	{Keywords: Contextual Reasoning, OWL^C, Contexts in digital humanities}
}
Document
Ligt: An LLOD-Native Vocabulary for Representing Interlinear Glossed Text as RDF

Authors: Christian Chiarcos and Maxim Ionov


Abstract
The paper introduces Ligt, a native RDF vocabulary for representing linguistic examples as text with interlinear glosses (IGT) in a linked data formalism. Interlinear glossing is a notation used in various fields of linguistics to provide readers with a way to understand linguistic phenomena and to provide corpus data when documenting endangered languages. This data is usually provided with morpheme-by-morpheme correspondence, which is not supported by any established vocabularies for representing linguistic corpora or automated annotations. Interlinear Glossed Text can be stored and exchanged in several formats specifically designed for the purpose, but these differ in their designs and concepts, and they are tied to particular tools, so the reusability of the annotated data is limited. To improve interoperability and reusability, we propose to convert such glosses to a tool-independent representation well-suited for the Web of Data, i.e., a representation in RDF. Beyond establishing structural (format) interoperability by means of a common data representation, our approach also allows using shared vocabularies and terminology repositories available from the (Linguistic) Linked Open Data cloud. We describe the core vocabulary and the converters that use this vocabulary to convert IGT from the formats of various widely used tools into RDF. Ultimately, a Linked Data representation will facilitate the accessibility of language data from less-resourced language varieties within the (Linguistic) Linked Open Data cloud, as well as enable novel ways to access and integrate this information with (L)LOD dictionary data and other types of lexical-semantic resources. In a longer perspective, data currently only available through these formats will become more visible and reusable and contribute to the development of a truly multilingual (semantic) web.
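
As a rough sketch of the data shape involved (the `ligt:` property names below are placeholders for illustration, not necessarily Ligt's actual vocabulary terms), an interlinearized word with morpheme-by-morpheme correspondence can be serialized as per-morph triples:

```python
def igt_to_triples(word, morphs, glosses, base="http://example.org/igt/"):
    """Represent one interlinearized word as subject-predicate-object
    triples, one node per morph carrying its form and gloss."""
    w = base + word
    triples = []
    for i, (morph, gloss) in enumerate(zip(morphs, glosses)):
        m = f"{w}#m{i}"                       # one node per morph
        triples += [(w, "ligt:hasMorph", m),  # property names are placeholders
                    (m, "ligt:form", morph),
                    (m, "ligt:gloss", gloss)]
    return triples

# German "habe" segmented as hab-e, glossed have-1SG
triples = igt_to_triples("habe", ["hab", "e"], ["have", "1SG"])
```

The point of the RDF encoding is that the gloss labels can then be linked to shared terminology repositories instead of remaining tool-internal strings.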

Cite as

Christian Chiarcos and Maxim Ionov. Ligt: An LLOD-Native Vocabulary for Representing Interlinear Glossed Text as RDF. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 3:1-3:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{chiarcos_et_al:OASIcs.LDK.2019.3,
  author =	{Chiarcos, Christian and Ionov, Maxim},
  title =	{{Ligt: An LLOD-Native Vocabulary for Representing Interlinear Glossed Text as RDF}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{3:1--3:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.3},
  URN =		{urn:nbn:de:0030-drops-103672},
  doi =		{10.4230/OASIcs.LDK.2019.3},
  annote =	{Keywords: Linguistic Linked Open Data (LLOD), less-resourced languages in the (multilingual) Semantic Web, interlinear glossed text (IGT), data modeling}
}
Document
The Shortcomings of Language Tags for Linked Data When Modeling Lesser-Known Languages

Authors: Frances Gillis-Webber and Sabine Tittel


Abstract
In recent years, the modeling of data from linguistic resources with Resource Description Framework (RDF), following the Linked Data paradigm and using the OntoLex-Lemon vocabulary, has become a prevalent method to create datasets for a multilingual web of data. An important aspect of data modeling is the use of language tags to mark lexicons, lexemes, word senses, etc. of a linguistic dataset. However, attempts to model data from lesser-known languages show significant shortcomings with the authoritative list of language codes by ISO 639: for many lesser-known languages spoken by minorities and also for historical stages of languages, language codes, the basis of language tags, are simply not available. This paper discusses these shortcomings based on the examples of three such languages, i.e., two varieties of click languages of Southern Africa together with Old French, and suggests solutions for the issues identified.
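
One mechanism relevant to this discussion is the private-use subtag of BCP 47 (RFC 5646), which allows tagging a variety that has no ISO 639 code; the helper below is a sketch of that general mechanism, not necessarily the solution the paper proposes:

```python
import re

# RFC 5646 private-use sequence: "x" followed by subtags of 1-8
# alphanumeric characters each
PRIVATE_USE = re.compile(r"^x(-[A-Za-z0-9]{1,8})+$")

def private_use_tag(*subtags):
    """Compose a private-use language tag for a variety without an
    ISO 639 code (e.g. an ASCII-safe stand-in for the click language N|uu)."""
    tag = "x-" + "-".join(subtags)
    assert PRIVATE_USE.match(tag), f"not a valid private-use tag: {tag}"
    return tag
```

Another commonly used option is to anchor the tag in the ISO 639 collective code `mis` (uncoded languages), e.g. `mis-x-nuu`; both options trade interoperability for coverage, which is part of the shortcoming the paper analyses.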

Cite as

Frances Gillis-Webber and Sabine Tittel. The Shortcomings of Language Tags for Linked Data When Modeling Lesser-Known Languages. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 4:1-4:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{gilliswebber_et_al:OASIcs.LDK.2019.4,
  author =	{Gillis-Webber, Frances and Tittel, Sabine},
  title =	{{The Shortcomings of Language Tags for Linked Data When Modeling Lesser-Known Languages}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{4:1--4:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.4},
  URN =		{urn:nbn:de:0030-drops-103682},
  doi =		{10.4230/OASIcs.LDK.2019.4},
  annote =	{Keywords: language codes, language tags, Resource Description Framework, Linked Data, Linguistic Linked Data, Khoisan languages, click languages, N|uu, ||'Au, Old French}
}
Document
Extended Abstract
Functional Representation of Technical Artefacts in Ontology-Terminology Models

Authors: Laura Giacomini


Abstract
The ontological coverage of technical artefacts in terminography should take into account a functional representation of conceptual information. We present a model for a function-based description which enables direct interfacing of ontological properties and terminology, and which was developed in the context of a project on term variation in technical texts. Starting from related research in the field of knowledge engineering, we introduce the components of the ontological function macrocategory and discuss the implementation of the model in lemon.

Cite as

Laura Giacomini. Functional Representation of Technical Artefacts in Ontology-Terminology Models. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 5:1-5:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{giacomini:OASIcs.LDK.2019.5,
  author =	{Giacomini, Laura},
  title =	{{Functional Representation of Technical Artefacts in Ontology-Terminology Models}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{5:1--5:6},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.5},
  URN =		{urn:nbn:de:0030-drops-103697},
  doi =		{10.4230/OASIcs.LDK.2019.5},
  annote =	{Keywords: terminology, ontology, technical artefact, function model, semantic web, lemon}
}
Document
Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages

Authors: Bharathi Raja Chakravarthi, Mihael Arcan, and John P. McCrae


Abstract
Under-resourced languages are a significant challenge for statistical approaches to machine translation, and recently it has been shown that the use of training data from closely-related languages can improve machine translation quality for these languages. While languages within the same language family share many properties, many under-resourced languages are written in their own native script, which makes taking advantage of these similarities difficult. In this paper, we propose to alleviate the problem of different scripts by transcribing the native script into a common representation, i.e., the Latin script or the International Phonetic Alphabet (IPA). In particular, we compare coarse-grained transliteration into the Latin script with fine-grained IPA transcription. We performed experiments on the English-Tamil, English-Telugu, and English-Kannada translation tasks. Our results show improvements in terms of BLEU, METEOR, and chrF scores from transliteration, and we find that transliteration into the Latin script outperforms the fine-grained IPA transcription.
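
To illustrate the kind of transcription step involved (the grapheme table below is a toy, not the paper's actual Tamil, Telugu, or Kannada mappings), transliteration into a common representation can be sketched as a greedy longest-match over a grapheme table:

```python
def transliterate(text, table):
    """Greedy longest-match transliteration over a grapheme table."""
    out, i = [], 0
    while i < len(text):
        for span in (2, 1):  # try two-character graphemes before single ones
            chunk = text[i:i + span]
            if chunk in table:
                out.append(table[chunk])
                i += span
                break
        else:
            out.append(text[i])  # pass through unmapped characters
            i += 1
    return "".join(out)
```

A coarse-grained Latin table and a fine-grained IPA table would simply be two different `table` arguments over the same source text, which is what makes the paper's comparison possible.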

Cite as

Bharathi Raja Chakravarthi, Mihael Arcan, and John P. McCrae. Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 6:1-6:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{chakravarthi_et_al:OASIcs.LDK.2019.6,
  author =	{Chakravarthi, Bharathi Raja and Arcan, Mihael and McCrae, John P.},
  title =	{{Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{6:1--6:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.6},
  URN =		{urn:nbn:de:0030-drops-103700},
  doi =		{10.4230/OASIcs.LDK.2019.6},
  annote =	{Keywords: Under-resourced languages, Machine translation, Dravidian languages, Phonetic transcription, Transliteration, International Phonetic Alphabet, IPA, Multilingual machine translation, Multilingual data}
}
Document
CoNLL-Merge: Efficient Harmonization of Concurrent Tokenization and Textual Variation

Authors: Christian Chiarcos and Niko Schenk


Abstract
The proper detection of tokens in running text represents the initial processing step in modular NLP pipelines. But strategies for defining these minimal units can differ, and conflicting analyses of the same text seriously limit the integration of subsequent linguistic annotations into a shared representation. As a solution, we introduce CoNLL Merge, a practical tool for harmonizing TSV-related data models as they occur, e.g., in multi-layer corpora with non-sequential, concurrent tokenizations, but also in ensemble combinations in Natural Language Processing. CoNLL Merge works unsupervised, requires no manual intervention or external data sources, and comes with a flexible API for fully automated merging routines, validity and sanity checks. Users can choose from several merging strategies, and either preserve a reference tokenization (with possible losses of annotation granularity), create a common tokenization layer consisting of minimal shared subtokens (loss-less in terms of annotation granularity, destructive against a reference tokenization), or present tokenization clashes (loss-less and non-destructive, but introducing empty tokens as place-holders for unaligned elements). We demonstrate the applicability of the tool on two use cases from natural language processing and computational philology.
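
A minimal sketch of the "minimal shared subtokens" idea, assuming both tokenizations can be anchored to character offsets in the same text (the helper below is illustrative, not CoNLL Merge's API): split the text at the union of all token boundaries from both tokenizations.

```python
def shared_subtokens(text, toks_a, toks_b):
    """Compute the minimal shared subtokens of two concurrent
    tokenizations by splitting at the union of their boundaries."""
    def offsets(toks):
        spans, pos = [], 0
        for t in toks:
            start = text.index(t, pos)      # anchor token in the raw text
            spans.append((start, start + len(t)))
            pos = start + len(t)
        return spans
    bounds = sorted({p for span in offsets(toks_a) + offsets(toks_b)
                       for p in span})
    # keep non-whitespace segments between consecutive boundaries
    return [text[i:j] for i, j in zip(bounds, bounds[1:]) if text[i:j].strip()]
```

For "New York-based" tokenized as `["New", "York-based"]` versus `["New", "York", "-", "based"]`, both analyses can be expressed over the resulting subtoken layer without loss of annotation granularity.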

Cite as

Christian Chiarcos and Niko Schenk. CoNLL-Merge: Efficient Harmonization of Concurrent Tokenization and Textual Variation. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 7:1-7:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{chiarcos_et_al:OASIcs.LDK.2019.7,
  author =	{Chiarcos, Christian and Schenk, Niko},
  title =	{{CoNLL-Merge: Efficient Harmonization of Concurrent Tokenization and Textual Variation}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{7:1--7:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.7},
  URN =		{urn:nbn:de:0030-drops-103717},
  doi =		{10.4230/OASIcs.LDK.2019.7},
  annote =	{Keywords: data heterogeneity, tokenization, tab-separated values (TSV) format, linguistic annotation, merging}
}
Document
Exploiting Background Knowledge for Argumentative Relation Classification

Authors: Jonathan Kobbe, Juri Opitz, Maria Becker, Ioana Hulpuş, Heiner Stuckenschmidt, and Anette Frank


Abstract
Argumentative relation classification is the task of determining the type of relation (e.g., support or attack) that holds between two argument units. Current state-of-the-art models primarily exploit surface-linguistic features including discourse markers, modals or adverbials to classify argumentative relations. However, a system that performs argument analysis using mainly rhetorical features can be easily fooled by the stylistic presentation of the argument as opposed to its content, in cases where a weak argument is concealed by strong rhetorical means. This paper explores the difficulties and the potential effectiveness of knowledge-enhanced argument analysis, with the aim of advancing the state of the art in argument analysis towards a deeper, knowledge-based understanding and representation of arguments. We propose an argumentative relation classification system that employs linguistic as well as knowledge-based features, and investigate the effects of injecting background knowledge into a neural baseline model for argumentative relation classification. Starting from a Siamese neural network that classifies pairs of argument units into support vs. attack relations, we extend this system with a set of features encoding information extracted from two complementary background knowledge resources: ConceptNet and DBpedia. We evaluate our systems on three different datasets and show that the inclusion of background knowledge can improve classification performance by considerable margins. Thus, our work offers a first step towards effective, knowledge-rich argument analysis.

Cite as

Jonathan Kobbe, Juri Opitz, Maria Becker, Ioana Hulpuş, Heiner Stuckenschmidt, and Anette Frank. Exploiting Background Knowledge for Argumentative Relation Classification. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 8:1-8:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{kobbe_et_al:OASIcs.LDK.2019.8,
  author =	{Kobbe, Jonathan and Opitz, Juri and Becker, Maria and Hulpu\c{s}, Ioana and Stuckenschmidt, Heiner and Frank, Anette},
  title =	{{Exploiting Background Knowledge for Argumentative Relation Classification}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{8:1--8:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.8},
  URN =		{urn:nbn:de:0030-drops-103723},
  doi =		{10.4230/OASIcs.LDK.2019.8},
  annote =	{Keywords: argument structure analysis, background knowledge, argumentative functions, argument classification, commonsense knowledge relations}
}
Document
Short Paper
Graph-Based Annotation Engineering: Towards a Gold Corpus for Role and Reference Grammar

Authors: Christian Chiarcos and Christian Fäth


Abstract
This paper describes the application of annotation engineering techniques for the construction of a corpus for Role and Reference Grammar (RRG). RRG is a semantics-oriented formalism for natural language syntax popular in comparative linguistics and linguistic typology, and predominantly applied for the description of non-European languages which are less-resourced in terms of natural language processing. Because of its cross-linguistic applicability and its conjoint treatment of syntax and semantics, RRG also represents a promising framework for research challenges within natural language processing. At the moment, however, these have not been explored as no RRG corpus data is publicly available. While RRG annotations cannot be easily derived from any single treebank in existence, we suggest that they can be reliably inferred from the intersection of syntactic and semantic annotations as represented by, for example, the Universal Dependencies (UD) and PropBank (PB), and we demonstrate this for the English Web Treebank, a 250,000 token corpus of various genres of English internet text. The resulting corpus is a gold corpus for future experiments in natural language processing in the sense that it is built on existing annotations which have been created manually. A technical challenge in this context is to align UD and PB annotations, to integrate them in a coherent manner, and to distribute and to combine their information on RRG constituent and operator projections. For this purpose, we describe a framework for flexible and scalable annotation engineering based on flexible, unconstrained graph transformations of sentence graphs by means of SPARQL Update.
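
As a hedged illustration of graph transformation via SPARQL Update (the namespace and edge labels below are placeholders, not one of the paper's actual UD-to-RRG rules), a rule that rewrites one annotation label into another over a sentence graph looks like this:

```python
# Illustrative only: placeholder namespace and labels, not the paper's rules.
RULE = """
PREFIX x: <http://example.org/conll#>
DELETE { ?dep x:EDGE "nsubj" }
INSERT { ?dep x:EDGE "ACTOR" }
WHERE  { ?dep x:EDGE "nsubj" }
"""
```

Executed by a SPARQL 1.1 Update engine against a sentence graph, such rules relabel, add, or remove edges while leaving the rest of the graph unconstrained, which is what makes the approach flexible and scalable.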

Cite as

Christian Chiarcos and Christian Fäth. Graph-Based Annotation Engineering: Towards a Gold Corpus for Role and Reference Grammar. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 9:1-9:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{chiarcos_et_al:OASIcs.LDK.2019.9,
  author =	{Chiarcos, Christian and F\"{a}th, Christian},
  title =	{{Graph-Based Annotation Engineering: Towards a Gold Corpus for Role and Reference Grammar}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{9:1--9:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.9},
  URN =		{urn:nbn:de:0030-drops-103731},
  doi =		{10.4230/OASIcs.LDK.2019.9},
  annote =	{Keywords: Role and Reference Grammar, NLP, Corpus, Semantic Web, LLOD, Syntax, Semantics}
}
Document
Crowd-Sourcing A High-Quality Dataset for Metaphor Identification in Tweets

Authors: Omnia Zayed, John P. McCrae, and Paul Buitelaar


Abstract
Metaphor is one of the most important elements of human communication, especially in informal settings such as social media. There have been a number of datasets created for metaphor identification; however, this task has proven difficult due to the nebulous nature of metaphoricity. In this paper, we present a crowd-sourcing approach for the creation of a dataset for metaphor identification that is able to rapidly achieve large coverage over the different usages of metaphor in a given corpus while maintaining high accuracy. We validate this methodology by creating a set of 2,500 manually annotated tweets in English, for which we achieve inter-annotator agreement scores over 0.8, which is higher than other reported results that did not limit the task. This methodology is based on the use of an existing classifier for metaphor in order to assist in the identification and the selection of the examples for annotation, in a way that reduces the cognitive load for annotators and enables quick and accurate annotation. We selected a corpus of both general language tweets and political tweets relating to Brexit and we compare the resulting corpus on these two domains. As a result of this work, we have published the first dataset of tweets annotated for metaphors, which we believe will be invaluable for the development, training and evaluation of approaches for metaphor identification in tweets.
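
As context for the agreement figure (the abstract does not state which coefficient the 0.8 score refers to, so Cohen's kappa here is an assumption of one common choice for two annotators), chance-corrected agreement can be computed as:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items:
    observed agreement corrected by chance agreement."""
    assert len(a) == len(b)
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in ca) / (n * n)       # expected by chance
    return (po - pe) / (1 - pe)
```

Scores above 0.8 are conventionally read as near-perfect agreement, which underlines the claim that limiting and assisting the annotation task pays off.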

Cite as

Omnia Zayed, John P. McCrae, and Paul Buitelaar. Crowd-Sourcing A High-Quality Dataset for Metaphor Identification in Tweets. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 10:1-10:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{zayed_et_al:OASIcs.LDK.2019.10,
  author =	{Zayed, Omnia and McCrae, John P. and Buitelaar, Paul},
  title =	{{Crowd-Sourcing A High-Quality Dataset for Metaphor Identification in Tweets}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{10:1--10:17},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.10},
  URN =		{urn:nbn:de:0030-drops-103740},
  doi =		{10.4230/OASIcs.LDK.2019.10},
  annote =	{Keywords: metaphor, identification, tweets, dataset, annotation, crowd-sourcing}
}
Document
Inflection-Tolerant Ontology-Based Named Entity Recognition for Real-Time Applications

Authors: Christian Jilek, Markus Schröder, Rudolf Novik, Sven Schwarz, Heiko Maus, and Andreas Dengel


Abstract
A growing number of the applications users interact with daily have to operate in (near) real time: chatbots, digital companions, and knowledge work support systems, to name a few. To perform the services desired by the user, these systems have to analyze user activity logs or explicit user input extremely fast. In particular, text content (e.g. in the form of text snippets) needs to be processed in an information extraction task. Given the aforementioned temporal requirements, this has to be accomplished in just a few milliseconds, which limits the number of methods that can be applied. In practice, only very fast methods remain, which in turn deliver worse results than slower but more sophisticated Natural Language Processing (NLP) pipelines. In this paper, we investigate and propose methods for real-time capable Named Entity Recognition (NER). As a first improvement step, we address word variations induced by inflection, as present, for example, in the German language. Our approach is ontology-based and makes use of several language information sources such as Wiktionary. We evaluated it on the German Wikipedia (about 9.4B characters), for which the whole NER process took considerably less than an hour. Since precision and recall are higher than with comparably fast methods, we conclude that the quality gap between high-speed methods and sophisticated NLP pipelines can be narrowed further without losing real-time capable runtime performance.

Cite as

Christian Jilek, Markus Schröder, Rudolf Novik, Sven Schwarz, Heiko Maus, and Andreas Dengel. Inflection-Tolerant Ontology-Based Named Entity Recognition for Real-Time Applications. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 11:1-11:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{jilek_et_al:OASIcs.LDK.2019.11,
  author =	{Jilek, Christian and Schr\"{o}der, Markus and Novik, Rudolf and Schwarz, Sven and Maus, Heiko and Dengel, Andreas},
  title =	{{Inflection-Tolerant Ontology-Based Named Entity Recognition for Real-Time Applications}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{11:1--11:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.11},
  URN =		{urn:nbn:de:0030-drops-103759},
  doi =		{10.4230/OASIcs.LDK.2019.11},
  annote =	{Keywords: Ontology-based information extraction, Named entity recognition, Inflectional languages, Real-time systems}
}
Document
Validation Methodology for Expert-Annotated Datasets: Event Annotation Case Study

Authors: Oana Inel and Lora Aroyo


Abstract
Event detection remains a difficult task due to the complexity and ambiguity of such entities. On the one hand, we observe low inter-annotator agreement among experts when annotating events, despite the multitude of existing annotation guidelines and their numerous revisions. On the other hand, event extraction systems achieve lower F1-scores than systems for other entity types, such as people or locations. In this paper we study the consistency and completeness of expert-annotated datasets for events and time expressions. We propose a data-agnostic methodology for validating such datasets in terms of consistency and completeness. Furthermore, we combine the power of crowds and machines to correct and extend expert-annotated event datasets. We show the benefit of using crowd-annotated events to train and evaluate a state-of-the-art event extraction system. Our results show that the crowd-annotated events increase the performance of the system by at least 5.3%.

Cite as

Oana Inel and Lora Aroyo. Validation Methodology for Expert-Annotated Datasets: Event Annotation Case Study. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 12:1-12:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{inel_et_al:OASIcs.LDK.2019.12,
  author =	{Inel, Oana and Aroyo, Lora},
  title =	{{Validation Methodology for Expert-Annotated Datasets: Event Annotation Case Study}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{12:1--12:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.12},
  URN =		{urn:nbn:de:0030-drops-103762},
  doi =		{10.4230/OASIcs.LDK.2019.12},
  annote =	{Keywords: Crowdsourcing, Human-in-the-Loop, Event Extraction, Time Extraction}
}
Document
Short Paper
A Proposal for a Two-Way Journey on Validating Locations in Unstructured and Structured Data

Authors: Ilkcan Keles, Omar Qawasmeh, Tabea Tietz, Ludovica Marinucci, Roberto Reda, and Marieke van Erp


Abstract
The Web of Data has grown explosively over the past few years, and as with any dataset, it is bound to contain invalid statements as well as gaps. Natural Language Processing (NLP) is gaining interest as a way to fill gaps in data by transforming (unstructured) text into structured data. However, there is currently a fundamental mismatch between Linked Data and NLP approaches: the latter is often based on statistical methods, the former on explicitly modelled knowledge. Nevertheless, these fields can strengthen each other by joining forces. In this position paper, we argue that using Linked Data to validate the output of an NLP system, and using textual data to validate Linked Open Data (LOD) cloud statements, is a promising research avenue. We illustrate our proposal with a proof of concept on a corpus of historical travel stories.

Cite as

Ilkcan Keles, Omar Qawasmeh, Tabea Tietz, Ludovica Marinucci, Roberto Reda, and Marieke van Erp. A Proposal for a Two-Way Journey on Validating Locations in Unstructured and Structured Data. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 13:1-13:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{keles_et_al:OASIcs.LDK.2019.13,
  author =	{Keles, Ilkcan and Qawasmeh, Omar and Tietz, Tabea and Marinucci, Ludovica and Reda, Roberto and van Erp, Marieke},
  title =	{{A Proposal for a Two-Way Journey on Validating Locations in Unstructured and Structured Data}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{13:1--13:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.13},
  URN =		{urn:nbn:de:0030-drops-103778},
  doi =		{10.4230/OASIcs.LDK.2019.13},
  annote =	{Keywords: data validity, natural language processing, linked data}
}
Document
Name Variants for Improving Entity Discovery and Linking

Authors: Albert Weichselbraun, Philipp Kuntschik, and Adrian M. P. Braşoveanu


Abstract
Identifying all names that refer to a particular set of named entities is a challenging task, since we often need to consider many sources of variation, such as abbreviations, aliases, hypocorisms, multilingualism and partial matches. Each entity type can also have specific rules for name variants: person names can include titles, country and branch names are sometimes removed from organization names, and locations are often plagued by the issue of nested entities. The lack of a clear strategy for collecting, processing and computing name variants significantly lowers the recall of tasks such as Named Entity Linking and Knowledge Base Population, since name variants are frequently used in all kinds of textual content. This paper proposes several strategies to address these issues. Recall can be improved by combining knowledge repositories and by computing additional variants algorithmically. Heuristics and machine learning methods then analyze the generated name variants and mark ambiguous names to increase precision. An extensive evaluation demonstrates the effects of integrating these methods into a new Named Entity Linking framework and confirms that systematically considering name variants yields significant performance improvements.

Cite as

Albert Weichselbraun, Philipp Kuntschik, and Adrian M. P. Braşoveanu. Name Variants for Improving Entity Discovery and Linking. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 14:1-14:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{weichselbraun_et_al:OASIcs.LDK.2019.14,
  author =	{Weichselbraun, Albert and Kuntschik, Philipp and Bra\c{s}oveanu, Adrian M. P.},
  title =	{{Name Variants for Improving Entity Discovery and Linking}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{14:1--14:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.14},
  URN =		{urn:nbn:de:0030-drops-103787},
  doi =		{10.4230/OASIcs.LDK.2019.14},
  annote =	{Keywords: Named Entity Linking, Name Variance, Machine Learning, Linked Data}
}
Document
Short Paper
Interlinking SciGraph and DBpedia Datasets Using Link Discovery and Named Entity Recognition Techniques

Authors: Beyza Yaman, Michele Pasin, and Markus Freudenberg


Abstract
In recent years we have seen a proliferation of Linked Open Data (LOD) compliant datasets becoming available on the web, creating ever more opportunities for data consumers to build smarter applications that integrate data from disparate sources. However, such integration is often not easily achievable, since it requires discovering and expressing associations across heterogeneous datasets. The goal of this work is to increase the discoverability and reusability of scholarly data by integrating them with highly interlinked datasets in the LOD cloud. To do so, we applied techniques that a) improve identity resolution across these two sources using Link Discovery for the structured data (i.e. by annotating Springer Nature (SN) SciGraph entities with links to DBpedia entities), and b) enrich SN SciGraph unstructured text content (document abstracts) with links to DBpedia entities using Named Entity Recognition (NER). We published the results of this work using standard vocabularies and provide an interactive exploration tool which presents the discovered links with respect to the breadth and depth of the DBpedia classes.

Cite as

Beyza Yaman, Michele Pasin, and Markus Freudenberg. Interlinking SciGraph and DBpedia Datasets Using Link Discovery and Named Entity Recognition Techniques. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 15:1-15:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{yaman_et_al:OASIcs.LDK.2019.15,
  author =	{Yaman, Beyza and Pasin, Michele and Freudenberg, Markus},
  title =	{{Interlinking SciGraph and DBpedia Datasets Using Link Discovery and Named Entity Recognition Techniques}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{15:1--15:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.15},
  URN =		{urn:nbn:de:0030-drops-103791},
  doi =		{10.4230/OASIcs.LDK.2019.15},
  annote =	{Keywords: Linked Data, Named Entity Recognition, Link Discovery, Interlinking}
}
Document
lemon-tree: Representing Topical Thesauri on the Semantic Web

Authors: Sander Stolk


Abstract
An increasing number of dictionaries are represented on the Web in the form of linguistic linked data using the lemon vocabulary. Such a representation facilitates interoperability across linguistic resources, has the potential to increase their visibility, and promotes their reuse. Lexicographic resources other than dictionaries have thus far not been the main focus of efforts surrounding lemon and its modules. In this paper, fundamental needs for representing topical thesauri specifically are analysed, and a solution is provided for two areas that have hitherto been problematic: (1) the levels that can be distinguished in their topical system, and (2) a looser form of categorization than lexicalization. The novel lemon-tree model contains terminology to overcome these issues and acts as a bridge between existing Web standards in order to bring topical thesauri, too, to the Semantic Web.

Cite as

Sander Stolk. lemon-tree: Representing Topical Thesauri on the Semantic Web. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 16:1-16:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{stolk:OASIcs.LDK.2019.16,
  author =	{Stolk, Sander},
  title =	{{lemon-tree: Representing Topical Thesauri on the Semantic Web}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{16:1--16:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.16},
  URN =		{urn:nbn:de:0030-drops-103804},
  doi =		{10.4230/OASIcs.LDK.2019.16},
  annote =	{Keywords: lemon-tree, lemon, OntoLex, SKOS, thesaurus, topical thesaurus, onomasiological ordering, linked data}
}
Document
Short Paper
Translation-Based Dictionary Alignment for Under-Resourced Bantu Languages

Authors: Thomas Eckart, Sonja Bosch, Dirk Goldhahn, Uwe Quasthoff, and Bettina Klimek


Abstract
Despite large numbers of active speakers, most Bantu languages can be considered under- or less-resourced. This holds especially for lexicographical data, whose size, quality, and consistency of format and provided information are highly unsatisfactory. Unfortunately, this applies not only to the amount and quality of data in monolingual dictionaries, but also to their lack of interconnection to form a network of dictionaries. Current endeavours to promote the use of Bantu languages in primary and secondary education in countries like South Africa show the urgent need for high-quality digital dictionaries. This contribution describes a prototypical implementation for aligning Xhosa, Zimbabwean Ndebele and Kalanga language dictionaries based on their English translations, using simple string matching techniques and WordNet URIs. The RDF-based representation of the data using the Bantu Language Model (BLM) and partial references to the established WordNet dataset supported this process significantly.

Cite as

Thomas Eckart, Sonja Bosch, Dirk Goldhahn, Uwe Quasthoff, and Bettina Klimek. Translation-Based Dictionary Alignment for Under-Resourced Bantu Languages. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 17:1-17:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{eckart_et_al:OASIcs.LDK.2019.17,
  author =	{Eckart, Thomas and Bosch, Sonja and Goldhahn, Dirk and Quasthoff, Uwe and Klimek, Bettina},
  title =	{{Translation-Based Dictionary Alignment for Under-Resourced Bantu Languages}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{17:1--17:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.17},
  URN =		{urn:nbn:de:0030-drops-103819},
  doi =		{10.4230/OASIcs.LDK.2019.17},
  annote =	{Keywords: Cross-language dictionary alignment, Bantu languages, translation, linguistic linked data, under-resourced languages}
}
Document
Short Paper
Cherokee Syllabary Texts: Digital Documentation and Linguistic Description

Authors: Jeffrey Bourns


Abstract
The Digital Archive of American Indian Languages Preservation and Perseverance (DAILP) is an innovative language revitalization project that seeks to provide digital infrastructure for the preservation and study of endangered languages among Native American speech communities. The project’s initial goal is to publish a digital collection of Cherokee-language documents to serve as the basis for language learning, cultural study, and linguistic research. Its primary texts derive from digitized manuscript images of historical Cherokee Syllabary texts, a written tradition that spans nearly two centuries. Of vital importance to DAILP is the participation and expertise of the Cherokee user community in processing such materials, specifically in Syllabary text transcription, romanization, and translation activities. To support the study and linguistic enrichment of such materials, the project is seeking to develop tools and services for the modeling, annotation, and sharing of DAILP texts and language data.

Cite as

Jeffrey Bourns. Cherokee Syllabary Texts: Digital Documentation and Linguistic Description. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 18:1-18:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{bourns:OASIcs.LDK.2019.18,
  author =	{Bourns, Jeffrey},
  title =	{{Cherokee Syllabary Texts: Digital Documentation and Linguistic Description}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{18:1--18:6},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.18},
  URN =		{urn:nbn:de:0030-drops-103828},
  doi =		{10.4230/OASIcs.LDK.2019.18},
  annote =	{Keywords: Cherokee language, Cherokee Syllabary, digital collections, documentary linguistics, linguistic annotation, Linguistic Linked Open Data}
}
Document
Short Paper
Metalexicography as Knowledge Graph

Authors: David Lindemann, Christiane Klaes, and Philipp Zumstein


Abstract
This short paper presents preliminary considerations regarding LexBib, a corpus, bibliography, and domain ontology of Lexicography and Dictionary Research, which is currently being developed at the University of Hildesheim. The LexBib project is intended to provide a bibliographic metadata collection made available through an online reference platform. The corresponding full texts are processed with text mining methods to generate additional metadata, such as term candidates, topic models, and citations. All LexBib content is represented and publicly accessible as RDF Linked Open Data. We discuss a data model that includes metadata for publication details and for the text mining results, and that considers relevant standards for integration into the LOD cloud.

Cite as

David Lindemann, Christiane Klaes, and Philipp Zumstein. Metalexicography as Knowledge Graph. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 19:1-19:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{lindemann_et_al:OASIcs.LDK.2019.19,
  author =	{Lindemann, David and Klaes, Christiane and Zumstein, Philipp},
  title =	{{Metalexicography as Knowledge Graph}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{19:1--19:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.19},
  URN =		{urn:nbn:de:0030-drops-103832},
  doi =		{10.4230/OASIcs.LDK.2019.19},
  annote =	{Keywords: Bibliography, Metalexicography, Full Text Collection, E-science Corpus, Text Mining, RDF Data Model}
}
Document
Cross-Dictionary Linking at Sense Level with a Double-Layer Classifier

Authors: Roser Saurí, Louis Mahon, Irene Russo, and Mironas Bitinis


Abstract
We present a system for linking dictionaries at the sense level, which is part of a wider programme aiming to extend current lexical resources and to create new ones by automatic means. One of the main challenges of the sense linking task is the existence of non one-to-one mappings among senses. Our system handles this issue by addressing the task as a binary classification problem using standard Machine Learning methods, where each sense pair is classified independently from the others. In addition, it implements a second, statistically-based classification layer to also model the dependence existing among sense pairs, namely, the fact that a sense in one dictionary that is already linked to a sense in the other dictionary has a lower probability of being linked to a further sense. The resulting double-layer classifier achieves global Precision and Recall scores of 0.91 and 0.80, respectively.

Cite as

Roser Saurí, Louis Mahon, Irene Russo, and Mironas Bitinis. Cross-Dictionary Linking at Sense Level with a Double-Layer Classifier. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 20:1-20:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{sauri_et_al:OASIcs.LDK.2019.20,
  author =	{Saur{\'\i}, Roser and Mahon, Louis and Russo, Irene and Bitinis, Mironas},
  title =	{{Cross-Dictionary Linking at Sense Level with a Double-Layer Classifier}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{20:1--20:16},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.20},
  URN =		{urn:nbn:de:0030-drops-103848},
  doi =		{10.4230/OASIcs.LDK.2019.20},
  annote =	{Keywords: Word sense linking, word sense mapping, lexical translation, lexical resources, language data construction, multilingual data, data integration across languages}
}
Document
Towards the Detection and Formal Representation of Semantic Shifts in Inflectional Morphology

Authors: Dagmar Gromann and Thierry Declerck


Abstract
Semantic shifts caused by derivational morphemes are a common subject of investigation in language modeling, while inflectional morphemes are frequently portrayed as semantically more stable. This study is motivated by the previously established observation that inflectional morphemes can be just as variable as derivational ones. For instance, the English plural "-s" can turn the fabric silk into the garments of a jockey, silks. While humans know that silk in this sense has no plural, it takes more for machines to arrive at this conclusion. Frequently utilized computational language resources, such as WordNet, and models for representing computational lexicons, such as OntoLex-Lemon, have no descriptive mechanism to represent such inflectional semantic shifts. To investigate this phenomenon, we extract word pairs of different grammatical number from WordNet that feature additional senses in the plural and evaluate their distribution in vector space, i.e., pre-trained word2vec and fastText embeddings. We then propose an extension of OntoLex-Lemon to accommodate this phenomenon, which we call inflectional morpho-semantic variation, to provide a formal representation accessible to algorithms, neural networks, and agents. While the exact scope of the problem is yet to be determined, this first dataset shows that it is not negligible.

Cite as

Dagmar Gromann and Thierry Declerck. Towards the Detection and Formal Representation of Semantic Shifts in Inflectional Morphology. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 21:1-21:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{gromann_et_al:OASIcs.LDK.2019.21,
  author =	{Gromann, Dagmar and Declerck, Thierry},
  title =	{{Towards the Detection and Formal Representation of Semantic Shifts in Inflectional Morphology}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{21:1--21:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.21},
  URN =		{urn:nbn:de:0030-drops-103856},
  doi =		{10.4230/OASIcs.LDK.2019.21},
  annote =	{Keywords: Inflectional morphology, semantic shift, embeddings, formal lexical modeling}
}
Document
Opening Digitized Newspapers Corpora: Europeana’s Full-Text Data Interoperability Case

Authors: Nuno Freire, Antoine Isaac, Twan Goosen, Daan Broeder, Hugo Manguinhas, and Valentine Charles


Abstract
Cultural heritage institutions hold collections of printed newspapers that are valuable resources for the study of history, linguistics and other Digital Humanities scientific domains. Effective retrieval of newspaper content based on metadata alone is nearly impossible, making retrieval based on (digitized) full text particularly relevant. Europeana, Europe’s Digital Library, is in a position to provide access to large newspaper collections with full-text resources. Full-text corpora are also relevant to Europeana’s objective of promoting the use of cultural heritage resources within research infrastructures. We have derived requirements for aggregating and publishing Europeana’s newspaper full-text corpus in an interoperable way, based on investigations into the specific characteristics of cultural data, the needs of two research infrastructures (CLARIN and EUDAT), and the practices being promoted in the International Image Interoperability Framework (IIIF) community. We have then defined a "full-text profile" for the Europeana Data Model, which is being applied to Europeana’s newspaper corpus.

Cite as

Nuno Freire, Antoine Isaac, Twan Goosen, Daan Broeder, Hugo Manguinhas, and Valentine Charles. Opening Digitized Newspapers Corpora: Europeana’s Full-Text Data Interoperability Case. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 22:1-22:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{freire_et_al:OASIcs.LDK.2019.22,
  author =	{Freire, Nuno and Isaac, Antoine and Goosen, Twan and Broeder, Daan and Manguinhas, Hugo and Charles, Valentine},
  title =	{{Opening Digitized Newspapers Corpora: Europeana's Full-Text Data Interoperability Case}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{22:1--22:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.22},
  URN =		{urn:nbn:de:0030-drops-103869},
  doi =		{10.4230/OASIcs.LDK.2019.22},
  annote =	{Keywords: Metadata, Full-text, Interoperability, Data aggregation, Cultural Heritage, Research Infrastructures}
}
Short Paper
Automatic Detection of Language and Annotation Model Information in CoNLL Corpora

Authors: Frank Abromeit and Christian Chiarcos


Abstract
We introduce AnnoHub, an ongoing effort to automatically complement existing language resources with metadata about the languages they cover and the annotation schemes (tagsets) they apply, to provide a web interface for their curation and evaluation by domain experts, and to publish them as an RDF dataset and as part of the (Linguistic) Linked Open Data (LLOD) cloud. In this paper, we focus on tabular formats with tab-separated values (TSV), a de-facto standard for annotated corpora popularized by the CoNLL Shared Tasks. By extension, other formats for which a converter to CoNLL and/or TSV exists can be processed analogously. We describe our implementation and its evaluation against a sample of 93 corpora from the Universal Dependencies, v2.3.
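The core idea of detecting an annotation scheme from CoNLL-style TSV can be sketched as follows. This is a minimal illustration, not the AnnoHub implementation: the tagset inventories below are hypothetical stand-ins for the real inventories (which would come from resources such as OLiA), and the scoring is a simple coverage ratio.

```python
# Hypothetical tagset inventories; real ones would be derived from
# OLiA / Universal Dependencies data, not hard-coded like this.
TAGSETS = {
    "UD-UPOS": {"NOUN", "VERB", "ADJ", "ADV", "PRON", "DET", "ADP",
                "NUM", "CCONJ", "SCONJ", "AUX", "PART", "INTJ",
                "PROPN", "PUNCT", "SYM", "X"},
    "PTB": {"NN", "NNS", "VB", "VBD", "VBZ", "JJ", "RB", "DT", "IN", "CD"},
}

def column_values(lines, col):
    """Collect the values of column `col` from CoNLL TSV lines,
    skipping comment lines and blank sentence separators."""
    values = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) > col:
            values.add(fields[col])
    return values

def guess_tagset(lines, col=3):
    """Score each known tagset by the fraction of observed column
    values it covers, and return the best-scoring scheme."""
    observed = column_values(lines, col)
    if not observed:
        return None
    scores = {name: len(observed & inv) / len(observed)
              for name, inv in TAGSETS.items()}
    return max(scores, key=scores.get)
```

Run over a CoNLL-U file, column 3 (UPOS) would match the UD inventory while column 4 (XPOS) might match a language-specific scheme such as PTB.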

Cite as

Frank Abromeit and Christian Chiarcos. Automatic Detection of Language and Annotation Model Information in CoNLL Corpora. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 23:1-23:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{abromeit_et_al:OASIcs.LDK.2019.23,
  author =	{Abromeit, Frank and Chiarcos, Christian},
  title =	{{Automatic Detection of Language and Annotation Model Information in CoNLL Corpora}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{23:1--23:9},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.23},
  URN =		{urn:nbn:de:0030-drops-103873},
  doi =		{10.4230/OASIcs.LDK.2019.23},
  annote =	{Keywords: LLOD, CoNLL, OLiA}
}
Short Paper
The Secret to Popular Chinese Web Novels: A Corpus-Driven Study

Authors: Yi-Ju Lin and Shu-Kai Hsieh


Abstract
What is the secret to writing popular novels? The question is an intriguing one for researchers from various fields. The goal of this study is to identify the linguistic features of several popular web novels, as well as how their textual features and overall tone interact with each novel's genre and themes. Apart from writing style, non-textual information may also reveal details behind the success of web novels. Since web fiction has become a major industry, with top writers making millions of dollars and their stories adapted into published books, determining the essential elements of "publishable" novels is important. The present study further examines how non-textual information, namely the number of hits, shares, favorites, and comments, may contribute to several features of the most popular published and unpublished web novels. Findings reveal that the keywords, function words, and lexical diversity of a novel are highly related to its genre and writing style, while dialogue proportion reflects the narrative voice of the story. In addition, relatively short sentences are found in these novels. The data also reveal that the numbers of favorites and comments serve as significant predictors for the numbers of shares and hits of unpublished web novels, respectively; the numbers of hits and shares of published web novels, however, are less predictable.

Cite as

Yi-Ju Lin and Shu-Kai Hsieh. The Secret to Popular Chinese Web Novels: A Corpus-Driven Study. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 24:1-24:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{lin_et_al:OASIcs.LDK.2019.24,
  author =	{Lin, Yi-Ju and Hsieh, Shu-Kai},
  title =	{{The Secret to Popular Chinese Web Novels: A Corpus-Driven Study}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{24:1--24:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.24},
  URN =		{urn:nbn:de:0030-drops-103882},
  doi =		{10.4230/OASIcs.LDK.2019.24},
  annote =	{Keywords: Popular Chinese Web Novels, NLP techniques, Sentiment Analysis, Publication of Web novels}
}
Predicting Math Success in an Online Tutoring System Using Language Data and Click-Stream Variables: A Longitudinal Analysis

Authors: Scott Crossley, Shamya Karumbaiah, Jaclyn Ocumpaugh, Matthew J. Labrum, and Ryan S. Baker


Abstract
Previous studies have demonstrated strong links between students' linguistic knowledge, their affective language patterns, and their success in math. Other studies have shown that demographic and click-stream variables in online learning environments are important predictors of math success. This study builds on this research in two ways. First, it combines linguistic and click-stream variables with demographic information to increase prediction rates for math success. Second, it examines how random variance, as found in repeated participant data, can explain math success beyond linguistic, demographic, and click-stream variables. The findings indicate that linguistic, demographic, and click-stream factors explained about 14% of the variance in math scores. These variables combined with random factors explained about 44% of the variance.
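The jump in explained variance when participant-level random effects are added can be illustrated on synthetic data. This toy sketch is not the study's actual analysis (which would use mixed-effects models on real tutoring data); it simply compares the R² of fixed predictors alone against a model that also absorbs per-student intercepts, with all variable names and effect sizes invented for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_students, n_obs = 30, 10
student = np.repeat(np.arange(n_students), n_obs)
x = rng.normal(size=n_students * n_obs)           # e.g. a click-stream feature
ability = rng.normal(scale=2.0, size=n_students)  # unobserved per-student effect
y = 0.5 * x + ability[student] + rng.normal(size=n_students * n_obs)

def r_squared(X, y):
    """Fraction of variance in y explained by a least-squares fit on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

# Fixed predictors only: intercept + feature.
fixed = np.column_stack([np.ones_like(x), x])
# Add per-student indicator columns (one dropped to avoid collinearity),
# mimicking participant-level random intercepts.
dummies = (student[:, None] == np.arange(n_students)).astype(float)
with_groups = np.column_stack([fixed, dummies[:, 1:]])

print(r_squared(fixed, y), r_squared(with_groups, y))
```

Because the per-student columns soak up the between-participant variance, the second R² is substantially larger, paralleling the paper's move from roughly 14% to 44% explained variance.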

Cite as

Scott Crossley, Shamya Karumbaiah, Jaclyn Ocumpaugh, Matthew J. Labrum, and Ryan S. Baker. Predicting Math Success in an Online Tutoring System Using Language Data and Click-Stream Variables: A Longitudinal Analysis. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 25:1-25:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{crossley_et_al:OASIcs.LDK.2019.25,
  author =	{Crossley, Scott and Karumbaiah, Shamya and Ocumpaugh, Jaclyn and Labrum, Matthew J. and Baker, Ryan S.},
  title =	{{Predicting Math Success in an Online Tutoring System Using Language Data and Click-Stream Variables: A Longitudinal Analysis}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{25:1--25:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.25},
  URN =		{urn:nbn:de:0030-drops-103895},
  doi =		{10.4230/OASIcs.LDK.2019.25},
  annote =	{Keywords: Natural language processing, math education, online tutoring systems, text analytics, click-stream variables}
}
Extended Abstract
Can Computational Meta-Documentary Linguistics Provide for Accountability and Offer an Alternative to "Reproducibility" in Linguistics?

Authors: Tobias Weber


Abstract
As an answer to the need for accountability in linguistics, computational methodology and big-data approaches offer an interesting perspective for the field of meta-documentary linguistics. The focus of this paper lies on the scientific process of citing published data and the insights this gives into the workings of a discipline. The proposed methodology should help bring out the narratives of linguistic research within the literature. This can be seen as an alternative, philological approach to documentary linguistics.

Cite as

Tobias Weber. Can Computational Meta-Documentary Linguistics Provide for Accountability and Offer an Alternative to "Reproducibility" in Linguistics? In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 26:1-26:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{weber:OASIcs.LDK.2019.26,
  author =	{Weber, Tobias},
  title =	{{Can Computational Meta-Documentary Linguistics Provide for Accountability and Offer an Alternative to "Reproducibility" in Linguistics?}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{26:1--26:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.26},
  URN =		{urn:nbn:de:0030-drops-103900},
  doi =		{10.4230/OASIcs.LDK.2019.26},
  annote =	{Keywords: Language Documentation, meta-documentary Linguistics, Citation, Methodology, Digital Humanities, Philology, Intertextuality}
}
