29 Search Results for "Eskevich, Maria"


Volume

OASIcs, Volume 70

2nd Conference on Language, Data and Knowledge (LDK 2019)

LDK 2019, May 20-23, 2019, Leipzig, Germany

Editors: Maria Eskevich, Gerard de Melo, Christian Fäth, John P. McCrae, Paul Buitelaar, Christian Chiarcos, Bettina Klimek, and Milan Dojchinovski

Document
Complete Volume
OASIcs, Volume 70, LDK'19, Complete Volume

Authors: Maria Eskevich, Gerard de Melo, Christian Fäth, John P. McCrae, Paul Buitelaar, Christian Chiarcos, Bettina Klimek, and Milan Dojchinovski

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
OASIcs, Volume 70, LDK'19, Complete Volume

Cite as

2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@Proceedings{eskevich_et_al:OASIcs.LDK.2019,
  title =	{{OASIcs, Volume 70, LDK'19, Complete Volume}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019},
  URN =		{urn:nbn:de:0030-drops-105045},
  doi =		{10.4230/OASIcs.LDK.2019},
  annote =	{Keywords: Computing methodologies, Natural language processing, Knowledge representation and reasoning}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Maria Eskevich, Gerard de Melo, Christian Fäth, John P. McCrae, Paul Buitelaar, Christian Chiarcos, Bettina Klimek, and Milan Dojchinovski

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{eskevich_et_al:OASIcs.LDK.2019.0,
  author =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{0:i--0:xvi},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.0},
  URN =		{urn:nbn:de:0030-drops-103641},
  doi =		{10.4230/OASIcs.LDK.2019.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Short Paper
SPARQL Query Recommendation by Example: Assessing the Impact of Structural Analysis on Star-Shaped Queries

Authors: Alessandro Adamou, Carlo Allocca, Mathieu d'Aquin, and Enrico Motta

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
One of the existing query recommendation strategies for unknown datasets is "by example", i.e. based on a query that the user already knows how to formulate on another dataset within a similar domain. In this paper we measure what contribution a structural analysis of the query and the datasets can bring to a recommendation strategy, to go alongside approaches that provide a semantic analysis. Here we concentrate on the case of star-shaped SPARQL queries over RDF datasets. The illustrated strategy performs a least general generalization on the given query, computes the specializations of it that are satisfiable by the target dataset, and organizes them into a graph. It then visits the graph to recommend first the reformulated queries that reflect the original query as closely as possible. This approach does not rely upon a semantic mapping between the two datasets. An implementation as part of the SQUIRE query recommendation library is discussed.

Cite as

Alessandro Adamou, Carlo Allocca, Mathieu d'Aquin, and Enrico Motta. SPARQL Query Recommendation by Example: Assessing the Impact of Structural Analysis on Star-Shaped Queries. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 1:1-1:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{adamou_et_al:OASIcs.LDK.2019.1,
  author =	{Adamou, Alessandro and Allocca, Carlo and d'Aquin, Mathieu and Motta, Enrico},
  title =	{{SPARQL Query Recommendation by Example: Assessing the Impact of Structural Analysis on Star-Shaped Queries}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{1:1--1:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.1},
  URN =		{urn:nbn:de:0030-drops-103651},
  doi =		{10.4230/OASIcs.LDK.2019.1},
  annote =	{Keywords: SPARQL, query recommendation, query structure, dataset profiling}
}
Document
OWL^C: A Contextual Two-Dimensional Web Ontology Language

Authors: Sahar Aljalbout, Didier Buchs, and Gilles Falquet

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
Representing and reasoning on contexts is an open problem in the semantic web. Although context representation has long been handled locally by semantic web practitioners, no widely accepted consensus has yet been reached on how to encode and, in particular, reason over contextual knowledge. In this paper, we present OWL^C: a contextual two-dimensional web ontology language. Using the first dimension, we can reason on context-dependent classes, properties, and axioms; using the second dimension, we can reason on knowledge about contexts, which we treat as formal objects, as proposed by McCarthy [McCarthy, 1987]. We demonstrate the modeling strength and reasoning capabilities of OWL^C with a practical scenario from the digital humanities domain. We chose the Ferdinand de Saussure [Joseph, 2012] use case because of its inherently contextual nature as well as its notable complexity, which allows us to highlight many issues connected with contextual knowledge representation and reasoning.

Cite as

Sahar Aljalbout, Didier Buchs, and Gilles Falquet. OWL^C: A Contextual Two-Dimensional Web Ontology Language. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 2:1-2:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{aljalbout_et_al:OASIcs.LDK.2019.2,
  author =	{Aljalbout, Sahar and Buchs, Didier and Falquet, Gilles},
  title =	{{OWL^C: A Contextual Two-Dimensional Web Ontology Language}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{2:1--2:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.2},
  URN =		{urn:nbn:de:0030-drops-103666},
  doi =		{10.4230/OASIcs.LDK.2019.2},
  annote =	{Keywords: Contextual Reasoning, OWL^C, Contexts in digital humanities}
}
Document
Ligt: An LLOD-Native Vocabulary for Representing Interlinear Glossed Text as RDF

Authors: Christian Chiarcos and Maxim Ionov

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
The paper introduces Ligt, a native RDF vocabulary for representing linguistic examples as text with interlinear glosses (IGT) in a linked data formalism. Interlinear glossing is a notation used in various fields of linguistics to provide readers with a way to understand linguistic phenomena and to provide corpus data when documenting endangered languages. This data is usually provided with morpheme-by-morpheme correspondence, which is not supported by any established vocabularies for representing linguistic corpora or automated annotations. Interlinear Glossed Text can be stored and exchanged in several formats specifically designed for the purpose, but these differ in their designs and concepts, and they are tied to particular tools, so the reusability of the annotated data is limited. To improve interoperability and reusability, we propose to convert such glosses to a tool-independent representation well-suited for the Web of Data, i.e., a representation in RDF. Beyond establishing structural (format) interoperability by means of a common data representation, our approach also allows using shared vocabularies and terminology repositories available from the (Linguistic) Linked Open Data cloud. We describe the core vocabulary and the converters that use this vocabulary to convert IGT from the formats of various widely used tools into RDF. Ultimately, a Linked Data representation will facilitate the accessibility of language data from less-resourced language varieties within the (Linguistic) Linked Open Data cloud, as well as enable novel ways to access and integrate this information with (L)LOD dictionary data and other types of lexical-semantic resources. In a longer perspective, data currently only available through these formats will become more visible and reusable and contribute to the development of a truly multilingual (semantic) web.

Cite as

Christian Chiarcos and Maxim Ionov. Ligt: An LLOD-Native Vocabulary for Representing Interlinear Glossed Text as RDF. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 3:1-3:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{chiarcos_et_al:OASIcs.LDK.2019.3,
  author =	{Chiarcos, Christian and Ionov, Maxim},
  title =	{{Ligt: An LLOD-Native Vocabulary for Representing Interlinear Glossed Text as RDF}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{3:1--3:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.3},
  URN =		{urn:nbn:de:0030-drops-103672},
  doi =		{10.4230/OASIcs.LDK.2019.3},
  annote =	{Keywords: Linguistic Linked Open Data (LLOD), less-resourced languages in the (multilingual) Semantic Web, interlinear glossed text (IGT), data modeling}
}
Document
The Shortcomings of Language Tags for Linked Data When Modeling Lesser-Known Languages

Authors: Frances Gillis-Webber and Sabine Tittel

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
In recent years, the modeling of data from linguistic resources with Resource Description Framework (RDF), following the Linked Data paradigm and using the OntoLex-Lemon vocabulary, has become a prevalent method to create datasets for a multilingual web of data. An important aspect of data modeling is the use of language tags to mark lexicons, lexemes, word senses, etc. of a linguistic dataset. However, attempts to model data from lesser-known languages reveal significant shortcomings of the authoritative list of language codes by ISO 639: for many lesser-known languages spoken by minorities, and also for historical stages of languages, language codes, the basis of language tags, are simply not available. This paper discusses these shortcomings based on the examples of three such languages, i.e., two varieties of click languages of Southern Africa together with Old French, and suggests solutions for the issues identified.

Cite as

Frances Gillis-Webber and Sabine Tittel. The Shortcomings of Language Tags for Linked Data When Modeling Lesser-Known Languages. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 4:1-4:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{gilliswebber_et_al:OASIcs.LDK.2019.4,
  author =	{Gillis-Webber, Frances and Tittel, Sabine},
  title =	{{The Shortcomings of Language Tags for Linked Data When Modeling Lesser-Known Languages}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{4:1--4:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.4},
  URN =		{urn:nbn:de:0030-drops-103682},
  doi =		{10.4230/OASIcs.LDK.2019.4},
  annote =	{Keywords: language codes, language tags, Resource Description Framework, Linked Data, Linguistic Linked Data, Khoisan languages, click languages, N|uu, ||'Au, Old French}
}
Document
Extended Abstract
Functional Representation of Technical Artefacts in Ontology-Terminology Models

Authors: Laura Giacomini

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
The ontological coverage of technical artefacts in terminography should take into account a functional representation of conceptual information. We present a model for a function-based description which enables direct interfacing of ontological properties and terminology, and which was developed in the context of a project on term variation in technical texts. Starting from related research in the field of knowledge engineering, we introduce the components of the ontological function macrocategory and discuss the implementation of the model in lemon.

Cite as

Laura Giacomini. Functional Representation of Technical Artefacts in Ontology-Terminology Models. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 5:1-5:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{giacomini:OASIcs.LDK.2019.5,
  author =	{Giacomini, Laura},
  title =	{{Functional Representation of Technical Artefacts in Ontology-Terminology Models}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{5:1--5:6},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.5},
  URN =		{urn:nbn:de:0030-drops-103697},
  doi =		{10.4230/OASIcs.LDK.2019.5},
  annote =	{Keywords: terminology, ontology, technical artefact, function model, semantic web, lemon}
}
Document
Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages

Authors: Bharathi Raja Chakravarthi, Mihael Arcan, and John P. McCrae

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
Under-resourced languages are a significant challenge for statistical approaches to machine translation, and recently it has been shown that using training data from closely related languages can improve machine translation quality for these languages. While languages within the same language family share many properties, many under-resourced languages are written in their own native script, which makes taking advantage of these language similarities difficult. In this paper, we propose to alleviate the problem of different scripts by transcribing the native script into a common representation, i.e., the Latin script or the International Phonetic Alphabet (IPA). In particular, we compare coarse-grained transliteration into the Latin script with fine-grained IPA transcription. We performed experiments on the English-Tamil, English-Telugu, and English-Kannada translation tasks. Our results show improvements in terms of BLEU, METEOR and chrF scores from transliteration, and we find that transliteration into the Latin script outperforms the fine-grained IPA transcription.

Cite as

Bharathi Raja Chakravarthi, Mihael Arcan, and John P. McCrae. Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 6:1-6:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{chakravarthi_et_al:OASIcs.LDK.2019.6,
  author =	{Chakravarthi, Bharathi Raja and Arcan, Mihael and McCrae, John P.},
  title =	{{Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{6:1--6:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.6},
  URN =		{urn:nbn:de:0030-drops-103700},
  doi =		{10.4230/OASIcs.LDK.2019.6},
  annote =	{Keywords: Under-resourced languages, Machine translation, Dravidian languages, Phonetic transcription, Transliteration, International Phonetic Alphabet, IPA, Multilingual machine translation, Multilingual data}
}
Document
CoNLL-Merge: Efficient Harmonization of Concurrent Tokenization and Textual Variation

Authors: Christian Chiarcos and Niko Schenk

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
The proper detection of tokens in running text represents the initial processing step in modular NLP pipelines. But strategies for defining these minimal units can differ, and conflicting analyses of the same text seriously limit the integration of subsequent linguistic annotations into a shared representation. As a solution, we introduce CoNLL-Merge, a practical tool for harmonizing TSV-related data models as they occur, e.g., in multi-layer corpora with non-sequential, concurrent tokenizations, but also in ensemble combinations in Natural Language Processing. CoNLL-Merge works unsupervised, requires no manual intervention or external data sources, and comes with a flexible API for fully automated merging routines, validity checks, and sanity checks. Users can choose from several merging strategies and either preserve a reference tokenization (with possible losses of annotation granularity), create a common tokenization layer consisting of minimal shared subtokens (loss-less in terms of annotation granularity, destructive against a reference tokenization), or present tokenization clashes (loss-less and non-destructive, but introducing empty tokens as place-holders for unaligned elements). We demonstrate the applicability of the tool on two use cases from natural language processing and computational philology.

Cite as

Christian Chiarcos and Niko Schenk. CoNLL-Merge: Efficient Harmonization of Concurrent Tokenization and Textual Variation. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 7:1-7:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{chiarcos_et_al:OASIcs.LDK.2019.7,
  author =	{Chiarcos, Christian and Schenk, Niko},
  title =	{{CoNLL-Merge: Efficient Harmonization of Concurrent Tokenization and Textual Variation}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{7:1--7:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.7},
  URN =		{urn:nbn:de:0030-drops-103717},
  doi =		{10.4230/OASIcs.LDK.2019.7},
  annote =	{Keywords: data heterogeneity, tokenization, tab-separated values (TSV) format, linguistic annotation, merging}
}
Document
Exploiting Background Knowledge for Argumentative Relation Classification

Authors: Jonathan Kobbe, Juri Opitz, Maria Becker, Ioana Hulpuş, Heiner Stuckenschmidt, and Anette Frank

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
Argumentative relation classification is the task of determining the type of relation (e.g., support or attack) that holds between two argument units. Current state-of-the-art models primarily exploit surface-linguistic features including discourse markers, modals or adverbials to classify argumentative relations. However, a system that performs argument analysis using mainly rhetorical features can be easily fooled by the stylistic presentation of the argument as opposed to its content, in cases where a weak argument is concealed by strong rhetorical means. This paper explores the difficulties and the potential effectiveness of knowledge-enhanced argument analysis, with the aim of advancing the state-of-the-art in argument analysis towards a deeper, knowledge-based understanding and representation of arguments. We propose an argumentative relation classification system that employs linguistic as well as knowledge-based features, and investigate the effects of injecting background knowledge into a neural baseline model for argumentative relation classification. Starting from a Siamese neural network that classifies pairs of argument units into support vs. attack relations, we extend this system with a set of features that encode a variety of features extracted from two complementary background knowledge resources: ConceptNet and DBpedia. We evaluate our systems on three different datasets and show that the inclusion of background knowledge can improve the classification performance by considerable margins. Thus, our work offers a first step towards effective, knowledge-rich argument analysis.

Cite as

Jonathan Kobbe, Juri Opitz, Maria Becker, Ioana Hulpuş, Heiner Stuckenschmidt, and Anette Frank. Exploiting Background Knowledge for Argumentative Relation Classification. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 8:1-8:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{kobbe_et_al:OASIcs.LDK.2019.8,
  author =	{Kobbe, Jonathan and Opitz, Juri and Becker, Maria and Hulpu\c{s}, Ioana and Stuckenschmidt, Heiner and Frank, Anette},
  title =	{{Exploiting Background Knowledge for Argumentative Relation Classification}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{8:1--8:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.8},
  URN =		{urn:nbn:de:0030-drops-103723},
  doi =		{10.4230/OASIcs.LDK.2019.8},
  annote =	{Keywords: argument structure analysis, background knowledge, argumentative functions, argument classification, commonsense knowledge relations}
}
Document
Short Paper
Graph-Based Annotation Engineering: Towards a Gold Corpus for Role and Reference Grammar

Authors: Christian Chiarcos and Christian Fäth

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
This paper describes the application of annotation engineering techniques for the construction of a corpus for Role and Reference Grammar (RRG). RRG is a semantics-oriented formalism for natural language syntax popular in comparative linguistics and linguistic typology, and predominantly applied for the description of non-European languages which are less-resourced in terms of natural language processing. Because of its cross-linguistic applicability and its conjoint treatment of syntax and semantics, RRG also represents a promising framework for research challenges within natural language processing. At the moment, however, these have not been explored as no RRG corpus data is publicly available. While RRG annotations cannot be easily derived from any single treebank in existence, we suggest that they can be reliably inferred from the intersection of syntactic and semantic annotations as represented by, for example, the Universal Dependencies (UD) and PropBank (PB), and we demonstrate this for the English Web Treebank, a 250,000 token corpus of various genres of English internet text. The resulting corpus is a gold corpus for future experiments in natural language processing in the sense that it is built on existing annotations which have been created manually. A technical challenge in this context is to align UD and PB annotations, to integrate them in a coherent manner, and to distribute and to combine their information on RRG constituent and operator projections. For this purpose, we describe a framework for flexible and scalable annotation engineering based on flexible, unconstrained graph transformations of sentence graphs by means of SPARQL Update.

Cite as

Christian Chiarcos and Christian Fäth. Graph-Based Annotation Engineering: Towards a Gold Corpus for Role and Reference Grammar. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 9:1-9:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{chiarcos_et_al:OASIcs.LDK.2019.9,
  author =	{Chiarcos, Christian and F\"{a}th, Christian},
  title =	{{Graph-Based Annotation Engineering: Towards a Gold Corpus for Role and Reference Grammar}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{9:1--9:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.9},
  URN =		{urn:nbn:de:0030-drops-103731},
  doi =		{10.4230/OASIcs.LDK.2019.9},
  annote =	{Keywords: Role and Reference Grammar, NLP, Corpus, Semantic Web, LLOD, Syntax, Semantics}
}
Document
Crowd-Sourcing A High-Quality Dataset for Metaphor Identification in Tweets

Authors: Omnia Zayed, John P. McCrae, and Paul Buitelaar

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
Metaphor is one of the most important elements of human communication, especially in informal settings such as social media. A number of datasets have been created for metaphor identification; however, this task has proven difficult due to the nebulous nature of metaphoricity. In this paper, we present a crowd-sourcing approach for the creation of a dataset for metaphor identification that is able to rapidly achieve large coverage over the different usages of metaphor in a given corpus while maintaining high accuracy. We validate this methodology by creating a set of 2,500 manually annotated tweets in English, for which we achieve inter-annotator agreement scores over 0.8, which is higher than other reported results that did not limit the task. This methodology is based on the use of an existing classifier for metaphor in order to assist in the identification and the selection of the examples for annotation, in a way that reduces the cognitive load for annotators and enables quick and accurate annotation. We selected a corpus of both general-language tweets and political tweets relating to Brexit, and we compare the resulting corpus on these two domains. As a result of this work, we have published the first dataset of tweets annotated for metaphors, which we believe will be invaluable for the development, training and evaluation of approaches for metaphor identification in tweets.

Cite as

Omnia Zayed, John P. McCrae, and Paul Buitelaar. Crowd-Sourcing A High-Quality Dataset for Metaphor Identification in Tweets. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 10:1-10:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{zayed_et_al:OASIcs.LDK.2019.10,
  author =	{Zayed, Omnia and McCrae, John P. and Buitelaar, Paul},
  title =	{{Crowd-Sourcing A High-Quality Dataset for Metaphor Identification in Tweets}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{10:1--10:17},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.10},
  URN =		{urn:nbn:de:0030-drops-103740},
  doi =		{10.4230/OASIcs.LDK.2019.10},
  annote =	{Keywords: metaphor, identification, tweets, dataset, annotation, crowd-sourcing}
}
Document
Inflection-Tolerant Ontology-Based Named Entity Recognition for Real-Time Applications

Authors: Christian Jilek, Markus Schröder, Rudolf Novik, Sven Schwarz, Heiko Maus, and Andreas Dengel

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
A growing number of applications that users interact with daily have to operate in (near) real time: chatbots, digital companions, and knowledge-work support systems, to name a few. To perform the services the user desires, these systems have to analyze user activity logs or explicit user input extremely fast. In particular, text content (e.g., in the form of text snippets) needs to be processed in an information extraction task. Given the aforementioned temporal requirements, this has to be accomplished in just a few milliseconds, which limits the number of applicable methods. In practice, only very fast methods remain, which in turn deliver worse results than slower but more sophisticated Natural Language Processing (NLP) pipelines. In this paper, we investigate and propose methods for real-time-capable Named Entity Recognition (NER). As a first improvement step, we address word variations induced by inflection, as found, for example, in the German language. Our approach is ontology-based and makes use of several language information sources such as Wiktionary. We evaluated it on the German Wikipedia (about 9.4B characters), for which the whole NER process took considerably less than an hour. Since precision and recall are higher than with comparably fast methods, we conclude that the quality gap between high-speed methods and sophisticated NLP pipelines can be narrowed further without losing real-time runtime performance.
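To convey the core idea of inflection-tolerant gazetteer matching, here is a minimal sketch (not the authors' implementation): each canonical label is expanded with its known inflected forms, e.g. harvested from Wiktionary, and a reverse lookup table then resolves any surface form at match time in O(1). The German forms below are hand-picked for illustration.

```python
# Canonical entity label -> known inflected surface forms (illustrative).
INFLECTIONS = {
    "Universität": ["Universität", "Universitäten"],
    "Buch": ["Buch", "Buches", "Buchs", "Bücher", "Büchern"],
}

# Reverse lookup: any inflected surface form -> canonical label.
SURFACE_TO_CANONICAL = {
    form: label for label, forms in INFLECTIONS.items() for form in forms
}

def recognize(tokens):
    """Return (surface form, canonical label) pairs for matched tokens."""
    return [(t, SURFACE_TO_CANONICAL[t])
            for t in tokens if t in SURFACE_TO_CANONICAL]

print(recognize(["die", "Bücher", "der", "Universitäten"]))
# → [('Bücher', 'Buch'), ('Universitäten', 'Universität')]
```

Precomputing the expanded table trades memory for speed, which fits the paper's millisecond-level latency requirement better than running a full morphological analyzer per token.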

Cite as

Christian Jilek, Markus Schröder, Rudolf Novik, Sven Schwarz, Heiko Maus, and Andreas Dengel. Inflection-Tolerant Ontology-Based Named Entity Recognition for Real-Time Applications. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 11:1-11:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{jilek_et_al:OASIcs.LDK.2019.11,
  author =	{Jilek, Christian and Schr\"{o}der, Markus and Novik, Rudolf and Schwarz, Sven and Maus, Heiko and Dengel, Andreas},
  title =	{{Inflection-Tolerant Ontology-Based Named Entity Recognition for Real-Time Applications}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{11:1--11:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.11},
  URN =		{urn:nbn:de:0030-drops-103759},
  doi =		{10.4230/OASIcs.LDK.2019.11},
  annote =	{Keywords: Ontology-based information extraction, Named entity recognition, Inflectional languages, Real-time systems}
}
Document
Validation Methodology for Expert-Annotated Datasets: Event Annotation Case Study

Authors: Oana Inel and Lora Aroyo

Published in: OASIcs, Volume 70, 2nd Conference on Language, Data and Knowledge (LDK 2019)


Abstract
Event detection remains a difficult task due to the complexity and ambiguity of such entities. On the one hand, we observe low inter-annotator agreement among experts when annotating events, despite the multitude of existing annotation guidelines and their numerous revisions. On the other hand, event extraction systems achieve lower measured performance in terms of F1-score than extraction of other entity types such as people or locations. In this paper we study the consistency and completeness of expert-annotated datasets for events and time expressions, and we propose a data-agnostic methodology for validating such datasets along these two dimensions. Furthermore, we combine the power of crowds and machines to correct and extend expert-annotated event datasets. We show the benefit of using crowd-annotated events to train and evaluate a state-of-the-art event extraction system. Our results show that the crowd-annotated events increase the performance of the system by at least 5.3%.
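The abstract reports system performance in F1-score. As a reminder of how span-level extraction output is typically scored against a gold standard, here is a minimal sketch; the (start, end) offsets are invented, and the paper's exact matching criteria may differ.

```python
def span_f1(gold, predicted):
    """F1 over exact-match (start, end) spans."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # spans found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical character-offset spans of event mentions:
gold_events = [(0, 4), (10, 17), (25, 31)]
pred_events = [(0, 4), (10, 17), (40, 45)]
print(round(span_f1(gold_events, pred_events), 3))  # → 0.667
```

Under exact-match scoring, a partially overlapping span counts as both a false positive and a false negative, which is one reason event extraction F1 tends to trail that of simpler entity types.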

Cite as

Oana Inel and Lora Aroyo. Validation Methodology for Expert-Annotated Datasets: Event Annotation Case Study. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 12:1-12:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{inel_et_al:OASIcs.LDK.2019.12,
  author =	{Inel, Oana and Aroyo, Lora},
  title =	{{Validation Methodology for Expert-Annotated Datasets: Event Annotation Case Study}},
  booktitle =	{2nd Conference on Language, Data and Knowledge (LDK 2019)},
  pages =	{12:1--12:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-105-4},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{70},
  editor =	{Eskevich, Maria and de Melo, Gerard and F\"{a}th, Christian and McCrae, John P. and Buitelaar, Paul and Chiarcos, Christian and Klimek, Bettina and Dojchinovski, Milan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2019.12},
  URN =		{urn:nbn:de:0030-drops-103762},
  doi =		{10.4230/OASIcs.LDK.2019.12},
  annote =	{Keywords: Crowdsourcing, Human-in-the-Loop, Event Extraction, Time Extraction}
}