50 Search Results for "Pérez, Jorge A."


Document
Types and Terms Translated: Unrestricted Resources in Encoding Functions as Processes

Authors: Joseph W. N. Paulus, Daniele Nantes-Sobrinho, and Jorge A. Pérez

Published in: LIPIcs, Volume 239, 27th International Conference on Types for Proofs and Programs (TYPES 2021)


Abstract
Type-preserving translations are effective rigorous tools in the study of core programming calculi. In this paper, we develop a new typed translation that connects sequential and concurrent calculi; it is governed by type systems that control resource consumption. Our main contribution is the source language, a new resource λ-calculus with non-collapsing non-determinism and failures, dubbed uλ^{↯}_{⊕}. In uλ^{↯}_{⊕}, resources are split into linear and unrestricted; failures are explicit and arise from this distinction. We define a type system based on intersection types to control resources and fail-prone computation. The target language is 𝗌π, an existing session-typed π-calculus that results from a Curry-Howard correspondence between linear logic and session types. Our typed translation subsumes our prior work; interestingly, it treats unrestricted resources in uλ^{↯}_{⊕} as client-server session behaviours in 𝗌π.

Cite as

Joseph W. N. Paulus, Daniele Nantes-Sobrinho, and Jorge A. Pérez. Types and Terms Translated: Unrestricted Resources in Encoding Functions as Processes. In 27th International Conference on Types for Proofs and Programs (TYPES 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 239, pp. 11:1-11:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


@InProceedings{paulus_et_al:LIPIcs.TYPES.2021.11,
  author =	{Paulus, Joseph W. N. and Nantes-Sobrinho, Daniele and P\'{e}rez, Jorge A.},
  title =	{{Types and Terms Translated: Unrestricted Resources in Encoding Functions as Processes}},
  booktitle =	{27th International Conference on Types for Proofs and Programs (TYPES 2021)},
  pages =	{11:1--11:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-254-9},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{239},
  editor =	{Basold, Henning and Cockx, Jesper and Ghilezan, Silvia},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.TYPES.2021.11},
  URN =		{urn:nbn:de:0030-drops-167808},
  doi =		{10.4230/LIPIcs.TYPES.2021.11},
  annote =	{Keywords: Resource $\lambda$-calculus, intersection types, session types, process calculi}
}
Document
Invited Talk
The JeuxDeMots Project (Invited Talk)

Authors: Mathieu Lafourcade

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
The JeuxDeMots project aims at building a very large knowledge base in French, both common sense and specialized, using games, contributory approaches, and inference mechanisms. A dozen games have been designed as part of this project, each one allowing to collect specific information, or to consolidate the information acquired through the other games. With this presentation, the data collected and constructed since the launch of the project in the summer of 2007 will be analyzed both qualitatively and quantitatively. In particular, the following aspects will be detailed: the structure of the lexical and semantic network, some types of relations (semantic, ontological, subjective, semantic roles, associations of ideas), annotation of relations (meta-information), semantic refinements (management of polysemy), the creation of clusters allowing the representation of richer knowledge (n-argument relations) that make an implicit neural network. Finally, I will describe some complementary acquisition methods and applications such as a bot for endogenous contributions, a chatbot making inferences and semantic extraction from texts.

Cite as

Mathieu Lafourcade. The JeuxDeMots Project (Invited Talk). In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, p. 1:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{lafourcade:OASIcs.LDK.2021.1,
  author =	{Lafourcade, Mathieu},
  title =	{{The JeuxDeMots Project}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{1:1--1:1},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.1},
  URN =		{urn:nbn:de:0030-drops-145377},
  doi =		{10.4230/OASIcs.LDK.2021.1},
  annote =	{Keywords: Lexical Semantic Network, Games with a Purpose, Inferences, Knowledge Representation, Semantic Representation}
}
Document
Invited Talk
A Smell is Worth a Thousand Words: Olfactory Information Extraction and Semantic Processing in a Multilingual Perspective (Invited Talk)

Authors: Sara Tonelli

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
More than any other sense, smell is linked directly to our emotions and our memories. However, smells are intangible and very difficult to preserve, making it hard to effectively identify, consolidate, and promote the wide-ranging role scents and smelling have in our cultural heritage. While some novel approaches have been recently proposed to monitor so-called urban smellscapes and analyse the olfactory dimension of our environments (Quercia et al., 2015), when it comes to smellscapes from the past little research has been done to keep track of how places, events and people have been described from an olfactory perspective. Fortunately, some key prerequisites for addressing this problem are now in place. In recent years, European cultural heritage institutions have invested heavily in large-scale digitisation: we hold a wealth of object, text and image data which can now be analysed using artificial intelligence. What remains missing is a methodology for the extraction of scent-related information from large amounts of texts, as well as a broader awareness of the wealth of historical olfactory descriptions, experiences and memories contained within the heritage datasets. In this talk, I will describe ongoing activities towards this goal, focused on text mining and semantic processing of olfactory information. I will present the general framework designed to annotate smell events in documents, and some preliminary results on information extraction approaches in a multilingual scenario. I will discuss the main findings and the challenges related to modelling textual descriptions of smells, including the metaphorical use of smell-related terms and the well-known limitations of smell vocabulary in European languages compared to other senses.

Cite as

Sara Tonelli. A Smell is Worth a Thousand Words: Olfactory Information Extraction and Semantic Processing in a Multilingual Perspective (Invited Talk). In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, p. 2:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{tonelli:OASIcs.LDK.2021.2,
  author =	{Tonelli, Sara},
  title =	{{A Smell is Worth a Thousand Words: Olfactory Information Extraction and Semantic Processing in a Multilingual Perspective}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{2:1--2:1},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.2},
  URN =		{urn:nbn:de:0030-drops-145386},
  doi =		{10.4230/OASIcs.LDK.2021.2},
  annote =	{Keywords: olfactory information extraction, smellscapes, multilingual annotation}
}
Document
Invited Talk
Free/Open-Source Machine Translation for the Low-Resource Languages of Spain (Invited Talk)

Authors: Mikel L. Forcada

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
While machine translation has historically been rule-based, that is, based on dictionaries and rules written by experts, most present-day machine translation is corpus-based. In the last few years, statistical machine translation, the dominant corpus-based approach, has been displaced by neural machine translation in most applications, in view of the better results reported, particularly for languages with very different syntax. But both statistical and neural machine translation need to be trained on large amounts of parallel data, that is, sentences in one language carefully paired with their translations in their other language, and this is a resource that may not be available for some low-resource languages. While some of the languages of Spain may be considered to be reasonably endowed with parallel corpora connecting them to Spanish or even to English - Basque, Catalan, Galician -, and are well-served with machine translation systems, there are many other languages which cannot afford them such as Aranese Occitan, Aragonese, or Asturian/Leonese. Fortunately, languages in this last group belong to the Romance language family, as Spanish does, and this makes translation from and into Spanish under a rule-based paradigm the only feasible approach. After describing briefly the main machine translation paradigms, I will describe the Apertium free/open-source rule-based machine translation platform, which has been used to build machine translation systems for these low-resource languages of Spain, indeed, sometimes the only ones available. The free/open-source setting has made linguistic data for these languages available for anyone in their linguistic communities to build other linguistic technologies for these low-resourced languages. For example, the Apertium family of bilingual and monolingual data has been converted into RDF and they have been made accessible on the Web as linked data.

Cite as

Mikel L. Forcada. Free/Open-Source Machine Translation for the Low-Resource Languages of Spain (Invited Talk). In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, p. 3:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{forcada:OASIcs.LDK.2021.3,
  author =	{Forcada, Mikel L.},
  title =	{{Free/Open-Source Machine Translation for the Low-Resource Languages of Spain}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{3:1--3:1},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.3},
  URN =		{urn:nbn:de:0030-drops-145399},
  doi =		{10.4230/OASIcs.LDK.2021.3},
  annote =	{Keywords: free/open-source, machine translation, languages of Spain, low-resource machine translation}
}
Document
Crazy New Idea
A Computational Simulation of Children’s Language Acquisition (Crazy New Idea)

Authors: Ben Ambridge

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
Many modern NLP models are already close to simulating children’s language acquisition; the main thing they currently lack is a "real world" representation of semantics that allows them to map from form to meaning and vice-versa. The aim of this "Crazy Idea" is to spark a discussion about how we might get there.

Cite as

Ben Ambridge. A Computational Simulation of Children’s Language Acquisition (Crazy New Idea). In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 4:1-4:3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{ambridge:OASIcs.LDK.2021.4,
  author =	{Ambridge, Ben},
  title =	{{A Computational Simulation of Children’s Language Acquisition}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{4:1--4:3},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.4},
  URN =		{urn:nbn:de:0030-drops-145402},
  doi =		{10.4230/OASIcs.LDK.2021.4},
  annote =	{Keywords: Child language acquisition, language development, deep learning, BERT, ELMo, GPT-3}
}
Document
Crazy New Idea
Get! Mimetypes! Right! (Crazy New Idea)

Authors: Christian Chiarcos

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
This paper identifies three technical requirements - availability of data, sustainable hosting and resolvable URIs for hosted data - as minimal pre-conditions for Linguistic Linked Open Data technology to develop towards a mature technological ecosystem that third party applications can build upon. While a critical amount of data is available (and it continues to grow), there does not seem to exist a hosting solution that combines the prospects of long-term availability with an unrestricted capability to support resolvable URIs. In particular, data hosting services do currently not allow data to be declared as RDF content by means of their media type (mime type), so that the capability of clients to recognize formats and to resolve URIs on that basis is severely limited.

Cite as

Christian Chiarcos. Get! Mimetypes! Right! (Crazy New Idea). In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 5:1-5:4, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{chiarcos:OASIcs.LDK.2021.5,
  author =	{Chiarcos, Christian},
  title =	{{Get! Mimetypes! Right!}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{5:1--5:4},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.5},
  URN =		{urn:nbn:de:0030-drops-145418},
  doi =		{10.4230/OASIcs.LDK.2021.5},
  annote =	{Keywords: data hosting, mimetypes, resolvability, URIs, Linked Data foundations}
}
Document
Crazy New Idea
Mind the Gap: Language Data, Their Producers, and the Scientific Process (Crazy New Idea)

Authors: Tobias Weber

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
This paper discusses the role of low-resource languages in NLP through the lens of different stakeholders. It argues that the current "consumerist approach" to language data reinforces a vicious circle which increases the technological exclusion of minority communities. Researchers' decisions directly affect these processes to the detriment of minorities and practitioners engaging in language work in these communities. In line with the conference topic, the paper concludes with strategies and prerequisites for creating a positive feedback loop in our research benefiting language work within the next decade.

Cite as

Tobias Weber. Mind the Gap: Language Data, Their Producers, and the Scientific Process (Crazy New Idea). In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 6:1-6:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{weber:OASIcs.LDK.2021.6,
  author =	{Weber, Tobias},
  title =	{{Mind the Gap: Language Data, Their Producers, and the Scientific Process}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{6:1--6:9},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.6},
  URN =		{urn:nbn:de:0030-drops-145424},
  doi =		{10.4230/OASIcs.LDK.2021.6},
  annote =	{Keywords: minority languages, data integration, sociology of technology, documentary linguistics, exclusion}
}
Document
Representing the Under-Represented: a Dataset of Post-Colonial, and Migrant Writers

Authors: Marco Antonio Stranisci, Viviana Patti, and Rossana Damiano

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
In today’s media and in the Web of Data, non-Western people still suffer a lack of representation. In our work, we address this issue by presenting a pipeline for collecting and semantically encoding Wikipedia biographies of writers who are under-represented due to their non-Western origins, or their legal status in a country. The two main components of the ontology will be described, together with a framework for mapping textual biographies to their corresponding semantic representations. A description of the data set, and some examples of biographical texts conversion to the Ontology Classes, will be provided.

Cite as

Marco Antonio Stranisci, Viviana Patti, and Rossana Damiano. Representing the Under-Represented: a Dataset of Post-Colonial, and Migrant Writers. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 7:1-7:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{stranisci_et_al:OASIcs.LDK.2021.7,
  author =	{Stranisci, Marco Antonio and Patti, Viviana and Damiano, Rossana},
  title =	{{Representing the Under-Represented: a Dataset of Post-Colonial, and Migrant Writers}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{7:1--7:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.7},
  URN =		{urn:nbn:de:0030-drops-145431},
  doi =		{10.4230/OASIcs.LDK.2021.7},
  annote =	{Keywords: Ontologies, Knowledge Graph, Language Resources, Migrations}
}
Document
Plenary Debates of the Parliament of Finland as Linked Open Data and in Parla-CLARIN Markup

Authors: Laura Sinikallio, Senka Drobac, Minna Tamper, Rafael Leal, Mikko Koho, Jouni Tuominen, Matti La Mela, and Eero Hyvönen

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
This paper presents a knowledge graph created by transforming the plenary debates of the Parliament of Finland (1907-) into Linked Open Data (LOD). The data, totaling over 900 000 speeches, with automatically created semantic annotations and rich ontology-based metadata, are published in a Linked Open Data Service and are used via a SPARQL API and as data dumps. The speech data is part of larger LOD publication FinnParla that also includes prosopographical data about the politicians. The data is being used for studying parliamentary language and culture in Digital Humanities in several universities. To serve a wider variety of users, the entirety of this data was also produced using Parla-CLARIN markup. We present the first publication of all Finnish parliamentary debates as data. Technical novelties in our approach include the use of both Parla-CLARIN and an RDF schema developed for representing the speeches, integration of the data to a new Parliament of Finland Ontology for deeper data analyses, and enriching the data with a variety of external national and international data sources.

Cite as

Laura Sinikallio, Senka Drobac, Minna Tamper, Rafael Leal, Mikko Koho, Jouni Tuominen, Matti La Mela, and Eero Hyvönen. Plenary Debates of the Parliament of Finland as Linked Open Data and in Parla-CLARIN Markup. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 8:1-8:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{sinikallio_et_al:OASIcs.LDK.2021.8,
  author =	{Sinikallio, Laura and Drobac, Senka and Tamper, Minna and Leal, Rafael and Koho, Mikko and Tuominen, Jouni and La Mela, Matti and Hyv\"{o}nen, Eero},
  title =	{{Plenary Debates of the Parliament of Finland as Linked Open Data and in Parla-CLARIN Markup}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{8:1--8:17},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.8},
  URN =		{urn:nbn:de:0030-drops-145444},
  doi =		{10.4230/OASIcs.LDK.2021.8},
  annote =	{Keywords: Plenary debates, parliamentary data, Parla-CLARIN, Linked Open Data, Digital Humanities}
}
Document
Towards a Corpus of Historical German Plays with Emotion Annotations

Authors: Thomas Schmidt, Katrin Dennerlein, and Christian Wolff

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
In this paper, we present first work-in-progress annotation results of a project investigating computational methods of emotion analysis for historical German plays around 1800. We report on the development of an annotation scheme focussing on the annotation of emotions that are important from a literary studies perspective for this time span as well as on the annotation process we have developed. We annotate emotions expressed or attributed by characters of the plays in the written texts. The scheme consists of 13 hierarchically structured emotion concepts as well as the source (who experiences or attributes the emotion) and target (who or what is the emotion directed towards). We have conducted the annotation of five example plays of our corpus with two annotators per play and report on annotation distributions and agreement statistics. We were able to collect over 6,500 emotion annotations and identified a fair agreement for most concepts around a κ-value of 0.4. We discuss how we plan to improve annotator consistency and continue our work. The results also have implications for similar projects in the context of Digital Humanities.

Cite as

Thomas Schmidt, Katrin Dennerlein, and Christian Wolff. Towards a Corpus of Historical German Plays with Emotion Annotations. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 9:1-9:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{schmidt_et_al:OASIcs.LDK.2021.9,
  author =	{Schmidt, Thomas and Dennerlein, Katrin and Wolff, Christian},
  title =	{{Towards a Corpus of Historical German Plays with Emotion Annotations}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{9:1--9:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.9},
  URN =		{urn:nbn:de:0030-drops-145459},
  doi =		{10.4230/OASIcs.LDK.2021.9},
  annote =	{Keywords: Emotion, Annotation, Digital Humanities, Computational Literary Studies, German Drama, Sentiment Analysis, Emotion Analysis, Corpus}
}
Document
Enriching a Lexical Resource for French Verbs with Aspectual Information

Authors: Anna Kupść, Pauline Haas, Rafael Marín, and Antonio Balvet

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
The paper presents a syntactico-semantic lexicon of over a thousand French verbs. It has been created by manually adding lexical aspect features to verb frames from TreeLex [Kupść and Abeillé, 2008]. We present how the original syntactic resource has been adapted to the current project, our aspect assignment procedure and an overview of the resulting lexical resource.

Cite as

Anna Kupść, Pauline Haas, Rafael Marín, and Antonio Balvet. Enriching a Lexical Resource for French Verbs with Aspectual Information. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 10:1-10:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{kupsc_et_al:OASIcs.LDK.2021.10,
  author =	{Kup\'{s}\'{c}, Anna and Haas, Pauline and Mar{\'\i}n, Rafael and Balvet, Antonio},
  title =	{{Enriching a Lexical Resource for French Verbs with Aspectual Information}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{10:1--10:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.10},
  URN =		{urn:nbn:de:0030-drops-145460},
  doi =		{10.4230/OASIcs.LDK.2021.10},
  annote =	{Keywords: computational semantics, corpora-based methods in language engineering, electronic language resources and tools, formalization of natural languages}
}
Document
Annotation of Fine-Grained Geographical Entities in German Texts

Authors: Julián Moreno-Schneider, Melina Plakidis, and Georg Rehm

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
We work on the creation of a corpus, crawled from the internet, on the Berlin district of Moabit, primarily meant for training NER systems in German and English. Typical NER corpora and corresponding systems distinguish persons, organisations and locations, but do not distinguish different types of location entities. For our tourism-inspired use case, we need fine-grained annotations for toponyms. In this paper, we outline the fine-grained classification of geographical entities, the resulting annotations and we present preliminary results on automatically tagging toponyms in a small, bootstrapped gold corpus.

Cite as

Julián Moreno-Schneider, Melina Plakidis, and Georg Rehm. Annotation of Fine-Grained Geographical Entities in German Texts. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 11:1-11:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{morenoschneider_et_al:OASIcs.LDK.2021.11,
  author =	{Moreno-Schneider, Juli\'{a}n and Plakidis, Melina and Rehm, Georg},
  title =	{{Annotation of Fine-Grained Geographical Entities in German Texts}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{11:1--11:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.11},
  URN =		{urn:nbn:de:0030-drops-145473},
  doi =		{10.4230/OASIcs.LDK.2021.11},
  annote =	{Keywords: Named Entity Recognition, Geographical Entities, Annotation}
}
Document
A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian

Authors: Danka Jokić, Ranka Stanković, Cvetana Krstev, and Branislava Šandrih

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
Abusive speech in social media, including profanities, derogatory and hate speech, has reached the level of a pandemic. A system that would be able to detect such texts could help in making the Internet and social media a better and more respectful virtual space. Research and commercial application in this area were so far focused mainly on the English language. This paper presents the work on building AbCoSER, the first corpus of abusive speech in Serbian. The corpus consists of 6,436 manually annotated tweets, out of which 1,416 were labelled as tweets using some kind of abusive speech. Those 1,416 tweets were further sub-classified, for instance to those using vulgar, hate speech, derogatory language, etc. In this paper, we explain the process of data acquisition, annotation, and corpus construction. We also discuss the results of an initial analysis of the annotation quality. Finally, we present an abusive speech lexicon structure and its enrichment with abusive triggers extracted from the AbCoSER dataset.

Cite as

Danka Jokić, Ranka Stanković, Cvetana Krstev, and Branislava Šandrih. A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 13:1-13:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{jokic_et_al:OASIcs.LDK.2021.13,
  author =	{Joki\'{c}, Danka and Stankovi\'{c}, Ranka and Krstev, Cvetana and \v{S}andrih, Branislava},
  title =	{{A Twitter Corpus and Lexicon for Abusive Speech Detection in Serbian}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{13:1--13:17},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.13},
  URN =		{urn:nbn:de:0030-drops-145493},
  doi =		{10.4230/OASIcs.LDK.2021.13},
  annote =	{Keywords: abusive language, hate speech, Serbian, Twitter, lexicon, corpus}
}
Document
Bias in Knowledge Graphs - An Empirical Study with Movie Recommendation and Different Language Editions of DBpedia

Authors: Michael Matthias Voit and Heiko Paulheim

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
Public knowledge graphs such as DBpedia and Wikidata have been recognized as interesting sources of background knowledge to build content-based recommender systems. They can be used to add information about the items to be recommended and links between those. While quite a few approaches for exploiting knowledge graphs have been proposed, most of them aim at optimizing the recommendation strategy while using a fixed knowledge graph. In this paper, we take a different approach, i.e., we fix the recommendation strategy and observe changes when using different underlying knowledge graphs. Particularly, we use different language editions of DBpedia. We show that the usage of different knowledge graphs does not only lead to differently biased recommender systems, but also to recommender systems that differ in performance for particular fields of recommendations.

Cite as

Michael Matthias Voit and Heiko Paulheim. Bias in Knowledge Graphs - An Empirical Study with Movie Recommendation and Different Language Editions of DBpedia. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 14:1-14:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{voit_et_al:OASIcs.LDK.2021.14,
  author =	{Voit, Michael Matthias and Paulheim, Heiko},
  title =	{{Bias in Knowledge Graphs - An Empirical Study with Movie Recommendation and Different Language Editions of DBpedia}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{14:1--14:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.14},
  URN =		{urn:nbn:de:0030-drops-145506},
  doi =		{10.4230/OASIcs.LDK.2021.14},
  annote =	{Keywords: Knowledge Graph, DBpedia, Recommender Systems, Bias, Language Bias, RDF2vec}
}
Document
Enriching Word Embeddings with Food Knowledge for Ingredient Retrieval

Authors: Álvaro Mendes Samagaio, Henrique Lopes Cardoso, and David Ribeiro

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
Smart assistants and recommender systems must deal with lots of information coming from different sources and having different formats. This is more frequent in text data, which presents increased variability and complexity, and is rather common for conversational assistants or chatbots. Moreover, this issue is very evident in the food and nutrition lexicon, where the semantics present increased variability, namely due to hypernyms and hyponyms. This work describes the creation of a set of word embeddings based on the incorporation of information from a food thesaurus - LanguaL - through retrofitting. The ingredients were classified according to three different facet label groups. Retrofitted embeddings seem to properly encode food-specific knowledge, as shown by an increase in accuracy as compared to generic embeddings (+23%, +10% and +31% per group). Moreover, a weighting mechanism based on TF-IDF was applied to embedding creation before retrofitting, also bringing an increase in accuracy (+5%, +9% and +5% per group). Finally, the approach has been tested with human users in an ingredient retrieval exercise, showing very positive evaluation (77.3% of the volunteer testers preferred this method over a string-based matching algorithm).

Cite as

Álvaro Mendes Samagaio, Henrique Lopes Cardoso, and David Ribeiro. Enriching Word Embeddings with Food Knowledge for Ingredient Retrieval. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 15:1-15:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{samagaio_et_al:OASIcs.LDK.2021.15,
  author =	{Samagaio, \'{A}lvaro Mendes and Lopes Cardoso, Henrique and Ribeiro, David},
  title =	{{Enriching Word Embeddings with Food Knowledge for Ingredient Retrieval}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{15:1--15:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.15},
  URN =		{urn:nbn:de:0030-drops-145510},
  doi =		{10.4230/OASIcs.LDK.2021.15},
  annote =	{Keywords: Word embeddings, Retrofitting, LanguaL, Food Embeddings, Knowledge Graph}
}