DROPS

Volume

OASIcs, Volume 120

13th Symposium on Languages, Applications and Technologies (SLATE 2024)

SLATE 2024, July 4-5, 2024, Águeda, Portugal

Editors: Mário Rodrigues, José Paulo Leal, and Filipe Portela

Document

DOI: 10.4230/OASIcs.SLATE.2025.3

Elements for Weighted Answer-Set Programming

Authors: Francisco Coelho, Bruno Dinis, Dietmar Seipel, and Salvador Abreu

Published in: OASIcs, Volume 135, 14th Symposium on Languages, Applications and Technologies (SLATE 2025)

Abstract

Logic programs, more specifically, answer-set programs, can be annotated with probabilities on facts to express uncertainty. We address the problem of propagating weight annotations on facts (e.g. probabilities) of an answer-set program to its stable models, and from there to events (defined as sets of atoms) in a dataset over the program’s domain. We propose a novel approach which is algebraic in the sense that it relies on an equivalence relation over the set of events. Uncertainty is then described as polynomial expressions over variables. We propagate the weight function in the space of models and events, rather than doing so within the syntax of the program. As evidence that our approach is sound, we show that certain facts behave as expected. Our approach allows us to investigate weight-annotated programs and to determine how suitable a given one is for modeling a given dataset containing events. Its core is illustrated by a running example and the encoding of a Bayesian network.

Cite as

Francisco Coelho, Bruno Dinis, Dietmar Seipel, and Salvador Abreu. Elements for Weighted Answer-Set Programming. In 14th Symposium on Languages, Applications and Technologies (SLATE 2025). Open Access Series in Informatics (OASIcs), Volume 135, pp. 3:1-3:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{coelho_et_al:OASIcs.SLATE.2025.3,
  author =	{Coelho, Francisco and Dinis, Bruno and Seipel, Dietmar and Abreu, Salvador},
  title =	{{Elements for Weighted Answer-Set Programming}},
  booktitle =	{14th Symposium on Languages, Applications and Technologies (SLATE 2025)},
  pages =	{3:1--3:16},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-387-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{135},
  editor =	{Baptista, Jorge and Barateiro, Jos\'{e}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2025.3},
  URN =		{urn:nbn:de:0030-drops-236836},
  doi =		{10.4230/OASIcs.SLATE.2025.3},
  annote =	{Keywords: Answer-Set Programming, Stable Models, Probabilistic Logic Programming}
}

Document

Invited Talk

DOI: 10.4230/LIPIcs.FSCD.2025.2

Vehicle: Bridging the Embedding Gap in the Verification of Neuro-Symbolic Programs (Invited Talk)

Authors: Matthew L. Daggitt, Wen Kokke, Robert Atkey, Ekaterina Komendantskaya, Natalia Slusarz, and Luca Arnaboldi

Published in: LIPIcs, Volume 337, 10th International Conference on Formal Structures for Computation and Deduction (FSCD 2025)

Abstract

Neuro-symbolic programs, i.e. programs containing both machine learning components and traditional symbolic code, are becoming increasingly widespread. Finding a general methodology for verifying such programs is challenging due to both the number of different tools involved and the intricate interface between the "neural" and "symbolic" program components. In this paper we present a general decomposition of the neuro-symbolic verification problem into parts, and examine the problem of the embedding gap that occurs when one tries to combine proofs about the neural and symbolic components. To address this problem we then introduce Vehicle - standing as an abbreviation for a "verification condition language" - an intermediate programming language interface between machine learning frameworks, automated theorem provers, and dependently-typed formalisations of neuro-symbolic programs. Vehicle allows users to specify the properties of the neural components of neuro-symbolic programs once, and then safely compile the specification to each interface using a tailored typing and compilation procedure. We give a high-level overview of Vehicle’s overall design, its interfaces and compilation & type-checking procedures, and then demonstrate its utility by formally verifying the safety of a simple autonomous car controlled by a neural network, operating in a stochastic environment with imperfect information.

Cite as

Matthew L. Daggitt, Wen Kokke, Robert Atkey, Ekaterina Komendantskaya, Natalia Slusarz, and Luca Arnaboldi. Vehicle: Bridging the Embedding Gap in the Verification of Neuro-Symbolic Programs (Invited Talk). In 10th International Conference on Formal Structures for Computation and Deduction (FSCD 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 337, pp. 2:1-2:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{daggitt_et_al:LIPIcs.FSCD.2025.2,
  author =	{Daggitt, Matthew L. and Kokke, Wen and Atkey, Robert and Komendantskaya, Ekaterina and Slusarz, Natalia and Arnaboldi, Luca},
  title =	{{Vehicle: Bridging the Embedding Gap in the Verification of Neuro-Symbolic Programs}},
  booktitle =	{10th International Conference on Formal Structures for Computation and Deduction (FSCD 2025)},
  pages =	{2:1--2:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-374-4},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{337},
  editor =	{Fern\'{a}ndez, Maribel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FSCD.2025.2},
  URN =		{urn:nbn:de:0030-drops-236172},
  doi =		{10.4230/LIPIcs.FSCD.2025.2},
  annote =	{Keywords: Neural Network Verification, Types, Interactive Theorem Provers}
}

Document

DOI: 10.4230/LIPIcs.ECOOP.2025.23

Wastrumentation: Portable WebAssembly Dynamic Analysis with Support for Intercession

Authors: Aäron Munsters, Angel Luis Scull Pupo, and Elisa Gonzalez Boix

Published in: LIPIcs, Volume 333, 39th European Conference on Object-Oriented Programming (ECOOP 2025)

Abstract

Dynamic program analyses help in understanding a program’s runtime behavior and detect issues related to security, program comprehension, or profiling. Instrumentation platforms aid analysis developers by offering a high-level API to write the analysis, and inserting the analysis into the target program. However, current instrumentation platforms for WebAssembly (Wasm) restrict analysis portability because they require concrete runtime environments. Moreover, their analysis API only allows the development of analyses that observe the target program but cannot modify it. As a result, many popular dynamic analyses present for other languages, such as runtime hardening, virtual patching or runtime optimization, cannot currently be implemented for Wasm atop a dynamic analysis platform. Instead, they need to be built manually, which requires knowledge of low-level details of the Wasm’s semantics and instruction set, and how to safely manipulate it. This paper introduces Wastrumentation, the first dynamic analysis platform for WebAssembly that supports intercession. Our solution, based on source code instrumentation, weaves the analysis code directly into the target program code. Inlining the analysis into the target’s source code avoids dependencies on the runtime environment, making analyses portable across Wasm VMs. Moreover, it enables the implementation of analyses in any Wasm-compatible language. We evaluate our solution in two ways. First, we compare it against a state-of-the-art source code instrumentation platform using the WasmR3 benchmarks. The results show improved memory consumption and competitive performance overhead. Second, we develop an extensive portfolio of dynamic analyses, including novel analyses previously unattainable with source code instrumentation platforms, such as memoization, safe heap access, and the removal of NaN non-determinism.

Cite as

Aäron Munsters, Angel Luis Scull Pupo, and Elisa Gonzalez Boix. Wastrumentation: Portable WebAssembly Dynamic Analysis with Support for Intercession. In 39th European Conference on Object-Oriented Programming (ECOOP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 333, pp. 23:1-23:29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{munsters_et_al:LIPIcs.ECOOP.2025.23,
  author =	{Munsters, A\"{a}ron and Scull Pupo, Angel Luis and Gonzalez Boix, Elisa},
  title =	{{Wastrumentation: Portable WebAssembly Dynamic Analysis with Support for Intercession}},
  booktitle =	{39th European Conference on Object-Oriented Programming (ECOOP 2025)},
  pages =	{23:1--23:29},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-373-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{333},
  editor =	{Aldrich, Jonathan and Silva, Alexandra},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2025.23},
  URN =		{urn:nbn:de:0030-drops-233153},
  doi =		{10.4230/LIPIcs.ECOOP.2025.23},
  annote =	{Keywords: WebAssembly, dynamic analysis, instrumentation platform, intercession}
}

Document

Complete Volume

DOI: 10.4230/OASIcs.SLATE.2024

OASIcs, Volume 120, SLATE 2024, Complete Volume

Authors: Mário Rodrigues, José Paulo Leal, and Filipe Portela

Published in: OASIcs, Volume 120, 13th Symposium on Languages, Applications and Technologies (SLATE 2024)

Abstract

OASIcs, Volume 120, SLATE 2024, Complete Volume

Cite as

13th Symposium on Languages, Applications and Technologies (SLATE 2024). Open Access Series in Informatics (OASIcs), Volume 120, pp. 1-186, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@Proceedings{rodrigues_et_al:OASIcs.SLATE.2024,
  title =	{{OASIcs, Volume 120, SLATE 2024, Complete Volume}},
  booktitle =	{13th Symposium on Languages, Applications and Technologies (SLATE 2024)},
  pages =	{1--186},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-321-8},
  ISSN =	{2190-6807},
  year =	{2024},
  volume =	{120},
  editor =	{Rodrigues, M\'{a}rio and Leal, Jos\'{e} Paulo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2024},
  URN =		{urn:nbn:de:0030-drops-220911},
  doi =		{10.4230/OASIcs.SLATE.2024},
  annote =	{Keywords: OASIcs, Volume 120, SLATE 2024, Complete Volume}
}

Document

Front Matter

DOI: 10.4230/OASIcs.SLATE.2024.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Mário Rodrigues, José Paulo Leal, and Filipe Portela

Published in: OASIcs, Volume 120, 13th Symposium on Languages, Applications and Technologies (SLATE 2024)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

13th Symposium on Languages, Applications and Technologies (SLATE 2024). Open Access Series in Informatics (OASIcs), Volume 120, pp. 0:i-0:xii, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{rodrigues_et_al:OASIcs.SLATE.2024.0,
  author =	{Rodrigues, M\'{a}rio and Leal, Jos\'{e} Paulo and Portela, Filipe},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{13th Symposium on Languages, Applications and Technologies (SLATE 2024)},
  pages =	{0:i--0:xii},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-321-8},
  ISSN =	{2190-6807},
  year =	{2024},
  volume =	{120},
  editor =	{Rodrigues, M\'{a}rio and Leal, Jos\'{e} Paulo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2024.0},
  URN =		{urn:nbn:de:0030-drops-220906},
  doi =		{10.4230/OASIcs.SLATE.2024.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

DOI: 10.4230/OASIcs.SLATE.2024.1

Using Embeddings to Improve Named Entity Recognition Classification with Graphs

Authors: Gabriel Silva, Mário Rodrigues, António Teixeira, and Marlene Amorim

Published in: OASIcs, Volume 120, 13th Symposium on Languages, Applications and Technologies (SLATE 2024)

Abstract

Richer information has the potential to improve the performance of NLP (Natural Language Processing) tasks such as Named Entity Recognition. A linear sequence of words can be enriched with the sentence structure, as well as their syntactic structure. However, traditional NLP methods do not contemplate this kind of information. With the use of Knowledge Graphs all this information can be represented and made use of by Graph ML (Machine Learning) techniques. Previous experiments using only graphs with their syntactic structure as input to current state-of-the-art Graph ML models failed to prove the potential of the technology. As such, in this paper the use of word embeddings is explored as an additional enrichment of the graph and, in consequence, of the input to the classification models. This use of embeddings adds a layer of context that was previously missing when using only syntactic information. The proposed method was assessed using the CoNLL dataset and results showed noticeable improvements in performance when adding embeddings. The best results with embeddings attained 94.73% accuracy, compared to 88.58% without embeddings, while metrics such as Macro-F1, Precision and Recall achieved an improvement in performance of over 20%. We test these models with a different number of classes to assess whether their quality would degrade or not. Due to the use of inductive learning methods (such as Graph SAGE) these results provide us with models that can be used in real-world scenarios, as there is no need to re-train the whole graph to predict on new data points, as is the case with traditional Graph ML methods (for example, Graph Convolutional Networks).

Cite as

Gabriel Silva, Mário Rodrigues, António Teixeira, and Marlene Amorim. Using Embeddings to Improve Named Entity Recognition Classification with Graphs. In 13th Symposium on Languages, Applications and Technologies (SLATE 2024). Open Access Series in Informatics (OASIcs), Volume 120, pp. 1:1-1:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{silva_et_al:OASIcs.SLATE.2024.1,
  author =	{Silva, Gabriel and Rodrigues, M\'{a}rio and Teixeira, Ant\'{o}nio and Amorim, Marlene},
  title =	{{Using Embeddings to Improve Named Entity Recognition Classification with Graphs}},
  booktitle =	{13th Symposium on Languages, Applications and Technologies (SLATE 2024)},
  pages =	{1:1--1:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-321-8},
  ISSN =	{2190-6807},
  year =	{2024},
  volume =	{120},
  editor =	{Rodrigues, M\'{a}rio and Leal, Jos\'{e} Paulo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2024.1},
  URN =		{urn:nbn:de:0030-drops-220722},
  doi =		{10.4230/OASIcs.SLATE.2024.1},
  annote =	{Keywords: Knowledge graphs, Enriched data, Natural language processing, Named Entity Recognition}
}

Document

Position

DOI: 10.4230/TGDK.2.1.2

Grounding Stream Reasoning Research

Authors: Pieter Bonte, Jean-Paul Calbimonte, Daniel de Leng, Daniele Dell'Aglio, Emanuele Della Valle, Thomas Eiter, Federico Giannini, Fredrik Heintz, Konstantin Schekotihin, Danh Le-Phuoc, Alessandra Mileo, Patrik Schneider, Riccardo Tommasini, Jacopo Urbani, and Giacomo Ziffer

Published in: TGDK, Volume 2, Issue 1 (2024): Special Issue on Trends in Graph Data and Knowledge - Part 2. Transactions on Graph Data and Knowledge, Volume 2, Issue 1

Abstract

In the last decade, there has been a growing interest in applying AI technologies to implement complex data analytics over data streams. To this end, researchers in various fields have been organising a yearly event called the "Stream Reasoning Workshop" to share perspectives, challenges, and experiences around this topic. In this paper, the previous organisers of the workshops and other community members provide a summary of the main research results that have been discussed during the first six editions of the event. These results can be categorised into four main research areas: The first is concerned with the technological challenges related to handling large data streams. The second area aims at adapting and extending existing semantic technologies to data streams. The third and fourth areas focus on how to implement reasoning techniques, either considering deductive or inductive techniques, to extract new and valuable knowledge from the data in the stream. This summary is written not only to provide a crystallisation of the field, but also to point out distinctive traits of the stream reasoning community. Moreover, it also provides a foundation for future research by enumerating a list of use cases and open challenges, to stimulate others to join this exciting research area.

Cite as

Pieter Bonte, Jean-Paul Calbimonte, Daniel de Leng, Daniele Dell'Aglio, Emanuele Della Valle, Thomas Eiter, Federico Giannini, Fredrik Heintz, Konstantin Schekotihin, Danh Le-Phuoc, Alessandra Mileo, Patrik Schneider, Riccardo Tommasini, Jacopo Urbani, and Giacomo Ziffer. Grounding Stream Reasoning Research. In Special Issue on Trends in Graph Data and Knowledge - Part 2. Transactions on Graph Data and Knowledge (TGDK), Volume 2, Issue 1, pp. 2:1-2:47, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@Article{bonte_et_al:TGDK.2.1.2,
  author =	{Bonte, Pieter and Calbimonte, Jean-Paul and de Leng, Daniel and Dell'Aglio, Daniele and Della Valle, Emanuele and Eiter, Thomas and Giannini, Federico and Heintz, Fredrik and Schekotihin, Konstantin and Le-Phuoc, Danh and Mileo, Alessandra and Schneider, Patrik and Tommasini, Riccardo and Urbani, Jacopo and Ziffer, Giacomo},
  title =	{{Grounding Stream Reasoning Research}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{2:1--2:47},
  ISSN =	{2942-7517},
  year =	{2024},
  volume =	{2},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.2.1.2},
  URN =		{urn:nbn:de:0030-drops-198597},
  doi =		{10.4230/TGDK.2.1.2},
  annote =	{Keywords: Stream Reasoning, Stream Processing, RDF streams, Streaming Linked Data, Continuous query processing, Temporal Logics, High-performance computing, Databases}
}

Document

Survey

DOI: 10.4230/TGDK.1.1.11

How Does Knowledge Evolve in Open Knowledge Graphs?

Authors: Axel Polleres, Romana Pernisch, Angela Bonifati, Daniele Dell'Aglio, Daniil Dobriy, Stefania Dumbrava, Lorena Etcheverry, Nicolas Ferranti, Katja Hose, Ernesto Jiménez-Ruiz, Matteo Lissandrini, Ansgar Scherp, Riccardo Tommasini, and Johannes Wachs

Published in: TGDK, Volume 1, Issue 1 (2023): Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge, Volume 1, Issue 1

Abstract

Openly available, collaboratively edited Knowledge Graphs (KGs) are key platforms for the collective management of evolving knowledge. The present work aims to provide an analysis of the obstacles related to investigating and processing specifically this central aspect of evolution in KGs. To this end, we discuss (i) the dimensions of evolution in KGs, (ii) the observability of evolution in existing, open, collaboratively constructed Knowledge Graphs over time, and (iii) possible metrics to analyse this evolution. We provide an overview of relevant state-of-the-art research, ranging from metrics developed for Knowledge Graphs specifically to potential methods from related fields such as network science. Additionally, we discuss technical approaches - and their current limitations - related to storing, analysing and processing large and evolving KGs in terms of handling typical KG downstream tasks.

Cite as

Axel Polleres, Romana Pernisch, Angela Bonifati, Daniele Dell'Aglio, Daniil Dobriy, Stefania Dumbrava, Lorena Etcheverry, Nicolas Ferranti, Katja Hose, Ernesto Jiménez-Ruiz, Matteo Lissandrini, Ansgar Scherp, Riccardo Tommasini, and Johannes Wachs. How Does Knowledge Evolve in Open Knowledge Graphs? In Special Issue on Trends in Graph Data and Knowledge. Transactions on Graph Data and Knowledge (TGDK), Volume 1, Issue 1, pp. 11:1-11:59, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@Article{polleres_et_al:TGDK.1.1.11,
  author =	{Polleres, Axel and Pernisch, Romana and Bonifati, Angela and Dell'Aglio, Daniele and Dobriy, Daniil and Dumbrava, Stefania and Etcheverry, Lorena and Ferranti, Nicolas and Hose, Katja and Jim\'{e}nez-Ruiz, Ernesto and Lissandrini, Matteo and Scherp, Ansgar and Tommasini, Riccardo and Wachs, Johannes},
  title =	{{How Does Knowledge Evolve in Open Knowledge Graphs?}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{11:1--11:59},
  year =	{2023},
  volume =	{1},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.1.1.11},
  URN =		{urn:nbn:de:0030-drops-194855},
  doi =		{10.4230/TGDK.1.1.11},
  annote =	{Keywords: KG evolution, temporal KG, versioned KG, dynamic KG}
}

Document

DOI: 10.4230/OASIcs.SLATE.2023.2

A Framework for Fostering Easier Access to Enriched Textual Information

Authors: Gabriel Silva, Mário Rodrigues, António Teixeira, and Marlene Amorim

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

Considering the amount of information in unstructured data, it is necessary to have suitable methods to extract information from it. Most of these methods have their own output, making it difficult and costly to merge and share this information, as there currently is no unified way of representing it. While most of these methods rely on JSON or XML, there has been a push to serialize these into RDF-compliant formats due to their flexibility and the existing ecosystem surrounding them. In this paper we introduce a framework whose goal is to provide a serialization of enriched data into an RDF format, following FAIR principles, making it more interpretable, interoperable and shareable. We process a subset of the WikiNER dataset and showcase two examples of using this framework: one using CoNLL annotations and the other performing entity-linking on an already existing graph. The result is a graph in which every connection starts from the document and ends on tokens, keeping the original text intact while embedding the enriched data into it, in this case the CoNLL annotations and Entities.

Cite as

Gabriel Silva, Mário Rodrigues, António Teixeira, and Marlene Amorim. A Framework for Fostering Easier Access to Enriched Textual Information. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 2:1-2:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{silva_et_al:OASIcs.SLATE.2023.2,
  author =	{Silva, Gabriel and Rodrigues, M\'{a}rio and Teixeira, Ant\'{o}nio and Amorim, Marlene},
  title =	{{A Framework for Fostering Easier Access to Enriched Textual Information}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{2:1--2:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.2},
  URN =		{urn:nbn:de:0030-drops-185165},
  doi =		{10.4230/OASIcs.SLATE.2023.2},
  annote =	{Keywords: Knowledge graphs, Enriched data, Natural language processing, Triplestore}
}

Document

DOI: 10.4230/OASIcs.SLATE.2023.5

Web of Science Citation Gaps: An Automatic Approach to Detect Indexed but Missing Citations

Authors: David Rodrigues, António L. Lopes, and Fernando Batista

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

The number of citations a research paper receives is a crucial metric for both researchers and institutions. However, since citation databases have their own source lists, finding all the citations of a given paper can be a challenge. As a result, there may be missing citations that are not counted towards a paper’s total citation count. To address this issue, we present an automated approach to find missing citations leveraging the use of multiple indexing databases. In this research, Web of Science (WoS) serves as a case study and OpenAlex is used as a reference point for comparison. For a given paper, we identify all citing papers found in both research databases. Then, for each citing paper we check if it is indexed in WoS, but not referred in WoS as a citing paper, in order to determine if it is a missing citation. In our experiments, from a set of 1539 papers indexed by WoS, we found 696 missing citations. This outcome proves the success of our approach, and reveals that WoS does not always consider the full list of citing papers of a given publication, even when these citing papers are indexed by WoS. We also found that WoS has a higher chance of missing information for more recent publications. These findings provide relevant insights about this indexing research database, and provide enough motivation for considering other research databases in our study, such as Scopus and Google Scholar, in order to improve the matching and querying algorithms, and to reduce false positives, towards providing a more comprehensive and accurate view of the citations of a paper.
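The check described in the abstract can be read as a set operation. The following is a minimal sketch under that reading, not the authors' implementation: the function name, parameter names, and identifiers are hypothetical, and it assumes the citing papers reported by the reference database (OpenAlex in the paper), the citing papers reported by WoS, and the set of papers indexed by WoS have already been retrieved and matched to common identifiers.

```python
def find_missing_citations(reference_citing, wos_citing, wos_indexed):
    """Return candidate missing citations for one target paper.

    A citing paper is a candidate when the reference database lists it
    as citing the target, WoS indexes it, and WoS does not list it as
    citing the target. All three inputs are collections of identifiers
    (hypothetical here, e.g. DOIs).
    """
    return (set(reference_citing) & set(wos_indexed)) - set(wos_citing)


if __name__ == "__main__":
    # Made-up identifiers for illustration only.
    openalex_citing = {"doi:A", "doi:B", "doi:C", "doi:D"}
    wos_citing = {"doi:A"}
    wos_indexed = {"doi:A", "doi:B", "doi:C"}

    missing = find_missing_citations(openalex_citing, wos_citing, wos_indexed)
    print(sorted(missing))  # ['doi:B', 'doi:C']
```

Here "doi:D" is excluded because WoS does not index it at all, so its absence from the WoS citing list is not a gap in the sense used by the abstract.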

Cite as

David Rodrigues, António L. Lopes, and Fernando Batista. Web of Science Citation Gaps: An Automatic Approach to Detect Indexed but Missing Citations. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 5:1-5:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{rodrigues_et_al:OASIcs.SLATE.2023.5,
  author =	{Rodrigues, David and Lopes, Ant\'{o}nio L. and Batista, Fernando},
  title =	{{Web of Science Citation Gaps: An Automatic Approach to Detect Indexed but Missing Citations}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{5:1--5:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.5},
  URN =		{urn:nbn:de:0030-drops-185199},
  doi =		{10.4230/OASIcs.SLATE.2023.5},
  annote =	{Keywords: Research Databases, Citations, Citation Databases, Web of Science, OpenAlex}
}

Document

DOI: 10.4230/OASIcs.SLATE.2023.8

OCRticle - a Structure-Aware OCR Application

Authors: Sofia G. Rodrigues dos Santos and J. João Dias de Almeida

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

While there are currently many applications and websites capable of performing Optical Character Recognition (OCR), none of the widely available options offer structured OCR, i.e., OCR that maintains the text’s original structure. For example, if a document has a title, after performing OCR on it, the title should have a different formatting, in order to distinguish it from the rest of the text. This paper covers the topic of structure-aware OCR, first by describing the current state of OCR tools, then by showcasing a prototype tool capable of retaining the structure of articles scanned from an image.

Cite as

Sofia G. Rodrigues dos Santos and J. João Dias de Almeida. OCRticle - a Structure-Aware OCR Application. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 8:1-8:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{rodriguesdossantos_et_al:OASIcs.SLATE.2023.8,
  author =	{Rodrigues dos Santos, Sofia G. and Dias de Almeida, J. Jo\~{a}o},
  title =	{{OCRticle - a Structure-Aware OCR Application}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{8:1--8:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.8},
  URN =		{urn:nbn:de:0030-drops-185220},
  doi =		{10.4230/OASIcs.SLATE.2023.8},
  annote =	{Keywords: OCR, Optical Character Recognition, Data Structure, Data Parsing, Document Structure}
}

Document

DOI: 10.4230/OASIcs.SLATE.2021.11

Towards Automatic Creation of Annotations to Foster Development of Named Entity Recognizers

Authors: Emanuel Matos, Mário Rodrigues, Pedro Miguel, and António Teixeira

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)

Abstract

Named Entity Recognition (NER) is an essential step for many natural language processing tasks, including Information Extraction. Despite recent advances, particularly using deep learning techniques, the creation of accurate named entity recognizers remains a complex task, highly dependent on annotated data availability. To foster the existence of NER systems for new domains, it is crucial to obtain the required large volumes of annotated data with little or no manual labor. This paper proposes a system to create the annotated data automatically, by resorting to a set of existing NERs and information sources (DBpedia). The approach was tested with documents of the Tourism domain. Distinct methods were applied for deciding the final named entities and respective tags. The results show that this approach can increase the confidence in annotations and/or augment the number of categories possible to annotate. This paper also presents examples of new NERs that can be rapidly created with the obtained annotated data. The annotated data, combined with the possibility to apply both the ensemble of NER systems and the new Gazetteer-based NERs to large corpora, create the necessary conditions to explore the recent state-of-the-art neural deep learning approaches to NER (e.g., BERT) in domains with scarce or nonexistent data for training.

Cite as

Emanuel Matos, Mário Rodrigues, Pedro Miguel, and António Teixeira. Towards Automatic Creation of Annotations to Foster Development of Named Entity Recognizers. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 11:1-11:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{matos_et_al:OASIcs.SLATE.2021.11,
  author =	{Matos, Emanuel and Rodrigues, M\'{a}rio and Miguel, Pedro and Teixeira, Ant\'{o}nio},
  title =	{{Towards Automatic Creation of Annotations to Foster Development of Named Entity Recognizers}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{11:1--11:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.11},
  URN =		{urn:nbn:de:0030-drops-144286},
  doi =		{10.4230/OASIcs.SLATE.2021.11},
  annote =	{Keywords: Named Entity Recognition (NER), Automatic Annotation, Gazetteers, Tourism, Portuguese}
}

Document

DOI: 10.4230/OASIcs.SLATE.2019.13

Knowledge Representation of Crime-Related Events: a Preliminary Approach

Authors: Gonçalo Carnaz, Vitor Beires Nogueira, and Mário Antunes

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

Crime features in every daily newspaper, and particularly in criminal investigation reports produced by several police departments, creating a volume of data to be processed by humans. Other research studies on relation extraction (a branch of information retrieval) in Portuguese have arisen over the years, but they extract few relations and rely on several computational approaches that could be improved with recent features to achieve better performance. This paper presents ongoing work on populating the SEM (Simple Event Model) ontology with instances retrieved from crime-related documents, supported by an SVO (Subject, Verb, Object) algorithm that uses hand-crafted rules to extract events, achieving an F-measure of 0.86.

Cite as

Gonçalo Carnaz, Vitor Beires Nogueira, and Mário Antunes. Knowledge Representation of Crime-Related Events: a Preliminary Approach. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 13:1-13:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{carnaz_et_al:OASIcs.SLATE.2019.13,
  author =	{Carnaz, Gon\c{c}alo and Nogueira, Vitor Beires and Antunes, M\'{a}rio},
  title =	{{Knowledge Representation of Crime-Related Events: a Preliminary Approach}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{13:1--13:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.13},
  URN =		{urn:nbn:de:0030-drops-108809},
  doi =		{10.4230/OASIcs.SLATE.2019.13},
  annote =	{Keywords: SEM Ontology, Relation Extraction, Crime-Related Events, SVO Algorithm, Ontology Population}
}

Document

DOI: 10.4230/OASIcs.SLATE.2017.20

Natural Transmission of Information Extraction Results to End-Users - A Proof-of-Concept Using Data-to-Text

Authors: José Casimiro Pereira, António J. S. Teixeira, Mário Rodrigues, Pedro Miguel, and Joaquim Sousa Pinto

Published in: OASIcs, Volume 56, 6th Symposium on Languages, Applications and Technologies (SLATE 2017)

Abstract

Information Extraction from natural texts has great potential in areas such as Tourism and can be of great assistance in transforming customers' comments into valuable information for Tourism operators, governments and customers. After extraction, information needs to be efficiently transmitted to end-users in a natural way. Systems should not, in general, send extracted information directly to end-users, such as hotel managers, as it can be difficult to read. Humans naturally transmit and encode information using natural languages, such as Portuguese. The problem arising from the need for efficient and natural transmission of the information to end-users is how to encode it. The use of natural language generation (NLG) is a possible solution for producing sentences and, with them, texts. In this paper we address this problem with a data-to-text system, a derivation of formal NLG systems that use data as input. The proposed system uses an aligned corpus, which was defined, collected and processed in approximately 3 weeks of work. Three different in-domain and out-of-domain corpora were used to build the language model. The effects of this approach were evaluated, and results are presented. Automatic metrics, BLEU and Meteor, were used to evaluate the different systems, comparing their values with those of similar systems. Results show that expanding the corpus has a major positive effect on BLEU and Meteor scores, and that the use of additional corpora (in-domain and out-of-domain) in training the language model does not result in significantly different performance. The scores obtained, combined with their comparison with the performance of other systems and an informal human evaluation of the sentences produced, give additional support for the capabilities of the translation-based approach for fast development of data-to-text systems for new domains.

Cite as

José Casimiro Pereira, António J. S. Teixeira, Mário Rodrigues, Pedro Miguel, and Joaquim Sousa Pinto. Natural Transmission of Information Extraction Results to End-Users - A Proof-of-Concept Using Data-to-Text. In 6th Symposium on Languages, Applications and Technologies (SLATE 2017). Open Access Series in Informatics (OASIcs), Volume 56, pp. 20:1-20:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

@InProceedings{pereira_et_al:OASIcs.SLATE.2017.20,
  author =	{Pereira, Jos\'{e} Casimiro and Teixeira, Ant\'{o}nio J. S. and Rodrigues, M\'{a}rio and Miguel, Pedro and Pinto, Joaquim Sousa},
  title =	{{Natural Transmission of Information Extraction Results to End-Users - A Proof-of-Concept Using Data-to-Text}},
  booktitle =	{6th Symposium on Languages, Applications and Technologies (SLATE 2017)},
  pages =	{20:1--20:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-056-9},
  ISSN =	{2190-6807},
  year =	{2017},
  volume =	{56},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Leal, Jos\'{e} Paulo and Varanda, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2017.20},
  URN =		{urn:nbn:de:0030-drops-79530},
  doi =		{10.4230/OASIcs.SLATE.2017.20},
  annote =	{Keywords: Data-to-Text, Natural Language Generation, Automatic Translation, opinions, Tourism, Portuguese}
}

15 Search Results for "Rodrigues, Mário"
