DROPS

Document

Complete Volume

DOI: 10.4230/OASIcs.SLATE.2023

OASIcs, Volume 113, SLATE 2023, Complete Volume

Authors: Alberto Simões, Mario Marcelo Berón, and Filipe Portela

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

OASIcs, Volume 113, SLATE 2023, Complete Volume

Cite as

12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 1-206, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@Proceedings{simoes_et_al:OASIcs.SLATE.2023,
  title =	{{OASIcs, Volume 113, SLATE 2023, Complete Volume}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{1--206},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023},
  URN =		{urn:nbn:de:0030-drops-185130},
  doi =		{10.4230/OASIcs.SLATE.2023},
  annote =	{Keywords: OASIcs, Volume 113, SLATE 2023, Complete Volume}
}

Document

Front Matter

DOI: 10.4230/OASIcs.SLATE.2023.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Alberto Simões, Mario Marcelo Berón, and Filipe Portela

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 0:i-0:xii, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{simoes_et_al:OASIcs.SLATE.2023.0,
  author =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{0:i--0:xii},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.0},
  URN =		{urn:nbn:de:0030-drops-185141},
  doi =		{10.4230/OASIcs.SLATE.2023.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

DOI: 10.4230/OASIcs.SLATE.2023.1

Question Answering over Linked Data with GPT-3

Authors: Bruno Faria, Dylan Perdigão, and Hugo Gonçalo Oliveira

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

This paper explores GPT-3 for answering natural language questions over Linked Data. Different engines of the model and different approaches are adopted for answering questions in the QALD-9 dataset, namely zero-shot and few-shot SPARQL generation, as well as fine-tuning on the training portion of the dataset. Answers retrieved by the generated queries and answers generated directly by the model are also compared. Overall results are generally poor, but several insights are provided on using GPT-3 for the proposed task.

Cite as

Bruno Faria, Dylan Perdigão, and Hugo Gonçalo Oliveira. Question Answering over Linked Data with GPT-3. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 1:1-1:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{faria_et_al:OASIcs.SLATE.2023.1,
  author =	{Faria, Bruno and Perdig\~{a}o, Dylan and Gon\c{c}alo Oliveira, Hugo},
  title =	{{Question Answering over Linked Data with GPT-3}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{1:1--1:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.1},
  URN =		{urn:nbn:de:0030-drops-185155},
  doi =		{10.4230/OASIcs.SLATE.2023.1},
  annote =	{Keywords: SPARQL Generation, Prompt Engineering, Few-Shot Learning, Question Answering, GPT-3}
}

Document

DOI: 10.4230/OASIcs.SLATE.2023.2

A Framework for Fostering Easier Access to Enriched Textual Information

Authors: Gabriel Silva, Mário Rodrigues, António Teixeira, and Marlene Amorim

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

Considering the amount of information in unstructured data, suitable methods are needed to extract information from it. Most of these methods have their own output format, making it difficult and costly to merge and share this information, as there is currently no unified way of representing it. While most of these methods rely on JSON or XML, there has been a push to serialize their output into RDF-compliant formats due to their flexibility and the existing ecosystem surrounding them. In this paper we introduce a framework whose goal is to provide a serialization of enriched data into an RDF format, following FAIR principles, making it more interpretable, interoperable and shareable. We process a subset of the WikiNER dataset and showcase two examples of using this framework: one using CoNLL annotations and the other performing entity linking on an already existing graph. The result is a graph in which every connection starts from the document and ends on tokens, keeping the original text intact while embedding the enriched data into it, in this case the CoNLL annotations and entities.

Cite as

Gabriel Silva, Mário Rodrigues, António Teixeira, and Marlene Amorim. A Framework for Fostering Easier Access to Enriched Textual Information. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 2:1-2:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{silva_et_al:OASIcs.SLATE.2023.2,
  author =	{Silva, Gabriel and Rodrigues, M\'{a}rio and Teixeira, Ant\'{o}nio and Amorim, Marlene},
  title =	{{A Framework for Fostering Easier Access to Enriched Textual Information}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{2:1--2:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.2},
  URN =		{urn:nbn:de:0030-drops-185165},
  doi =		{10.4230/OASIcs.SLATE.2023.2},
  annote =	{Keywords: Knowledge graphs, Enriched data, Natural language processing, Triplestore}
}

Document

DOI: 10.4230/OASIcs.SLATE.2023.3

A Pseudonymization Prototype for Hungarian

Authors: Attila Novák and Borbála Novák

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

In this paper, we present a pseudonymization prototype for Hungarian, an agglutinating language with complex morphology, implemented as a web service. The service provides the following functions: entity identification and extraction; automatic generation and selection of replacement candidates; automatic and consistent replacement and reinflection of entities in the final pseudonymized document. The named entity recognition model applied handles names of persons well, and it has decent performance on other entity types as well. However, ID-like entities need to be handled separately to achieve proper performance; this is not handled in the current prototype version. For automatic replacement candidate generation, a simple entity embedding model is used. We discuss the performance and limitations of the prototype in detail.

Cite as

Attila Novák and Borbála Novák. A Pseudonymization Prototype for Hungarian. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 3:1-3:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{novak_et_al:OASIcs.SLATE.2023.3,
  author =	{Nov\'{a}k, Attila and Nov\'{a}k, Borb\'{a}la},
  title =	{{A Pseudonymization Prototype for Hungarian}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{3:1--3:10},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.3},
  URN =		{urn:nbn:de:0030-drops-185177},
  doi =		{10.4230/OASIcs.SLATE.2023.3},
  annote =	{Keywords: named entity recognition, morphological reinflection, pseudonymization, entity embedding model}
}

Document

DOI: 10.4230/OASIcs.SLATE.2023.4

Generating and Ranking Distractors for Multiple-Choice Questions in Portuguese

Authors: Hugo Gonçalo Oliveira, Igor Caetano, Renato Matos, and Hugo Amaro

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

In the process of multiple-choice question generation, different methods are often considered for distractor acquisition, as an attempt to cover as many questions as possible. Some, however, result in many candidate distractors of variable quality, while only three or four are necessary. We implement some distractor generation methods for Portuguese and propose their combination and ranking with language models. Experimental results confirm that this increases both coverage and suitability of the selected distractors.

Cite as

Hugo Gonçalo Oliveira, Igor Caetano, Renato Matos, and Hugo Amaro. Generating and Ranking Distractors for Multiple-Choice Questions in Portuguese. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 4:1-4:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{goncalooliveira_et_al:OASIcs.SLATE.2023.4,
  author =	{Gon\c{c}alo Oliveira, Hugo and Caetano, Igor and Matos, Renato and Amaro, Hugo},
  title =	{{Generating and Ranking Distractors for Multiple-Choice Questions in Portuguese}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{4:1--4:9},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.4},
  URN =		{urn:nbn:de:0030-drops-185185},
  doi =		{10.4230/OASIcs.SLATE.2023.4},
  annote =	{Keywords: Multiple-Choice Questions, Distractor Generation, Language Models}
}

Document

DOI: 10.4230/OASIcs.SLATE.2023.5

Web of Science Citation Gaps: An Automatic Approach to Detect Indexed but Missing Citations

Authors: David Rodrigues, António L. Lopes, and Fernando Batista

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

The number of citations a research paper receives is a crucial metric for both researchers and institutions. However, since citation databases have their own source lists, finding all the citations of a given paper can be a challenge. As a result, there may be missing citations that are not counted towards a paper’s total citation count. To address this issue, we present an automated approach to find missing citations that leverages multiple indexing databases. In this research, Web of Science (WoS) serves as a case study and OpenAlex is used as a reference point for comparison. For a given paper, we identify all citing papers found in both research databases. Then, for each citing paper, we check whether it is indexed in WoS but not listed in WoS as a citing paper, in order to determine whether it is a missing citation. In our experiments, from a set of 1539 papers indexed by WoS, we found 696 missing citations. This outcome demonstrates the success of our approach and reveals that WoS does not always consider the full list of citing papers of a given publication, even when these citing papers are indexed by WoS. We also found that WoS has a higher chance of missing information for more recent publications. These findings provide relevant insights about this research indexing database and motivate extending our study to other research databases, such as Scopus and Google Scholar, improving the matching and querying algorithms, and reducing false positives, towards providing a more comprehensive and accurate view of the citations of a paper.
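The check described in this abstract can be illustrated with plain set operations. The following is a minimal sketch, not the authors' implementation: the function name, its parameters, and the paper identifiers are all hypothetical, and it assumes that the citing-paper lists and the set of WoS-indexed papers have already been retrieved and matched to common identifiers.

```python
from typing import Set


def find_missing_citations(
    reference_citing: Set[str],
    target_citing: Set[str],
    target_indexed: Set[str],
) -> Set[str]:
    """Return citing papers that the target database indexes but does not
    list as citing the paper under study.

    reference_citing: citing papers reported by the reference database
        (OpenAlex in the abstract).
    target_citing: citing papers reported by the case-study database (WoS).
    target_indexed: citing papers known to be indexed by the case-study
        database (hypothetical input; how this is looked up is not shown).
    """
    # Collect every citing paper seen in either database.
    all_citing = reference_citing | target_citing
    # Keep those the target database does not list as citing.
    not_listed = all_citing - target_citing
    # A missing citation must nevertheless be indexed by the target database.
    return not_listed & target_indexed


# Hypothetical identifiers, for illustration only.
openalex_citing = {"paper-a", "paper-b", "paper-c"}
wos_citing = {"paper-a"}
wos_indexed = {"paper-a", "paper-b"}

missing = find_missing_citations(openalex_citing, wos_citing, wos_indexed)
print(sorted(missing))  # prints ['paper-b']
```

In this toy input, `paper-c` is not reported as missing because the case-study database does not index it at all; only `paper-b` is both indexed and absent from the citing list.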

Cite as

David Rodrigues, António L. Lopes, and Fernando Batista. Web of Science Citation Gaps: An Automatic Approach to Detect Indexed but Missing Citations. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 5:1-5:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{rodrigues_et_al:OASIcs.SLATE.2023.5,
  author =	{Rodrigues, David and Lopes, Ant\'{o}nio L. and Batista, Fernando},
  title =	{{Web of Science Citation Gaps: An Automatic Approach to Detect Indexed but Missing Citations}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{5:1--5:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.5},
  URN =		{urn:nbn:de:0030-drops-185199},
  doi =		{10.4230/OASIcs.SLATE.2023.5},
  annote =	{Keywords: Research Databases, Citations, Citation Databases, Web of Science, OpenAlex}
}

Document

DOI: 10.4230/OASIcs.SLATE.2023.6

Querying Relational Databases with Speech-Recognition Driven by Contextual Knowledge

Authors: Dietmar Seipel, Benjamin Förster, Magnus Liebl, Marcel Waleska, and Salvador Abreu

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

We are extending the keyword-based query interface DdQl for relational databases, which is based on contextual background knowledge such as suitable join conditions and which was proposed in [Dietmar Seipel, 2021]. In the previous paper, join conditions were extracted from existing referential integrity (foreign key) constraints of the database schema, or they could be learned from other, previous database queries. In this paper, we describe a speech-to-text component, based on the system Whisper, for entering the query keywords. Keywords that have been recognized wrongly by Whisper can be corrected to similar-sounding words. Again, the context of the database schema can help here. For users with limited knowledge of the schema and the contents of the database, the approach of DdQl can help to provide useful suggestions for query implementations in Sql or Datalog, from which the user can choose one. Our tool DdQl can be run in a Docker image; it yields the possible queries in Sql and in a special domain-specific rule language that extends Datalog. The Datalog variant allows for additional user-defined aggregation functions which are not possible in Sql.

Cite as

Dietmar Seipel, Benjamin Förster, Magnus Liebl, Marcel Waleska, and Salvador Abreu. Querying Relational Databases with Speech-Recognition Driven by Contextual Knowledge. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 6:1-6:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{seipel_et_al:OASIcs.SLATE.2023.6,
  author =	{Seipel, Dietmar and F\"{o}rster, Benjamin and Liebl, Magnus and Waleska, Marcel and Abreu, Salvador},
  title =	{{Querying Relational Databases with Speech-Recognition Driven by Contextual Knowledge}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{6:1--6:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.6},
  URN =		{urn:nbn:de:0030-drops-185202},
  doi =		{10.4230/OASIcs.SLATE.2023.6},
  annote =	{Keywords: Knowledge Bases, Natural Language Interface, Logic Programming, Definite Clause Grammars, Referential Integrity Constraints, Speech-to-Text}
}

Document

Short Paper

DOI: 10.4230/OASIcs.SLATE.2023.7

Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications (Short Paper)

Authors: Simone Wills, Yu Bai, Cristian Tejedor-García, Catia Cucchiarini, and Helmer Strik

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

Voicebots have provided a new avenue for supporting the development of language skills, particularly within the context of second language learning. Voicebots, though, have largely been geared towards native adult speakers. We sought to assess the performance of two state-of-the-art ASR systems, Wav2Vec2.0 and Whisper AI, with a view to developing a voicebot that can support children acquiring a foreign language. We evaluated their performance on read and extemporaneous speech of native and non-native Dutch children. We also investigated the utility of using ASR technology to provide insight into the children’s pronunciation and fluency. The results show that recent, pre-trained ASR transformer-based models achieve acceptable performance from which detailed feedback on phoneme pronunciation quality can be extracted, despite the challenging nature of child and non-native speech.

Cite as

Simone Wills, Yu Bai, Cristian Tejedor-García, Catia Cucchiarini, and Helmer Strik. Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications (Short Paper). In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 7:1-7:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{wills_et_al:OASIcs.SLATE.2023.7,
  author =	{Wills, Simone and Bai, Yu and Tejedor-Garc{\'\i}a, Cristian and Cucchiarini, Catia and Strik, Helmer},
  title =	{{Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{7:1--7:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.7},
  URN =		{urn:nbn:de:0030-drops-185218},
  doi =		{10.4230/OASIcs.SLATE.2023.7},
  annote =	{Keywords: Automatic Speech Recognition, ASR, Child Speech, Non-Native Speech, Human-computer Interaction, Whisper, Wav2Vec2.0}
}

Document

DOI: 10.4230/OASIcs.SLATE.2023.8

OCRticle - a Structure-Aware OCR Application

Authors: Sofia G. Rodrigues dos Santos and J. João Dias de Almeida

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

While there are currently many applications and websites capable of performing Optical Character Recognition (OCR), none of the widely available options offer structured OCR, i.e., OCR that maintains the text’s original structure. For example, if a document has a title, after performing OCR on it, the title should have a different formatting, in order to distinguish it from the rest of the text. This paper covers the topic of structure-aware OCR, first by describing the current state of OCR tools, then by showcasing a prototype tool capable of retaining the structure of articles scanned from an image.

Cite as

Sofia G. Rodrigues dos Santos and J. João Dias de Almeida. OCRticle - a Structure-Aware OCR Application. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 8:1-8:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{rodriguesdossantos_et_al:OASIcs.SLATE.2023.8,
  author =	{Rodrigues dos Santos, Sofia G. and Dias de Almeida, J. Jo\~{a}o},
  title =	{{OCRticle - a Structure-Aware OCR Application}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{8:1--8:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.8},
  URN =		{urn:nbn:de:0030-drops-185220},
  doi =		{10.4230/OASIcs.SLATE.2023.8},
  annote =	{Keywords: OCR, Optical Character Recognition, Data Structure, Data Parsing, Document Structure}
}

Document

Short Paper

DOI: 10.4230/OASIcs.SLATE.2023.9

Narrative Extraction from Semantic Graphs (Short Paper)

Authors: Daniil Lystopadskyi, André Santos, and José Paulo Leal

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

This paper proposes an interactive approach for narrative extraction from semantic graphs. The proposed approach extracts events from RDF triples, maps them to their corresponding attributes, and assembles them into a chronological sequence to form narrative graphs. The approach is evaluated on the Wikidata graph and achieves promising results in terms of narrative quality and coherence. The paper also discusses several avenues for future work, including the integration of machine learning, graph embedding methods and the exploration of advanced techniques for attention-based narrative labeling and semantic role labeling. Overall, the proposed method offers a promising approach to narrative extraction from semantic graphs and has the potential to be useful in various applications, including chatbots, conversational agents, and content creation tools.

Cite as

Daniil Lystopadskyi, André Santos, and José Paulo Leal. Narrative Extraction from Semantic Graphs (Short Paper). In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 9:1-9:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{lystopadskyi_et_al:OASIcs.SLATE.2023.9,
  author =	{Lystopadskyi, Daniil and Santos, Andr\'{e} and Leal, Jos\'{e} Paulo},
  title =	{{Narrative Extraction from Semantic Graphs}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{9:1--9:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.9},
  URN =		{urn:nbn:de:0030-drops-185231},
  doi =		{10.4230/OASIcs.SLATE.2023.9},
  annote =	{Keywords: Narratives, Narrative Extraction, Information Retrieval, Knowledge Graphs, Semantic Graphs, Resource Description Framework, Web Ontology}
}

Document

Short Paper

DOI: 10.4230/OASIcs.SLATE.2023.10

Large Language Models: Compilers for the 4th Generation of Programming Languages? (Short Paper)

Authors: Francisco S. Marcondes, José João Almeida, and Paulo Novais

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

This paper explores the possibility of large language models as a fourth generation programming language compiler. This is based on the idea that large language models are able to translate a natural language specification into a program written in a particular programming language. In other words, just as high-level languages provided an additional language abstraction to assembly code, large language models can provide an additional language abstraction to high-level languages. This interpretation allows large language models to be thought of through the lens of compiler theory, leading to insightful conclusions.

Cite as

Francisco S. Marcondes, José João Almeida, and Paulo Novais. Large Language Models: Compilers for the 4th Generation of Programming Languages? (Short Paper). In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 10:1-10:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{s.marcondes_et_al:OASIcs.SLATE.2023.10,
  author =	{Marcondes, Francisco S. and Almeida, Jos\'{e} Jo\~{a}o and Novais, Paulo},
  title =	{{Large Language Models: Compilers for the 4\textsuperscript{th} Generation of Programming Languages?}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{10:1--10:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.10},
  URN =		{urn:nbn:de:0030-drops-185240},
  doi =		{10.4230/OASIcs.SLATE.2023.10},
  annote =	{Keywords: programming language, compiler, large language model}
}

Document

DOI: 10.4230/OASIcs.SLATE.2023.11

Hierarchical Data-Flow Graphs

Authors: José Pereira, Vitor Vieira, and Alberto Simões

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

Data flows are crucial for detecting dependencies between statements and expressions in a program. In the context of Static Application Security Testing (SAST), they are heavily used for different purposes, from detecting tainted data to understanding code dependencies. In Checkmarx, these data flows are currently computed on the fly, which is less efficient than desired, especially when dealing with large projects. With this in mind, a new caching mechanism is being developed, based on hierarchical graphs. In this document, we discuss the basic idea behind this approach, the challenges found and the decisions made for the implementation. We also share first insights on speed improvements for a proof-of-concept implementation.

Cite as

José Pereira, Vitor Vieira, and Alberto Simões. Hierarchical Data-Flow Graphs. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 11:1-11:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{pereira_et_al:OASIcs.SLATE.2023.11,
  author =	{Pereira, Jos\'{e} and Vieira, Vitor and Sim\~{o}es, Alberto},
  title =	{{Hierarchical Data-Flow Graphs}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{11:1--11:9},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.11},
  URN =		{urn:nbn:de:0030-drops-185252},
  doi =		{10.4230/OASIcs.SLATE.2023.11},
  annote =	{Keywords: Data Flow, Static Application Security Testing, Hierarchical Graphs}
}

Document

DOI: 10.4230/OASIcs.SLATE.2023.12

Type Annotation for SAST

Authors: Marco Pereira, Alberto Simões, and Pedro Rangel Henriques

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

Static Application Security Testing (SAST) is a type of software security testing that analyzes the source code of an application to identify security vulnerabilities and coding errors. It helps detect security vulnerabilities in software code before deployment, reducing the risk of exploitation by attackers. This document describes the work performed to upgrade Checkmarx’s SAST tool so that vulnerability detection can take expression types into account. For this to be possible, every expression in the Document Object Model needs to have a specific type assigned according to the kind of operation and the operand types. At the current stage, the project already supports expression type annotation for three programming languages: C, C++ and C#. This support was implemented by adding a new Resolver Rule to the Resolver stage, allowing for generalization across languages. We also compare the complexity of writing vulnerability detection queries with and without access to type information.

Cite as

Marco Pereira, Alberto Simões, and Pedro Rangel Henriques. Type Annotation for SAST. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 12:1-12:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


@InProceedings{pereira_et_al:OASIcs.SLATE.2023.12,
  author =	{Pereira, Marco and Sim\~{o}es, Alberto and Henriques, Pedro Rangel},
  title =	{{Type Annotation for SAST}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{12:1--12:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.12},
  URN =		{urn:nbn:de:0030-drops-185261},
  doi =		{10.4230/OASIcs.SLATE.2023.12},
  annote =	{Keywords: Static Application Security Testing, Type Annotation, C, C++, C#}
}

Document

DOI: 10.4230/OASIcs.SLATE.2023.13

Characterization and Identification of Programming Languages

Authors: Júlio Alves, Alvaro Costa Neto, Maria João Varanda Pereira, and Pedro Rangel Henriques

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

This paper presents and discusses research whose main goal is to identify which characteristics influence a programmer's recognition and identification of a programming language, specifically when analysing a program's source code and its linguistic style. In other words, the study aims to answer the following questions: which grammatical elements, including lexical, syntactic, and semantic details, contribute the most to the characterization of a language? How many structural elements of a language may be modified without losing its identity? The long-term objective of this research is to acquire new insights into the factors that can lead language engineers to design new programming languages that reduce the cognitive load of both learners and programmers. To elaborate on that subject, the paper starts with a brief explanation of programming language fundamentals. Then, a list of the main syntactic characteristics of the set of programming languages chosen for the study is presented. Those characteristics result from the analysis we carried out in the first phase of our project. To deepen the investigation, we decided to collect and analyse the opinions of other programmers, so the design of a survey to address that task is discussed. The answers obtained from the questionnaire are analysed to present an overall picture of programming language characteristics and their relative influence on language identification from the programmers’ perspective.

Cite as

Júlio Alves, Alvaro Costa Neto, Maria João Varanda Pereira, and Pedro Rangel Henriques. Characterization and Identification of Programming Languages. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 13:1-13:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


@InProceedings{alves_et_al:OASIcs.SLATE.2023.13,
  author =	{Alves, J\'{u}lio and Costa Neto, Alvaro and Pereira, Maria Jo\~{a}o Varanda and Henriques, Pedro Rangel},
  title =	{{Characterization and Identification of Programming Languages}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{13:1--13:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.13},
  URN =		{urn:nbn:de:0030-drops-185273},
  doi =		{10.4230/OASIcs.SLATE.2023.13},
  annote =	{Keywords: Programming Languages, Programming Language Characterization, Programming Language Design, Programming Language Identification}
}

227 Search Results for "Simões, Alberto"
