OASIcs, Volume 104

11th Symposium on Languages, Applications and Technologies (SLATE 2022)



Thumbnail PDF

Event

SLATE 2022, July 14-15, 2022, Universidade da Beira Interior, Covilhã, Portugal

Editors

João Cordeiro
  • HULTIG, INESC TEC, UBI, Covilhã, Portugal
Maria João Pereira
  • CeDRI, IPB, Bragança, Portugal
Nuno F. Rodrigues
  • ALGORITMI, IPCA, Barcelos, Portugal
Sebastião Pais
  • HULTIG, Nova LINCS, GREYC, UBI, Covilhã, Portugal

Publication Details

  • published at: 2022-07-27
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • ISBN: 978-3-95977-245-7
  • DBLP: db/conf/slate/slate2022

Access Numbers

Documents

No documents found matching your filter selection.
Document
Complete Volume
OASIcs, Volume 104, SLATE 2022, Complete Volume

Authors: João Cordeiro, Maria João Pereira, Nuno F. Rodrigues, and Sebastião Pais


Abstract
OASIcs, Volume 104, SLATE 2022, Complete Volume

Cite as

11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 1-242, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@Proceedings{cordeiro_et_al:OASIcs.SLATE.2022,
  title =	{{OASIcs, Volume 104, SLATE 2022, Complete Volume}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{1--242},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022},
  URN =		{urn:nbn:de:0030-drops-167457},
  doi =		{10.4230/OASIcs.SLATE.2022},
  annote =	{Keywords: OASIcs, Volume 104, SLATE 2022, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: João Cordeiro, Maria João Pereira, Nuno F. Rodrigues, and Sebastião Pais


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 0:i-0:xiv, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{cordeiro_et_al:OASIcs.SLATE.2022.0,
  author =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{0:i--0:xiv},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.0},
  URN =		{urn:nbn:de:0030-drops-167464},
  doi =		{10.4230/OASIcs.SLATE.2022.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Assessing Similarity Between Two Ontologies: The Use of the Integrity Coefficient

Authors: Aly Ngoné Ngom, Papa Ousseynou Mbaye, and Ibrahima Gaye


Abstract
The aim of this paper is to propose a new coefficient of integrity I_{new} for improving N_{Plus} measure which is an improvement of the T_{Ngom} measure. In N_{Plus} measure, the coefficient of integrity used (I) decreases and tends to 0 fastly when we just add some concepts for extendind set of resemblance of ontologies. To fix this problem, we introduce R, the coefficient of representativeness of concepts added in the ontology for its extension. I_{new} decreases slowly compared to I and depends to the cardinality of the ontology extended and the number of concepts added to it too.

Cite as

Aly Ngoné Ngom, Papa Ousseynou Mbaye, and Ibrahima Gaye. Assessing Similarity Between Two Ontologies: The Use of the Integrity Coefficient. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 1:1-1:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{ngom_et_al:OASIcs.SLATE.2022.1,
  author =	{Ngom, Aly Ngon\'{e} and Mbaye, Papa Ousseynou and Gaye, Ibrahima},
  title =	{{Assessing Similarity Between Two Ontologies: The Use of the Integrity Coefficient}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{1:1--1:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.1},
  URN =		{urn:nbn:de:0030-drops-167474},
  doi =		{10.4230/OASIcs.SLATE.2022.1},
  annote =	{Keywords: Semantic similarity measures, Ontologies similarities, Tversky ’s measures, Concepts similarities}
}
Document
Automatic Classification of Portuguese Proverbs

Authors: Jorge Baptista and Sónia Reis


Abstract
In this paper, natural language processing (NLP) and machine learning methods and tools are applied to the task of topic (thematic or semantic) classification of Portuguese proverbs. This is a difficult task since proverbs are usually very short sentences. Such classification should allow an easier selection of the most relevant proverbs for a given situation, considering their context in discourse or within a text. For that, we used, on the one hand, a collection of +32,000 proverbial expressions organized "thematically" into a large set of previously attributed topics (+2,200) and, on the other hand, the Orange data mining toolkit, along with the NLP and machine learning tools it provides. Since the classification provided in the collection of proverbs is, for the most part, based only on a keyword in the body of the proverbs, 2 experiments were set up, to determine the feasibility of the task with a modicum of effort and the most promising configurations applicable. Different sample sizes, 100 and 50 proverbs randomly selected per topic, corresponding to Scenario 1 and 2, respectively, were contrasted; several preprocessing strategies were explored, and different data representation methods tested against several learning algorithms. Results show that Neural Networks is the best performing model, achieving the best classification accuracy of 70% and 61%, in the two different experimental scenarios, Scenario 1 and 2, respectively. Some of the inaccurate classification cases seem to indicate that the machine learning approach can sometimes do a better job than a human classifier, especially considering the manual attribution of the topics by the collection’s author, the sheer number of topics involved, and the very unbalanced distribution of proverbs per topic. Based on the results achieved, the paper presents some proposals for future work to cope with such difficulties.

Cite as

Jorge Baptista and Sónia Reis. Automatic Classification of Portuguese Proverbs. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 2:1-2:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{baptista_et_al:OASIcs.SLATE.2022.2,
  author =	{Baptista, Jorge and Reis, S\'{o}nia},
  title =	{{Automatic Classification of Portuguese Proverbs}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{2:1--2:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.2},
  URN =		{urn:nbn:de:0030-drops-167480},
  doi =		{10.4230/OASIcs.SLATE.2022.2},
  annote =	{Keywords: Portuguese Proverbs, Automatic Topic Classification, Machine Learning}
}
Document
Question Answering For Toxicological Information Extraction

Authors: Bruno Carlos Luís Ferreira, Hugo Gonçalo Oliveira, Hugo Amaro, Ângela Laranjeiro, and Catarina Silva


Abstract
Working with large amounts of text data has become hectic and time-consuming. In order to reduce human effort, costs, and make the process more efficient, companies and organizations resort to intelligent algorithms to automate and assist the manual work. This problem is also present in the field of toxicological analysis of chemical substances, where information needs to be searched from multiple documents. That said, we propose an approach that relies on Question Answering for acquiring information from unstructured data, in our case, English PDF documents containing information about physicochemical and toxicological properties of chemical substances. Experimental results confirm that our approach achieves promising results which can be applicable in the business scenario, especially if further revised by humans.

Cite as

Bruno Carlos Luís Ferreira, Hugo Gonçalo Oliveira, Hugo Amaro, Ângela Laranjeiro, and Catarina Silva. Question Answering For Toxicological Information Extraction. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 3:1-3:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{ferreira_et_al:OASIcs.SLATE.2022.3,
  author =	{Ferreira, Bruno Carlos Lu{\'\i}s and Gon\c{c}alo Oliveira, Hugo and Amaro, Hugo and Laranjeiro, \^{A}ngela and Silva, Catarina},
  title =	{{Question Answering For Toxicological Information Extraction}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{3:1--3:10},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.3},
  URN =		{urn:nbn:de:0030-drops-167493},
  doi =		{10.4230/OASIcs.SLATE.2022.3},
  annote =	{Keywords: Information Extraction, Question Answering, Transformers, Toxicological Analysis}
}
Document
Generation of Document Type Exercises for Automated Assessment

Authors: José Paulo Leal, Ricardo Queirós, and Marco Primo


Abstract
This paper describes ongoing research to develop a system to automatically generate exercises on document type validation. It aims to support multiple text-based document formalisms, currently including JSON and XML. Validation of JSON documents uses JSON Schema and validation of XML uses both XML Schema and DTD. The exercise generator receives as input a document type and produces two sets of documents: valid and invalid instances. Document types written by students must validate the former and invalidate the latter. Exercises produced by this generator can be automatically accessed in a state-of-the-art assessment system. This paper details the proposed approach and describes the design of the system currently being implemented.

Cite as

José Paulo Leal, Ricardo Queirós, and Marco Primo. Generation of Document Type Exercises for Automated Assessment. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 4:1-4:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{leal_et_al:OASIcs.SLATE.2022.4,
  author =	{Leal, Jos\'{e} Paulo and Queir\'{o}s, Ricardo and Primo, Marco},
  title =	{{Generation of Document Type Exercises for Automated Assessment}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{4:1--4:6},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.4},
  URN =		{urn:nbn:de:0030-drops-167506},
  doi =		{10.4230/OASIcs.SLATE.2022.4},
  annote =	{Keywords: exercise generation, automated assessment, document type assessment}
}
Document
Synthetic Data Generation from JSON Schemas

Authors: Hugo André Coelho Cardoso and José Carlos Ramalho


Abstract
This document describes the steps taken in the development of DataGen From Schemas. This new version of DataGen is an application that makes it possible to automatically generate representative synthetic datasets from JSON and XML schemas, in order to facilitate tasks such as the thorough testing of software applications and scientific endeavors in relevant areas, namely Data Science. This paper focuses solely on the JSON Schema component of the application. DataGen’s prior version is an online open-source application that allows the quick prototyping of datasets through its own Domain Specific Language (DSL) of specification of data models. DataGen is able to parse these models and generate synthetic datasets according to the structural and semantic restrictions stipulated, automating the whole process of data generation with spontaneous values created in runtime and/or from a library of support datasets. The objective of this new product, DataGen From Schemas, is to expand DataGen’s use cases and raise the datasets specification’s abstraction level, making it possible to generate synthetic datasets directly from schemas. This new platform builds upon its prior version and acts as its complement, operating jointly and sharing the same data layer, in order to assure the compatibility of both platforms and the portability of the created DSL models between them. Its purpose is to parse schema files and generate corresponding DSL models, effectively translating the JSON specification to a DataGen model, then using the original application as a middleware to generate the final datasets.

Cite as

Hugo André Coelho Cardoso and José Carlos Ramalho. Synthetic Data Generation from JSON Schemas. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 5:1-5:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{cardoso_et_al:OASIcs.SLATE.2022.5,
  author =	{Cardoso, Hugo Andr\'{e} Coelho and Ramalho, Jos\'{e} Carlos},
  title =	{{Synthetic Data Generation from JSON Schemas}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{5:1--5:16},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.5},
  URN =		{urn:nbn:de:0030-drops-167515},
  doi =		{10.4230/OASIcs.SLATE.2022.5},
  annote =	{Keywords: Schemas, JSON, Data Generation, Synthetic Data, DataGen, DSL, Dataset, Grammar, Randomization, Open Source, Data Science, REST API, PEG.js}
}
Document
Extending PyJL - Transpiling Python Libraries to Julia

Authors: Miguel Marcelino and António Menezes Leitão


Abstract
Many high-level programming languages have emerged in recent years. Julia is one of these languages, claiming to offer the speed of C, the macro capabilities of Lisp, and the user-friendliness of Python. Julia’s syntax is one of its major strengths, making it ideal for scientific and numerical computing. Furthermore, Julia’s high-performance on modern hardware makes it an appealing alternative to Python. However, Python has a considerable advantage over Julia: its extensive library set. There have been efforts to make Python libraries available to Julia either through Foreign Function Interfaces (FFI’s), or through manual translation, but both have their tradeoffs: FFI’s do not take advantage of Julia’s performance, as they call Python’s Virtual Machine, and manual translation is demanding and time-consuming. To address these issues and bridge the gap between the two languages, we propose PyJL, a transpilation tool that converts Python source-code to human-readable Julia source-code. Although the development of PyJL is still at an early stage, our preliminary results reveal that the generated code follows the pragmatics of Julia and is capable of high performance.

Cite as

Miguel Marcelino and António Menezes Leitão. Extending PyJL - Transpiling Python Libraries to Julia. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 6:1-6:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{marcelino_et_al:OASIcs.SLATE.2022.6,
  author =	{Marcelino, Miguel and Leit\~{a}o, Ant\'{o}nio Menezes},
  title =	{{Extending PyJL - Transpiling Python Libraries to Julia}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{6:1--6:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.6},
  URN =		{urn:nbn:de:0030-drops-167525},
  doi =		{10.4230/OASIcs.SLATE.2022.6},
  annote =	{Keywords: Source-to-Source Compiler, Automatic Transpilation, Library Translation, Python, Julia}
}
Document
EWVM, a Web Virtual Machine to Support Code Generation in Compiler Courses

Authors: Sofia Teixeira, José Carlos Ramalho, and Pedro Rangel Henriques


Abstract
This paper describes a project which goal is to analyze and model a complete Virtual stack Machine (VM) environment and build a Web application with a graphical interface to deploy an environment to compile and execute VM programs. The new tool offers two main features: assembles and reports errors in programs written in the assembly language of the Virtual Machine; and animates the execution of the compiled code, displaying the internal state of the VM and providing an interface to control the execution step-by-step. In the paper, after discussing related concepts and works, a proposal to build such a tool, so far called EWVM, will be presented along the architecture drawn. A prototype will be shown, and its impact as an educational tool is argued.

Cite as

Sofia Teixeira, José Carlos Ramalho, and Pedro Rangel Henriques. EWVM, a Web Virtual Machine to Support Code Generation in Compiler Courses. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 7:1-7:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{teixeira_et_al:OASIcs.SLATE.2022.7,
  author =	{Teixeira, Sofia and Ramalho, Jos\'{e} Carlos and Henriques, Pedro Rangel},
  title =	{{EWVM, a Web Virtual Machine to Support Code Generation in Compiler Courses}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{7:1--7:9},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.7},
  URN =		{urn:nbn:de:0030-drops-167535},
  doi =		{10.4230/OASIcs.SLATE.2022.7},
  annote =	{Keywords: Virtual Machine, Stack Machine, Assembler, Debugger, Compiler, Code Generation}
}
Document
OMT, a Web-Based Tool for Ontology Matching

Authors: João Rodrigues Gomes, Alda Lopes Gançarski, and Pedro Rangel Henriques


Abstract
In recent years ontologies have become an integral part of storing information in a structured and formal manner and a way of sharing said information. With this rise in usage, it was only a matter of time before different people wrote distinct ontologies to represent the same knowledge domain. The area of Ontology Matching was created with the purpose of finding correspondences between different ontologies that represented information in the same domain area. This paper starts with a study of already existing ontology matching methods in order to understand the existing techniques, focusing on the advantages and disadvantages of each one. Then, we propose an approach and an architecture to develop a new web-based tool using the knowledge acquired during the bibliographic research. The paper also includes the presentation of a prototype of the proposed tool, called OMT.

Cite as

João Rodrigues Gomes, Alda Lopes Gançarski, and Pedro Rangel Henriques. OMT, a Web-Based Tool for Ontology Matching. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 8:1-8:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{gomes_et_al:OASIcs.SLATE.2022.8,
  author =	{Gomes, Jo\~{a}o Rodrigues and Gan\c{c}arski, Alda Lopes and Henriques, Pedro Rangel},
  title =	{{OMT, a Web-Based Tool for Ontology Matching}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{8:1--8:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.8},
  URN =		{urn:nbn:de:0030-drops-167547},
  doi =		{10.4230/OASIcs.SLATE.2022.8},
  annote =	{Keywords: Ontology, Ontology Matching, Ontology Alignment}
}
Document
Classification of Public Administration Complaints

Authors: Francisco Caldeira, Luís Nunes, and Ricardo Ribeiro


Abstract
Complaint management is a problem faced by many organizations that is both vital to customer image and highly dependent on human resources. This work attempts to tackle a part of the problem, by classifying summaries of complaints using machine learning models in order to better redirect these to the appropriate responders. The main challenges of this task is that training datasets are often small and highly imbalanced. This can can have a big impact on the performance of classification models. The dataset analyzed in this work suffers from both of these problems, being relatively small and having labels in different proportions. In this work, two different techniques are analyzed: combining classes together to increase the number of elements of the new class; and, providing new artificial examples for some classes via translation into other languages. The classification models explored were the following: k-NN, SVM, Naïve Bayes, boosting, and Deep Learning approaches, including transformers. The paper concludes that although, as expected, the classes with little representation are hard to classify, the techniques explored helped to boost the performance, especially in the classes with a low number of elements. SVM and BERT-based models outperformed their peers.

Cite as

Francisco Caldeira, Luís Nunes, and Ricardo Ribeiro. Classification of Public Administration Complaints. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 9:1-9:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{caldeira_et_al:OASIcs.SLATE.2022.9,
  author =	{Caldeira, Francisco and Nunes, Lu{\'\i}s and Ribeiro, Ricardo},
  title =	{{Classification of Public Administration Complaints}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{9:1--9:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.9},
  URN =		{urn:nbn:de:0030-drops-167555},
  doi =		{10.4230/OASIcs.SLATE.2022.9},
  annote =	{Keywords: Text Classification, Natural Language Processing, Deep Learning, BERT}
}
Document
Comparing Different Approaches for Detecting Hate Speech in Online Portuguese Comments

Authors: Bernardo Cunha Matos, Raquel Bento Santos, Paula Carvalho, Ricardo Ribeiro, and Fernando Batista


Abstract
Online Hate Speech (OHS) has been growing dramatically on social media, which has motivated researchers to develop a diversity of methods for its automated detection. However, the detection of OHS in Portuguese is still little studied. To fill this gap, we explored different models that proved to be successful in the literature to address this task. In particular, we have explored transfer learning approaches, based on existing BERT-like pre-trained models. The performed experiments were based on CO-HATE, a corpus of YouTube comments posted by the Portuguese online community that was manually labeled by different annotators. Among other categories, those comments were labeled regarding the presence of hate speech and the type of hate speech, specifically overt and covert hate speech. We have assessed the impact of using annotations from different annotators on the performance of such models. In addition, we have analyzed the impact of distinguishing overt and and covert hate speech. The results achieved show the importance of considering the annotator’s profile in the development of hate speech detection models. Regarding the hate speech type, the results obtained do not allow to make any conclusion on what type is easier to detect. Finally, we show that pre-processing does not seem to have a significant impact on the performance of this specific task.

Cite as

Bernardo Cunha Matos, Raquel Bento Santos, Paula Carvalho, Ricardo Ribeiro, and Fernando Batista. Comparing Different Approaches for Detecting Hate Speech in Online Portuguese Comments. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 10:1-10:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{matos_et_al:OASIcs.SLATE.2022.10,
  author =	{Matos, Bernardo Cunha and Santos, Raquel Bento and Carvalho, Paula and Ribeiro, Ricardo and Batista, Fernando},
  title =	{{Comparing Different Approaches for Detecting Hate Speech in Online Portuguese Comments}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{10:1--10:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.10},
  URN =		{urn:nbn:de:0030-drops-167560},
  doi =		{10.4230/OASIcs.SLATE.2022.10},
  annote =	{Keywords: Hate Speech, Text Classification, Transfer Learning, Supervised Learning, Deep Learning}
}
Document
Semi-Supervised Annotation of Portuguese Hate Speech Across Social Media Domains

Authors: Raquel Bento Santos, Bernardo Cunha Matos, Paula Carvalho, Fernando Batista, and Ricardo Ribeiro


Abstract
With the increasing spread of hate speech (HS) on social media, it becomes urgent to develop models that can help detecting it automatically. Typically, such models require large-scale annotated corpora, which are still scarce in languages such as Portuguese. However, creating manually annotated corpora is a very expensive and time-consuming task. To address this problem, we propose an ensemble of two semi-supervised models that can be used to automatically create a corpus representative of online hate speech in Portuguese. The first model combines Generative Adversarial Networks and a BERT-based model. The second model is based on label propagation, and consists of propagating labels from existing annotated corpora to the unlabeled data, by exploring the notion of similarity. We have explored the annotations of three existing corpora (CO-HATE, ToLR-BR, and HPHS) in order to automatically annotate FIGHT, a corpus composed of geolocated tweets produced in the Portuguese territory. Through the process of selecting the best model and the corresponding setup, we have tested different pre-trained embeddings, performed experiments using different training subsets, labeled by different annotators with different perspectives, and performed several experiments with active learning. Furthermore, this work explores back translation as a mean to automatically generate additional hate speech samples. The best results were achieved by combining all the labeled datasets, obtaining 0.664 F1-score for the Hate Speech class in FIGHT.

Cite as

Raquel Bento Santos, Bernardo Cunha Matos, Paula Carvalho, Fernando Batista, and Ricardo Ribeiro. Semi-Supervised Annotation of Portuguese Hate Speech Across Social Media Domains. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 11:1-11:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{santos_et_al:OASIcs.SLATE.2022.11,
  author =	{Santos, Raquel Bento and Matos, Bernardo Cunha and Carvalho, Paula and Batista, Fernando and Ribeiro, Ricardo},
  title =	{{Semi-Supervised Annotation of Portuguese Hate Speech Across Social Media Domains}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{11:1--11:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.11},
  URN =		{urn:nbn:de:0030-drops-167570},
  doi =		{10.4230/OASIcs.SLATE.2022.11},
  annote =	{Keywords: Hate Speech, Semi-Supervised Learning, Semi-Automatic Annotation}
}
Document
Large Semantic Graph Summarization Using Namespaces

Authors: Ana Rita Santos Lopes da Costa, André Santos, and José Paulo Leal


Abstract
We propose an approach to summarize large semantics graphs using namespaces. Semantic graphs based on the Resource Description Framework (RDF) use namespaces on their serializations. Although these namespaces are not part of RDF semantics, they have intrinsic meaning. Based on this insight, we use namespaces to create summary graphs of reduced size, more amenable to be visualized. In the summarization, object literals are also reduced to their data type and the blank nodes to a group of their own. The visualization created for the summary graph aims to give insight of the original large graph. This paper describes the proposed approach and reports on the results obtained with representative large semantic graphs.

Cite as

Ana Rita Santos Lopes da Costa, André Santos, and José Paulo Leal. Large Semantic Graph Summarization Using Namespaces. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 12:1-12:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{dacosta_et_al:OASIcs.SLATE.2022.12,
  author =	{da Costa, Ana Rita Santos Lopes and Santos, Andr\'{e} and Leal, Jos\'{e} Paulo},
  title =	{{Large Semantic Graph Summarization Using Namespaces}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{12:1--12:9},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.12},
  URN =		{urn:nbn:de:0030-drops-167585},
  doi =		{10.4230/OASIcs.SLATE.2022.12},
  annote =	{Keywords: Semantic graph, RDF, namespaces, reification}
}
Document
Metaobject Protocols for Julia

Authors: Marcelo Santos and António Menezes Leitão


Abstract
Metaobject Protocols enable programmers to extend programming languages without the need to understand the lower level details of their implementation. However, designing these protocols comes with two challenges: allow programmers to limit their concerns to higher level concepts and minimize performance penalties in programs. In this work, we propose metaobject protocol for the programming language Julia. Julia’s object system is very limited, when compared to languages following the Object-Oriented paradigm. However, Julia’s compilation approach allows for a considerable degree of code optimization through the exploration of runtime type information. Through the usage of Julia’s run-time optimizations, we propose a metaobject protocol that combines user-extensibility with limited performance penalties. This paper focuses on the development of a multiple inheritance method dispatch and method combination mechanisms with zero runtime overhead.

Cite as

Marcelo Santos and António Menezes Leitão. Metaobject Protocols for Julia. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 13:1-13:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{santos_et_al:OASIcs.SLATE.2022.13,
  author =	{Santos, Marcelo and Leit\~{a}o, Ant\'{o}nio Menezes},
  title =	{{Metaobject Protocols for Julia}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{13:1--13:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.13},
  URN =		{urn:nbn:de:0030-drops-167597},
  doi =		{10.4230/OASIcs.SLATE.2022.13},
  annote =	{Keywords: Julia, Metaobject Protocols, Object-Oriented Programming, Performance}
}
Document
The Visual Programming Environment ROBI for Educational Robotics

Authors: Gustavo Galvão, Alvaro Costa Neto, Cristiana Araújo, and Pedro Rangel Henriques


Abstract
This paper presents the outcomes of a research project focused on the training of Computational Thinking, resorting to a block-based visual programming language created to program an Arduino Uno based robot. To support the design and implementation of the visual programming environment Robi, we start discussing the relevance of Educational Robotics to motivate and engage children in programming activities. Students usually face great difficulties to learn computer programming and it is nowadays accepted that young people shall be trained in Computational Thinking to acquire the skills necessary to easily solve problems within and beyond the realm of Computer Science and Engineering. The resolution of obstacles imposed by the costs and reduced availability of typical Educational Robotics kits, in combination with the benefits of existing block-based programming languages, like simplicity and intuitiveness, motivated the project here reported and analyzed. We aim at showing that Robi, a visual block-based programming language and robot programming environment, provides an easy, accessible and intuitive platform to learn how to solve problems programming a computer and support the training of Computational Thinking.

Cite as

Gustavo Galvão, Alvaro Costa Neto, Cristiana Araújo, and Pedro Rangel Henriques. The Visual Programming Environment ROBI for Educational Robotics. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 14:1-14:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{galvao_et_al:OASIcs.SLATE.2022.14,
  author =	{Galv\~{a}o, Gustavo and Costa Neto, Alvaro and Ara\'{u}jo, Cristiana and Rangel Henriques, Pedro},
  title =	{{The Visual Programming Environment ROBI for Educational Robotics}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{14:1--14:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.14},
  URN =		{urn:nbn:de:0030-drops-167600},
  doi =		{10.4230/OASIcs.SLATE.2022.14},
  annote =	{Keywords: Programming Languages, Visual Languages, Computer Programming, Educational Robotics}
}
Document
Down-Translating XML: The Python Way

Authors: Alberto Simões and José João Almeida


Abstract
Nowadays, the most used approach to process an XML file is based on the processing of a DOM structure and a set of operations that collects or edits information in the model using some kind of selectors (usually CSS-like or XPath). Nevertheless, the process of performing a depth-first walk through the DOM, and synthesizing values, is a simple way to traverse and transform an entire XML document. In this document we discuss the details on the implementation and usage of a Python package for XML document processing based on this structure. Given the existence of similar tools for other programming languages, we will mainly focus on the used approach, that takes advantage of the Python style guides and development patterns.

Cite as

Alberto Simões and José João Almeida. Down-Translating XML: The Python Way. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 15:1-15:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{simoes_et_al:OASIcs.SLATE.2022.15,
  author =	{Sim\~{o}es, Alberto and Almeida, Jos\'{e} Jo\~{a}o},
  title =	{{Down-Translating XML: The Python Way}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{15:1--15:9},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.15},
  URN =		{urn:nbn:de:0030-drops-167617},
  doi =		{10.4230/OASIcs.SLATE.2022.15},
  annote =	{Keywords: XML, Python, Depth-First Processing}
}
Document
Determining Programming Languages Complexity and Its Impact on Processing

Authors: Gonçalo Rodrigues Pinto, Pedro Rangel Henriques, Daniela da Cruz, and João Cruz


Abstract
Tools for Programming Languages processing, like Static Analysers (for instance, a Static Application Security Testing (SAST) tool), must be adapted to cope with a different input when the source programming language changes. Complexity of the programming language is one of the key factors that deeply impact the time of giving support to it. This paper aims at proposing an approach for assessing language complexity, measuring, at a first stage, the complexity of its underlying context-free grammar (CFG). From the analysis of concrete case studies, factors have been identified that make the support process more time-consuming, in particular in the stages of language recognition and in the transformation to an abstract syntax tree (AST). In this sense, at a second stage, a set of language characteristics is analysed in order to take into account the referred factors that also impact on the language processing. The principal goal of the project here reported is to help development teams to improve the estimation of time and effort needed to cope with a new programming language. In the paper a tool is proposed, and its prototype is presented, that allows the evaluation of the complexity of a language based on a set of metrics to classify the complexity of its grammar, along with a set of properties. The tool compares the new language complexity so far determined with previously supported languages, to predict the effort to process the new language.

Cite as

Gonçalo Rodrigues Pinto, Pedro Rangel Henriques, Daniela da Cruz, and João Cruz. Determining Programming Languages Complexity and Its Impact on Processing. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 16:1-16:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{pinto_et_al:OASIcs.SLATE.2022.16,
  author =	{Pinto, Gon\c{c}alo Rodrigues and Henriques, Pedro Rangel and da Cruz, Daniela and Cruz, Jo\~{a}o},
  title =	{{Determining Programming Languages Complexity and Its Impact on Processing}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{16:1--16:15},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.16},
  URN =		{urn:nbn:de:0030-drops-167620},
  doi =		{10.4230/OASIcs.SLATE.2022.16},
  annote =	{Keywords: Complexity, Grammar, Language-based-Tool, Programming Language, Static code analysis}
}
Document
Reasoning with Portuguese Word Embeddings

Authors: Luís Filipe Cunha, J. João Almeida, and Alberto Simões


Abstract
Representing words with semantic distributions to create ML models is a widely used technique to perform Natural Language processing tasks. In this paper, we trained word embedding models with different types of Portuguese corpora, analyzing the influence of the models' parameterization, the corpora size, and domain. Then we validated each model with the classical evaluation methods available: four words analogies and measurement of the similarity of pairs of words. In addition to these methods, we proposed new alternative techniques to validate word embedding models, presenting new resources for this purpose. Finally, we discussed the obtained results and argued about some limitations of the word embedding models' evaluation methods.

Cite as

Luís Filipe Cunha, J. João Almeida, and Alberto Simões. Reasoning with Portuguese Word Embeddings. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 17:1-17:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{cunha_et_al:OASIcs.SLATE.2022.17,
  author =	{Cunha, Lu{\'\i}s Filipe and Almeida, J. Jo\~{a}o and Sim\~{o}es, Alberto},
  title =	{{Reasoning with Portuguese Word Embeddings}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{17:1--17:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.17},
  URN =		{urn:nbn:de:0030-drops-167636},
  doi =		{10.4230/OASIcs.SLATE.2022.17},
  annote =	{Keywords: Word Embeddings, Word2Vec, Evaluation Methods}
}
Document
ScraPE - An Automated Tool for Programming Exercises Scraping

Authors: Ricardo Queirós


Abstract
Learning programming boils down to the practice of solving exercises. However, although there are good and diversified exercises, these are held in proprietary systems hindering their interoperability. This article presents a simple scraping tool, called ScraPE, which through a navigation, interaction and data extraction script, materialized in a domain-specific language, allows extracting the data necessary from Web pages - typically online judges - to compose programming exercises in a standard language. The tool is validated by extracting exercises from a specific online judge. This tool is part of a larger project where the main objective is to provide programming exercises through a simple GraphQL API.

Cite as

Ricardo Queirós. ScraPE - An Automated Tool for Programming Exercises Scraping. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 18:1-18:7, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{queiros:OASIcs.SLATE.2022.18,
  author =	{Queir\'{o}s, Ricardo},
  title =	{{ScraPE - An Automated Tool for Programming Exercises Scraping}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{18:1--18:7},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.18},
  URN =		{urn:nbn:de:0030-drops-167646},
  doi =		{10.4230/OASIcs.SLATE.2022.18},
  annote =	{Keywords: Web scrapping, crawling, programming exercises, online judges, DOM}
}
Document
Analysing Off-The-Shelf Options for Question Answering with Portuguese FAQs

Authors: Hugo Gonçalo Oliveira, Sara Inácio, and Catarina Silva


Abstract
Following the current interest in developing automatic question answering systems, we analyse alternative approaches for finding suitable answers from a list of Frequently Asked Questions (FAQs), in Portuguese. These rely on different technologies, some more established and others more recent, and are all easily adaptable to new lists of FAQs, on new domains. We analyse the effort required for their configuration, the accuracy of their answers, and the time they take to get such answers. We conclude that traditional Information Retrieval (IR) can be a solution for smaller lists of FAQs, but approaches based on deep neural networks for sentence encoding are at least as reliable and less dependent on the number and complexity of the FAQs. We also contribute with a small dataset of Portuguese FAQs on the domain of telecommunications, which was used in our experiments.

Cite as

Hugo Gonçalo Oliveira, Sara Inácio, and Catarina Silva. Analysing Off-The-Shelf Options for Question Answering with Portuguese FAQs. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 19:1-19:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{goncalooliveira_et_al:OASIcs.SLATE.2022.19,
  author =	{Gon\c{c}alo Oliveira, Hugo and In\'{a}cio, Sara and Silva, Catarina},
  title =	{{Analysing Off-The-Shelf Options for Question Answering with Portuguese FAQs}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{19:1--19:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.19},
  URN =		{urn:nbn:de:0030-drops-167652},
  doi =		{10.4230/OASIcs.SLATE.2022.19},
  annote =	{Keywords: Natural Language Processing, Portuguese, Question Answering, FAQs, Information Retrieval, Sentence Encoding, Transformers}
}

Filters


Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail