DROPS

Document

DOI: 10.4230/OASIcs.SLATE.2022.1

Assessing Similarity Between Two Ontologies: The Use of the Integrity Coefficient

Authors: Aly Ngoné Ngom, Papa Ousseynou Mbaye, and Ibrahima Gaye

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

The aim of this paper is to propose a new coefficient of integrity I_{new} to improve the N_{Plus} measure, which is itself an improvement of the T_{Ngom} measure. In the N_{Plus} measure, the coefficient of integrity used (I) decreases and tends to 0 quickly when only a few concepts are added to extend the set of resemblance of ontologies. To fix this problem, we introduce R, the coefficient of representativeness of the concepts added to the ontology for its extension. I_{new} decreases slowly compared to I and depends on both the cardinality of the extended ontology and the number of concepts added to it.

Cite as

Aly Ngoné Ngom, Papa Ousseynou Mbaye, and Ibrahima Gaye. Assessing Similarity Between Two Ontologies: The Use of the Integrity Coefficient. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 1:1-1:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{ngom_et_al:OASIcs.SLATE.2022.1,
  author =	{Ngom, Aly Ngon\'{e} and Mbaye, Papa Ousseynou and Gaye, Ibrahima},
  title =	{{Assessing Similarity Between Two Ontologies: The Use of the Integrity Coefficient}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{1:1--1:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.1},
  URN =		{urn:nbn:de:0030-drops-167474},
  doi =		{10.4230/OASIcs.SLATE.2022.1},
  annote =	{Keywords: Semantic similarity measures, Ontologies similarities, Tversky's measures, Concepts similarities}
}

Document

DOI: 10.4230/OASIcs.SLATE.2022.2

Automatic Classification of Portuguese Proverbs

Authors: Jorge Baptista and Sónia Reis

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

In this paper, natural language processing (NLP) and machine learning methods and tools are applied to the task of topic (thematic or semantic) classification of Portuguese proverbs. This is a difficult task since proverbs are usually very short sentences. Such classification should allow an easier selection of the most relevant proverbs for a given situation, considering their context in discourse or within a text. For that, we used, on the one hand, a collection of +32,000 proverbial expressions organized "thematically" into a large set of previously attributed topics (+2,200) and, on the other hand, the Orange data mining toolkit, along with the NLP and machine learning tools it provides. Since the classification provided in the collection of proverbs is, for the most part, based only on a keyword in the body of the proverbs, 2 experiments were set up, to determine the feasibility of the task with a modicum of effort and the most promising configurations applicable. Different sample sizes, 100 and 50 proverbs randomly selected per topic, corresponding to Scenario 1 and 2, respectively, were contrasted; several preprocessing strategies were explored, and different data representation methods tested against several learning algorithms. Results show that Neural Networks is the best performing model, achieving the best classification accuracy of 70% and 61%, in the two different experimental scenarios, Scenario 1 and 2, respectively. Some of the inaccurate classification cases seem to indicate that the machine learning approach can sometimes do a better job than a human classifier, especially considering the manual attribution of the topics by the collection’s author, the sheer number of topics involved, and the very unbalanced distribution of proverbs per topic. Based on the results achieved, the paper presents some proposals for future work to cope with such difficulties.

Cite as

Jorge Baptista and Sónia Reis. Automatic Classification of Portuguese Proverbs. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 2:1-2:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{baptista_et_al:OASIcs.SLATE.2022.2,
  author =	{Baptista, Jorge and Reis, S\'{o}nia},
  title =	{{Automatic Classification of Portuguese Proverbs}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{2:1--2:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.2},
  URN =		{urn:nbn:de:0030-drops-167480},
  doi =		{10.4230/OASIcs.SLATE.2022.2},
  annote =	{Keywords: Portuguese Proverbs, Automatic Topic Classification, Machine Learning}
}

Document

DOI: 10.4230/OASIcs.SLATE.2022.3

Question Answering For Toxicological Information Extraction

Authors: Bruno Carlos Luís Ferreira, Hugo Gonçalo Oliveira, Hugo Amaro, Ângela Laranjeiro, and Catarina Silva

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

Working with large amounts of text data has become hectic and time-consuming. In order to reduce human effort, costs, and make the process more efficient, companies and organizations resort to intelligent algorithms to automate and assist the manual work. This problem is also present in the field of toxicological analysis of chemical substances, where information needs to be searched from multiple documents. That said, we propose an approach that relies on Question Answering for acquiring information from unstructured data, in our case, English PDF documents containing information about physicochemical and toxicological properties of chemical substances. Experimental results confirm that our approach achieves promising results which can be applicable in the business scenario, especially if further revised by humans.

Cite as

Bruno Carlos Luís Ferreira, Hugo Gonçalo Oliveira, Hugo Amaro, Ângela Laranjeiro, and Catarina Silva. Question Answering For Toxicological Information Extraction. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 3:1-3:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{ferreira_et_al:OASIcs.SLATE.2022.3,
  author =	{Ferreira, Bruno Carlos Lu{\'\i}s and Gon\c{c}alo Oliveira, Hugo and Amaro, Hugo and Laranjeiro, \^{A}ngela and Silva, Catarina},
  title =	{{Question Answering For Toxicological Information Extraction}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{3:1--3:10},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.3},
  URN =		{urn:nbn:de:0030-drops-167493},
  doi =		{10.4230/OASIcs.SLATE.2022.3},
  annote =	{Keywords: Information Extraction, Question Answering, Transformers, Toxicological Analysis}
}

Document

DOI: 10.4230/OASIcs.SLATE.2022.4

Generation of Document Type Exercises for Automated Assessment

Authors: José Paulo Leal, Ricardo Queirós, and Marco Primo

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

This paper describes ongoing research to develop a system to automatically generate exercises on document type validation. It aims to support multiple text-based document formalisms, currently including JSON and XML. Validation of JSON documents uses JSON Schema and validation of XML uses both XML Schema and DTD. The exercise generator receives as input a document type and produces two sets of documents: valid and invalid instances. Document types written by students must validate the former and invalidate the latter. Exercises produced by this generator can be automatically accessed in a state-of-the-art assessment system. This paper details the proposed approach and describes the design of the system currently being implemented.

Cite as

José Paulo Leal, Ricardo Queirós, and Marco Primo. Generation of Document Type Exercises for Automated Assessment. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 4:1-4:6, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{leal_et_al:OASIcs.SLATE.2022.4,
  author =	{Leal, Jos\'{e} Paulo and Queir\'{o}s, Ricardo and Primo, Marco},
  title =	{{Generation of Document Type Exercises for Automated Assessment}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{4:1--4:6},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.4},
  URN =		{urn:nbn:de:0030-drops-167506},
  doi =		{10.4230/OASIcs.SLATE.2022.4},
  annote =	{Keywords: exercise generation, automated assessment, document type assessment}
}

Document

DOI: 10.4230/OASIcs.SLATE.2022.9

Classification of Public Administration Complaints

Authors: Francisco Caldeira, Luís Nunes, and Ricardo Ribeiro

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

Complaint management is a problem faced by many organizations that is both vital to customer image and highly dependent on human resources. This work attempts to tackle a part of the problem, by classifying summaries of complaints using machine learning models in order to better redirect these to the appropriate responders. The main challenge of this task is that training datasets are often small and highly imbalanced. This can have a big impact on the performance of classification models. The dataset analyzed in this work suffers from both of these problems, being relatively small and having labels in different proportions. In this work, two different techniques are analyzed: combining classes together to increase the number of elements of the new class; and providing new artificial examples for some classes via translation into other languages. The classification models explored were the following: k-NN, SVM, Naïve Bayes, boosting, and Deep Learning approaches, including transformers. The paper concludes that although, as expected, the classes with little representation are hard to classify, the techniques explored helped to boost the performance, especially in the classes with a low number of elements. SVM and BERT-based models outperformed their peers.

Cite as

Francisco Caldeira, Luís Nunes, and Ricardo Ribeiro. Classification of Public Administration Complaints. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 9:1-9:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{caldeira_et_al:OASIcs.SLATE.2022.9,
  author =	{Caldeira, Francisco and Nunes, Lu{\'\i}s and Ribeiro, Ricardo},
  title =	{{Classification of Public Administration Complaints}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{9:1--9:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.9},
  URN =		{urn:nbn:de:0030-drops-167555},
  doi =		{10.4230/OASIcs.SLATE.2022.9},
  annote =	{Keywords: Text Classification, Natural Language Processing, Deep Learning, BERT}
}

Document

DOI: 10.4230/OASIcs.SLATE.2022.17

Reasoning with Portuguese Word Embeddings

Authors: Luís Filipe Cunha, J. João Almeida, and Alberto Simões

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

Representing words with semantic distributions to create ML models is a widely used technique to perform Natural Language Processing tasks. In this paper, we trained word embedding models with different types of Portuguese corpora, analyzing the influence of the models' parameterization, the corpora size, and domain. Then we validated each model with the classical evaluation methods available: four-word analogies and measurement of the similarity of pairs of words. In addition to these methods, we proposed new alternative techniques to validate word embedding models, presenting new resources for this purpose. Finally, we discussed the obtained results and argued about some limitations of the word embedding models' evaluation methods.

Cite as

Luís Filipe Cunha, J. João Almeida, and Alberto Simões. Reasoning with Portuguese Word Embeddings. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 17:1-17:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{cunha_et_al:OASIcs.SLATE.2022.17,
  author =	{Cunha, Lu{\'\i}s Filipe and Almeida, J. Jo\~{a}o and Sim\~{o}es, Alberto},
  title =	{{Reasoning with Portuguese Word Embeddings}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{17:1--17:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.17},
  URN =		{urn:nbn:de:0030-drops-167636},
  doi =		{10.4230/OASIcs.SLATE.2022.17},
  annote =	{Keywords: Word Embeddings, Word2Vec, Evaluation Methods}
}

Document

DOI: 10.4230/OASIcs.SLATE.2022.18

ScraPE - An Automated Tool for Programming Exercises Scraping

Authors: Ricardo Queirós

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

Learning programming boils down to the practice of solving exercises. However, although there are good and diversified exercises, these are held in proprietary systems hindering their interoperability. This article presents a simple scraping tool, called ScraPE, which through a navigation, interaction and data extraction script, materialized in a domain-specific language, allows extracting the data necessary from Web pages - typically online judges - to compose programming exercises in a standard language. The tool is validated by extracting exercises from a specific online judge. This tool is part of a larger project where the main objective is to provide programming exercises through a simple GraphQL API.

Cite as

Ricardo Queirós. ScraPE - An Automated Tool for Programming Exercises Scraping. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 18:1-18:7, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{queiros:OASIcs.SLATE.2022.18,
  author =	{Queir\'{o}s, Ricardo},
  title =	{{ScraPE - An Automated Tool for Programming Exercises Scraping}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{18:1--18:7},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.18},
  URN =		{urn:nbn:de:0030-drops-167646},
  doi =		{10.4230/OASIcs.SLATE.2022.18},
  annote =	{Keywords: Web scrapping, crawling, programming exercises, online judges, DOM}
}

Document

DOI: 10.4230/OASIcs.SLATE.2021.11

Towards Automatic Creation of Annotations to Foster Development of Named Entity Recognizers

Authors: Emanuel Matos, Mário Rodrigues, Pedro Miguel, and António Teixeira

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)

Abstract

Named Entity Recognition (NER) is an essential step for many natural language processing tasks, including Information Extraction. Despite recent advances, particularly using deep learning techniques, the creation of accurate named entity recognizers remains a complex task, highly dependent on annotated data availability. To foster the existence of NER systems for new domains, it is crucial to obtain the required large volumes of annotated data with low or no manual labor. This paper proposes a system to create the annotated data automatically, by resorting to a set of existing NERs and information sources (DBpedia). The approach was tested with documents of the Tourism domain. Distinct methods were applied for deciding the final named entities and respective tags. The results show that this approach can increase the confidence in annotations and/or augment the number of categories possible to annotate. This paper also presents examples of new NERs that can be rapidly created with the obtained annotated data. The annotated data, combined with the possibility to apply both the ensemble of NER systems and the new Gazetteer-based NERs to large corpora, create the necessary conditions to explore the recent neural deep learning state-of-the-art approaches to NER (e.g., BERT) in domains with scarce or nonexistent data for training.

Cite as

Emanuel Matos, Mário Rodrigues, Pedro Miguel, and António Teixeira. Towards Automatic Creation of Annotations to Foster Development of Named Entity Recognizers. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 11:1-11:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{matos_et_al:OASIcs.SLATE.2021.11,
  author =	{Matos, Emanuel and Rodrigues, M\'{a}rio and Miguel, Pedro and Teixeira, Ant\'{o}nio},
  title =	{{Towards Automatic Creation of Annotations to Foster Development of Named Entity Recognizers}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{11:1--11:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.11},
  URN =		{urn:nbn:de:0030-drops-144286},
  doi =		{10.4230/OASIcs.SLATE.2021.11},
  annote =	{Keywords: Named Entity Recognition (NER), Automatic Annotation, Gazetteers, Tourism, Portuguese}
}

Document

DOI: 10.4230/OASIcs.SLATE.2021.15

NetLangEd, A Web Editor to Support Online Comment Annotation

Authors: Rui Rodrigues, Cristiana Araújo, and Pedro Rangel Henriques

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)

Abstract

This paper focuses on the scientific areas of Digital Humanities, Social Networks and Inappropriate Social Discourse. The main objective of this research project is the development of an editor that allows researchers in the human and social sciences or psychologists to add their reflections or ideas arising from reading and analyzing posts and comments of an online corpus. In the present context, the editor is being integrated with the analysis tools available in the NetLang platform. NetLangEd, in addition to allowing the three basic operations of adding, editing and removing annotations, will also offer mechanisms to manage, organize, view and locate annotations, all of which will be performed in an easy, fast and user-friendly way.

Cite as

Rui Rodrigues, Cristiana Araújo, and Pedro Rangel Henriques. NetLangEd, A Web Editor to Support Online Comment Annotation. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 15:1-15:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{rodrigues_et_al:OASIcs.SLATE.2021.15,
  author =	{Rodrigues, Rui and Ara\'{u}jo, Cristiana and Henriques, Pedro Rangel},
  title =	{{NetLangEd, A Web Editor to Support Online Comment Annotation}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{15:1--15:16},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.15},
  URN =		{urn:nbn:de:0030-drops-144325},
  doi =		{10.4230/OASIcs.SLATE.2021.15},
  annote =	{Keywords: Online Annotation tool, Document Markup System, Text Editor, Discourse Analysis}
}

Document

DOI: 10.4230/OASIcs.SLATE.2020.7

Towards the Identification of Fake News in Portuguese

Authors: João Rodrigues, Ricardo Ribeiro, and Fernando Batista

Published in: OASIcs, Volume 83, 9th Symposium on Languages, Applications and Technologies (SLATE 2020)

Abstract

All over the world, many initiatives have been taken to fight fake news. Governments (e.g., France, Germany, United Kingdom and Spain), in their own way, started to take action regarding legal accountability for those who manufacture or propagate fake news. Different media outlets have also taken a multitude of initiatives to deal with this phenomenon, such as the increase of discipline, accuracy and transparency of publications made internally. Some structural changes have lately been made in said companies and entities in order to better evaluate news in general. As such, many teams were built entirely to fight fake news - the so-called "fact-checkers". These have been adopting different techniques in order to do so: from the typical use of journalists to find out the truth behind a controversial statement, to data-scientists that apply forefront techniques such as text mining and machine learning to support the journalist’s decisions. Many of these entities, which aim to maintain or improve their reputation, started to focus on high standards for quality and reliable information, which led to the creation of official and dedicated departments for fact-checking. In this review paper, not only will we highlight relevant contributions and efforts across the fake news identification and classification status quo, but we will also contextualize the Portuguese language state of affairs in the current state-of-the-art.

Cite as

João Rodrigues, Ricardo Ribeiro, and Fernando Batista. Towards the Identification of Fake News in Portuguese. In 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Open Access Series in Informatics (OASIcs), Volume 83, pp. 7:1-7:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{rodrigues_et_al:OASIcs.SLATE.2020.7,
  author =	{Rodrigues, Jo\~{a}o and Ribeiro, Ricardo and Batista, Fernando},
  title =	{{Towards the Identification of Fake News in Portuguese}},
  booktitle =	{9th Symposium on Languages, Applications and Technologies (SLATE 2020)},
  pages =	{7:1--7:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-165-8},
  ISSN =	{2190-6807},
  year =	{2020},
  volume =	{83},
  editor =	{Sim\~{o}es, Alberto and Henriques, Pedro Rangel and Queir\'{o}s, Ricardo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2020.7},
  URN =		{urn:nbn:de:0030-drops-130207},
  doi =		{10.4230/OASIcs.SLATE.2020.7},
  annote =	{Keywords: Fake News, Portuguese Language, Fact-checking}
}

Document

Short Paper

DOI: 10.4230/OASIcs.SLATE.2020.16

Assessing Factoid Question-Answer Generation for Portuguese (Short Paper)

Authors: João Ferreira, Ricardo Rodrigues, and Hugo Gonçalo Oliveira

Published in: OASIcs, Volume 83, 9th Symposium on Languages, Applications and Technologies (SLATE 2020)

Abstract

We present work on the automatic generation of question-answer pairs in Portuguese, useful, for instance, for populating the knowledge-base of question-answering systems. This includes: (i) a new corpus of close to 600 factoid sentences, manually created from an existing corpus of questions and answers, used as our benchmark; (ii) two approaches for the automatic generation of question-answer pairs, which can be seen as baselines; (iii) results of those approaches in the corpus.

Cite as

João Ferreira, Ricardo Rodrigues, and Hugo Gonçalo Oliveira. Assessing Factoid Question-Answer Generation for Portuguese (Short Paper). In 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Open Access Series in Informatics (OASIcs), Volume 83, pp. 16:1-16:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{ferreira_et_al:OASIcs.SLATE.2020.16,
  author =	{Ferreira, Jo\~{a}o and Rodrigues, Ricardo and Gon\c{c}alo Oliveira, Hugo},
  title =	{{Assessing Factoid Question-Answer Generation for Portuguese}},
  booktitle =	{9th Symposium on Languages, Applications and Technologies (SLATE 2020)},
  pages =	{16:1--16:9},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-165-8},
  ISSN =	{2190-6807},
  year =	{2020},
  volume =	{83},
  editor =	{Sim\~{o}es, Alberto and Henriques, Pedro Rangel and Queir\'{o}s, Ricardo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2020.16},
  URN =		{urn:nbn:de:0030-drops-130298},
  doi =		{10.4230/OASIcs.SLATE.2020.16},
  annote =	{Keywords: Question-Answer Generation, Corpus, NLP, Portuguese}
}

Document

Complete Volume

DOI: 10.4230/OASIcs.SLATE.2019

OASIcs, Volume 74, SLATE'19, Complete Volume

Authors: Ricardo Rodrigues, Jan Janoušek, Luís Ferreira, Luísa Coheur, Fernando Batista, and Hugo Gonçalo Oliveira

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

OASIcs, Volume 74, SLATE'19, Complete Volume

Cite as

8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@Proceedings{rodrigues_et_al:OASIcs.SLATE.2019,
  title =	{{OASIcs, Volume 74, SLATE'19, Complete Volume}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019},
  URN =		{urn:nbn:de:0030-drops-109008},
  doi =		{10.4230/OASIcs.SLATE.2019},
  annote =	{Keywords: Computing methodologies, Natural language processing, Software and its engineering, Compilers; Information systems, World Wide Web}
}

Document

Front Matter

DOI: 10.4230/OASIcs.SLATE.2019.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Ricardo Rodrigues, Jan Janoušek, Luís Ferreira, Luísa Coheur, Fernando Batista, and Hugo Gonçalo Oliveira

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 0:i-0:xviii, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{rodrigues_et_al:OASIcs.SLATE.2019.0,
  author =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{0:i--0:xviii},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.0},
  URN =		{urn:nbn:de:0030-drops-108679},
  doi =		{10.4230/OASIcs.SLATE.2019.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

DOI: 10.4230/OASIcs.SLATE.2019.1

Graph-of-Entity: A Model for Combined Data Representation and Retrieval

Authors: José Devezas, Carla Lopes, and Sérgio Nunes

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

Managing large volumes of digital documents along with the information they contain, or are associated with, can be challenging. As systems become more intelligent, it increasingly makes sense to power retrieval through all available data, where every lead makes it easier to reach relevant documents or entities. Modern search is heavily powered by structured knowledge, but users still query using keywords or, at the very best, telegraphic natural language. As search becomes increasingly dependent on the integration of text and knowledge, novel approaches for a unified representation of combined data present the opportunity to unlock new ranking strategies. We tackle entity-oriented search using graph-based approaches for representation and retrieval. In particular, we propose the graph-of-entity, a novel approach for indexing combined data, where terms, entities and their relations are jointly represented. We compare the graph-of-entity with the graph-of-word, a text-only model, verifying that, overall, it does not yet achieve a better performance, despite obtaining a higher precision. Our assessment was based on a small subset of the INEX 2009 Wikipedia Collection, created from a sample of 10 topics and respectively judged documents. The offline evaluation we do here is complementary to its counterpart from TREC 2017 OpenSearch track, where, during our participation, we had assessed graph-of-entity in an online setting, through team-draft interleaving.

Cite as

José Devezas, Carla Lopes, and Sérgio Nunes. Graph-of-Entity: A Model for Combined Data Representation and Retrieval. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 1:1-1:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{devezas_et_al:OASIcs.SLATE.2019.1,
  author =	{Devezas, Jos\'{e} and Lopes, Carla and Nunes, S\'{e}rgio},
  title =	{{Graph-of-Entity: A Model for Combined Data Representation and Retrieval}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{1:1--1:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.1},
  URN =		{urn:nbn:de:0030-drops-108686},
  doi =		{10.4230/OASIcs.SLATE.2019.1},
  annote =	{Keywords: Entity-oriented search, graph-based models, collection-based graph}
}

Document

DOI: 10.4230/OASIcs.SLATE.2019.2

Using Lucene for Developing a Question-Answering Agent in Portuguese

Authors: Hugo Gonçalo Oliveira, Ricardo Filipe, Ricardo Rodrigues, and Ana Alves

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

Given the limitations of available platforms for creating conversational agents, and that a question-answering agent suffices in many scenarios, we take advantage of the Information Retrieval library Lucene for developing such an agent for Portuguese. The solution described answers natural language questions based on an indexed list of FAQs. Its adaptation to different domains is a matter of changing the underlying list. Different configurations of this solution, mostly on the language analysis level, resulted in different search strategies, which were tested for answering questions about the economic activity in Portugal. In addition to comparing the different search strategies, we concluded that, towards better answers, it is fruitful to combine the results of different strategies with a voting method.

Cite as

Hugo Gonçalo Oliveira, Ricardo Filipe, Ricardo Rodrigues, and Ana Alves. Using Lucene for Developing a Question-Answering Agent in Portuguese. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 2:1-2:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{goncalooliveira_et_al:OASIcs.SLATE.2019.2,
  author =	{Gon\c{c}alo Oliveira, Hugo and Filipe, Ricardo and Rodrigues, Ricardo and Alves, Ana},
  title =	{{Using Lucene for Developing a Question-Answering Agent in Portuguese}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{2:1--2:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.2},
  URN =		{urn:nbn:de:0030-drops-108692},
  doi =		{10.4230/OASIcs.SLATE.2019.2},
  annote =	{Keywords: information retrieval, question answering, natural language interface, natural language processing, natural language understanding}
}
