DROPS

Volume

OASIcs, Volume 74

8th Symposium on Languages, Applications and Technologies (SLATE 2019)

SLATE 2019, June 27-28, 2019, Coimbra, Portugal

Editors: Ricardo Rodrigues, Jan Janoušek, Luís Ferreira, Luísa Coheur, Fernando Batista, and Hugo Gonçalo Oliveira

Document

DOI: 10.4230/OASIcs.SLATE.2023.5

Web of Science Citation Gaps: An Automatic Approach to Detect Indexed but Missing Citations

Authors: David Rodrigues, António L. Lopes, and Fernando Batista

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

The number of citations a research paper receives is a crucial metric for both researchers and institutions. However, since each citation database has its own source list, finding all the citations of a given paper can be challenging. As a result, some citations may be missing and not counted towards a paper's total citation count. To address this issue, we present an automated approach to find missing citations that leverages multiple indexing databases. In this research, Web of Science (WoS) serves as a case study and OpenAlex is used as a reference point for comparison. For a given paper, we identify all citing papers found in the two research databases. Then, for each citing paper, we check whether it is indexed in WoS but not listed in WoS as a citing paper, in which case it is considered a missing citation. In our experiments, from a set of 1539 papers indexed by WoS, we found 696 missing citations. This outcome demonstrates the effectiveness of our approach and reveals that WoS does not always consider the full list of citing papers of a given publication, even when these citing papers are indexed by WoS. We also found that WoS is more likely to miss information for more recent publications. These findings provide relevant insights about this research indexing database and motivate considering other research databases in our study, such as Scopus and Google Scholar, in order to improve the matching and querying algorithms and to reduce false positives, towards providing a more comprehensive and accurate view of the citations of a paper.
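
The matching step described above reduces to set operations over citing-paper identifiers. The following is a minimal Python sketch. It assumes that each database has already been queried and that citing papers are represented by DOIs. The function names and the DOI normalisation rule are illustrative assumptions, not the authors' implementation. The sketch also ignores records without a DOI and any fuzzy title matching.

```python
from typing import Iterable, Set


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common resolver prefixes (DOIs are case-insensitive)."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def find_missing_citations(
    wos_citing: Iterable[str],
    openalex_citing: Iterable[str],
    wos_indexed: Iterable[str],
) -> Set[str]:
    """Citing papers indexed in WoS that WoS does not list as citing the target paper.

    wos_citing: DOIs that WoS reports as citing the target paper
    openalex_citing: DOIs that OpenAlex reports as citing the target paper
    wos_indexed: DOIs known to be indexed in WoS (hypothetical lookup result)
    """
    wos = {normalize_doi(d) for d in wos_citing}
    openalex = {normalize_doi(d) for d in openalex_citing}
    indexed = {normalize_doi(d) for d in wos_indexed}
    # All citing papers found in either database.
    all_citing = wos | openalex
    # Missing: indexed by WoS, yet absent from WoS's own citing list.
    return {d for d in all_citing if d in indexed and d not in wos}


if __name__ == "__main__":
    wos = ["10.1000/a", "10.1000/b"]
    openalex = ["https://doi.org/10.1000/B", "https://doi.org/10.1000/c", "https://doi.org/10.1000/d"]
    indexed = ["10.1000/a", "10.1000/b", "10.1000/c"]
    print(sorted(find_missing_citations(wos, openalex, indexed)))  # ['10.1000/c']
```

In this toy run, `10.1000/d` is cited according to OpenAlex but is not indexed in WoS, so it is not counted as a missing citation. In practice, `wos_indexed` would come from querying WoS for each candidate paper, and that query step is not shown.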

Cite as

David Rodrigues, António L. Lopes, and Fernando Batista. Web of Science Citation Gaps: An Automatic Approach to Detect Indexed but Missing Citations. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 5:1-5:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{rodrigues_et_al:OASIcs.SLATE.2023.5,
  author =	{Rodrigues, David and Lopes, Ant\'{o}nio L. and Batista, Fernando},
  title =	{{Web of Science Citation Gaps: An Automatic Approach to Detect Indexed but Missing Citations}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{5:1--5:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.5},
  URN =		{urn:nbn:de:0030-drops-185199},
  doi =		{10.4230/OASIcs.SLATE.2023.5},
  annote =	{Keywords: Research Databases, Citations, Citation Databases, Web of Science, OpenAlex}
}

Document

DOI: 10.4230/OASIcs.SLATE.2022.10

Comparing Different Approaches for Detecting Hate Speech in Online Portuguese Comments

Authors: Bernardo Cunha Matos, Raquel Bento Santos, Paula Carvalho, Ricardo Ribeiro, and Fernando Batista

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

Online Hate Speech (OHS) has been growing dramatically on social media, which has motivated researchers to develop a diversity of methods for its automated detection. However, the detection of OHS in Portuguese is still understudied. To fill this gap, we explored different models that proved to be successful in the literature for this task. In particular, we explored transfer learning approaches based on existing BERT-like pre-trained models. The experiments were based on CO-HATE, a corpus of YouTube comments posted by the Portuguese online community that was manually labeled by different annotators. Among other categories, those comments were labeled regarding the presence of hate speech and the type of hate speech, specifically overt and covert hate speech. We assessed the impact of using annotations from different annotators on the performance of such models. In addition, we analyzed the impact of distinguishing overt and covert hate speech. The results show the importance of considering the annotator's profile in the development of hate speech detection models. Regarding the hate speech type, the results do not allow any conclusion on which type is easier to detect. Finally, we show that pre-processing does not seem to have a significant impact on the performance of this specific task.

Cite as

Bernardo Cunha Matos, Raquel Bento Santos, Paula Carvalho, Ricardo Ribeiro, and Fernando Batista. Comparing Different Approaches for Detecting Hate Speech in Online Portuguese Comments. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 10:1-10:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{matos_et_al:OASIcs.SLATE.2022.10,
  author =	{Matos, Bernardo Cunha and Santos, Raquel Bento and Carvalho, Paula and Ribeiro, Ricardo and Batista, Fernando},
  title =	{{Comparing Different Approaches for Detecting Hate Speech in Online Portuguese Comments}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{10:1--10:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.10},
  URN =		{urn:nbn:de:0030-drops-167560},
  doi =		{10.4230/OASIcs.SLATE.2022.10},
  annote =	{Keywords: Hate Speech, Text Classification, Transfer Learning, Supervised Learning, Deep Learning}
}

Document

DOI: 10.4230/OASIcs.SLATE.2022.11

Semi-Supervised Annotation of Portuguese Hate Speech Across Social Media Domains

Authors: Raquel Bento Santos, Bernardo Cunha Matos, Paula Carvalho, Fernando Batista, and Ricardo Ribeiro

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

With the increasing spread of hate speech (HS) on social media, it has become urgent to develop models that can help detect it automatically. Typically, such models require large-scale annotated corpora, which are still scarce in languages such as Portuguese. However, creating manually annotated corpora is a very expensive and time-consuming task. To address this problem, we propose an ensemble of two semi-supervised models that can be used to automatically create a corpus representative of online hate speech in Portuguese. The first model combines Generative Adversarial Networks and a BERT-based model. The second model is based on label propagation, and consists of propagating labels from existing annotated corpora to the unlabeled data by exploring the notion of similarity. We have explored the annotations of three existing corpora (CO-HATE, ToLR-BR, and HPHS) in order to automatically annotate FIGHT, a corpus composed of geolocated tweets produced in the Portuguese territory. In the process of selecting the best model and the corresponding setup, we tested different pre-trained embeddings, performed experiments using different training subsets labeled by different annotators with different perspectives, and performed several experiments with active learning. Furthermore, this work explores back translation as a means to automatically generate additional hate speech samples. The best results were achieved by combining all the labeled datasets, obtaining an F1-score of 0.664 for the Hate Speech class in FIGHT.
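
The label-propagation idea can be illustrated with a classic graph-based scheme. Annotated items keep their labels, and each unlabeled item repeatedly takes the similarity-weighted average of its neighbours' label distributions. The sketch below is a simplified stand-in. It assumes that sentence embeddings are already available (toy 2-D vectors here) and uses cosine similarity as the edge weight. The paper's actual similarity measure, graph construction and GAN plus BERT ensemble are not reproduced.

```python
import math
from typing import List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def propagate_labels(
    vectors: Sequence[Sequence[float]],
    labels: Sequence[Optional[int]],
    n_classes: int,
    iterations: int = 50,
) -> List[int]:
    """Return a class index for every item; labels[i] is None for unlabeled items."""
    n = len(vectors)
    # Edge weights: non-negative cosine similarity, no self-loops.
    weights = [
        [max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Initial label distributions: one-hot if annotated, uniform otherwise.
    dist: List[List[float]] = []
    for lab in labels:
        if lab is None:
            dist.append([1.0 / n_classes] * n_classes)
        else:
            dist.append([1.0 if c == lab else 0.0 for c in range(n_classes)])
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total = sum(weights[i])
            if labels[i] is not None or total == 0.0:
                # Annotated items stay clamped; isolated items keep their current distribution.
                updated.append(dist[i])
                continue
            updated.append([
                sum(weights[i][j] * dist[j][c] for j in range(n)) / total
                for c in range(n_classes)
            ])
        dist = updated
    return [max(range(n_classes), key=lambda c: d[c]) for d in dist]


if __name__ == "__main__":
    vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
    # Toy labels: 0 = not hate speech, 1 = hate speech; the last two items are unlabeled.
    print(propagate_labels(vecs, [0, 1, None, None], n_classes=2))  # [0, 1, 0, 1]
```

The all-pairs weight matrix is quadratic in the number of items. A real corpus would typically use a sparse k-nearest-neighbour graph instead.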

Cite as

Raquel Bento Santos, Bernardo Cunha Matos, Paula Carvalho, Fernando Batista, and Ricardo Ribeiro. Semi-Supervised Annotation of Portuguese Hate Speech Across Social Media Domains. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 11:1-11:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{santos_et_al:OASIcs.SLATE.2022.11,
  author =	{Santos, Raquel Bento and Matos, Bernardo Cunha and Carvalho, Paula and Batista, Fernando and Ribeiro, Ricardo},
  title =	{{Semi-Supervised Annotation of Portuguese Hate Speech Across Social Media Domains}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{11:1--11:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.11},
  URN =		{urn:nbn:de:0030-drops-167570},
  doi =		{10.4230/OASIcs.SLATE.2022.11},
  annote =	{Keywords: Hate Speech, Semi-Supervised Learning, Semi-Automatic Annotation}
}

Document

DOI: 10.4230/OASIcs.SLATE.2021.12

Semantic Search of Mobile Applications Using Word Embeddings

Authors: João Coelho, António Neto, Miguel Tavares, Carlos Coutinho, Ricardo Ribeiro, and Fernando Batista

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)

Abstract

This paper proposes a set of approaches for the semantic search of mobile applications, based on their name and on the unstructured textual information contained in their description. The proposed approaches make use of word-level, character-level, and contextual word-embeddings that have been trained or fine-tuned using a dataset of about 500 thousand mobile apps, collected in the scope of this work. The proposed approaches have been evaluated using a public dataset that includes information about 43 thousand applications, and 56 manually annotated non-exact queries. Our results show that both character-level embeddings trained on our data, and fine-tuned RoBERTa models surpass the performance of the other existing retrieval strategies reported in the literature.
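
At query time, embedding-based semantic search amounts to embedding the query and each app description in the same vector space, then ranking apps by cosine similarity. The sketch below assumes averaged word vectors as the text representation. The tiny vector table and the app names are hypothetical. The paper's character-level and fine-tuned RoBERTa models, and its Elasticsearch setup, are not reproduced.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embed(text: str, word_vectors: Dict[str, Vector]) -> Vector:
    """Average the vectors of in-vocabulary words; a zero vector if none are known."""
    known = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not known:
        dim = len(next(iter(word_vectors.values())))
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def search(
    query: str, apps: Dict[str, str], word_vectors: Dict[str, Vector], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank apps by cosine similarity between the query and description embeddings."""
    q = embed(query, word_vectors)
    scored = [(name, cosine(q, embed(desc, word_vectors))) for name, desc in apps.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    WORD_VECTORS = {  # hypothetical 3-d vectors standing in for trained embeddings
        "photo": [1.0, 0.0, 0.0], "camera": [0.9, 0.1, 0.0], "edit": [0.8, 0.0, 0.2],
        "music": [0.0, 1.0, 0.0], "song": [0.0, 0.9, 0.1], "player": [0.1, 0.8, 0.0],
    }
    apps = {"SnapEdit": "edit photo camera", "Tunes": "music player song"}
    print([name for name, _ in search("camera", apps, WORD_VECTORS)])  # ['SnapEdit', 'Tunes']
```

Because the query "camera" shares no word with the description of Tunes, the ranking in this toy example rests entirely on vector similarity. That is the property that lets semantic search handle non-exact queries.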

Cite as

João Coelho, António Neto, Miguel Tavares, Carlos Coutinho, Ricardo Ribeiro, and Fernando Batista. Semantic Search of Mobile Applications Using Word Embeddings. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 12:1-12:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{coelho_et_al:OASIcs.SLATE.2021.12,
  author =	{Coelho, Jo\~{a}o and Neto, Ant\'{o}nio and Tavares, Miguel and Coutinho, Carlos and Ribeiro, Ricardo and Batista, Fernando},
  title =	{{Semantic Search of Mobile Applications Using Word Embeddings}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{12:1--12:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.12},
  URN =		{urn:nbn:de:0030-drops-144292},
  doi =		{10.4230/OASIcs.SLATE.2021.12},
  annote =	{Keywords: Semantic Search, Word Embeddings, Elasticsearch, Mobile Applications}
}

Document

DOI: 10.4230/OASIcs.SLATE.2021.17

Sentiment Analysis of Portuguese Economic News

Authors: Cátia Tavares, Ricardo Ribeiro, and Fernando Batista

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)

Abstract

This paper proposes a rule-based method for automatic polarity detection over economic news texts, which proved suitable for detecting the sentiment in Portuguese economic news. The data used in our experiments consists of 400 manually annotated sentences extracted from economic news, used for evaluation, and about 90 thousand Portuguese economic news articles, extracted from two well-known Portuguese newspapers and covering the period from 2010 to 2020, which have been used for training our systems. To perform sentiment analysis of economic news, we also tested the adaptation of existing pre-trained modules, and performed experiments with a set of Machine Learning approaches and with self-training. Experimental results show that our rule-based approach, which uses manually written rules related to the economic context, achieves the best results for automatically detecting the polarity of economic news, largely surpassing the other approaches.

Cite as

Cátia Tavares, Ricardo Ribeiro, and Fernando Batista. Sentiment Analysis of Portuguese Economic News. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 17:1-17:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{tavares_et_al:OASIcs.SLATE.2021.17,
  author =	{Tavares, C\'{a}tia and Ribeiro, Ricardo and Batista, Fernando},
  title =	{{Sentiment Analysis of Portuguese Economic News}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{17:1--17:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.17},
  URN =		{urn:nbn:de:0030-drops-144347},
  doi =		{10.4230/OASIcs.SLATE.2021.17},
  annote =	{Keywords: Sentiment Analysis, Economic News, Portuguese Language}
}

Document

DOI: 10.4230/OASIcs.SLATE.2020.3

Detection of Emerging Words in Portuguese Tweets

Authors: Afonso Pinto, Helena Moniz, and Fernando Batista

Published in: OASIcs, Volume 83, 9th Symposium on Languages, Applications and Technologies (SLATE 2020)

Abstract

This paper tackles the problem of detecting emerging words in a language, based on social network content. It proposes an approach for detecting new words on Twitter and reports the results achieved for a collection of 8 million Portuguese tweets. This study uses geolocated tweets, collected between January 2018 and June 2019 and written in the Portuguese territory. The first six months of data were used to define an initial vocabulary of known words, and the following 12 months were used for identifying new words, thus testing our approach. The resulting set of words was manually analyzed, revealing a number of distinct events and suggesting that Twitter may be a valuable resource for researching neology and the dynamics of a language.
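
The core of this setup is a vocabulary difference. Words never seen in the reference period but observed in the later period are candidate new words. A minimal sketch follows. It assumes simple regex tokenisation and a hypothetical minimum-frequency filter that discards one-off typos. The paper's actual filtering steps are not described in the abstract and are not reproduced here.

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple

TOKEN = re.compile(r"\w+")  # Unicode-aware, so accented Portuguese words stay whole


def tokenize(text: str) -> List[str]:
    return TOKEN.findall(text.lower())


def build_vocabulary(tweets: Iterable[str]) -> Set[str]:
    """Known words: every token seen in the reference period."""
    return {tok for tweet in tweets for tok in tokenize(tweet)}


def emerging_words(
    reference: Iterable[str], later: Iterable[str], min_count: int = 2
) -> List[Tuple[str, int]]:
    """Words absent from the reference period, sorted by frequency in the later period."""
    known = build_vocabulary(reference)
    counts = Counter(tok for tweet in later for tok in tokenize(tweet) if tok not in known)
    candidates = [(w, c) for w, c in counts.items() if c >= min_count]
    return sorted(candidates, key=lambda wc: (-wc[1], wc[0]))


if __name__ == "__main__":
    first_six_months = ["o jogo foi bom", "bom dia a todos"]
    following_months = ["vamos ao jogo", "novo youtuber", "aquele youtuber famoso"]
    print(emerging_words(first_six_months, following_months))  # [('youtuber', 2)]
```

The output is only a candidate list. As in the study, such candidates would still need manual analysis to separate genuine neologisms from names, typos and event-specific vocabulary.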

Cite as

Afonso Pinto, Helena Moniz, and Fernando Batista. Detection of Emerging Words in Portuguese Tweets. In 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Open Access Series in Informatics (OASIcs), Volume 83, pp. 3:1-3:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{pinto_et_al:OASIcs.SLATE.2020.3,
  author =	{Pinto, Afonso and Moniz, Helena and Batista, Fernando},
  title =	{{Detection of Emerging Words in Portuguese Tweets}},
  booktitle =	{9th Symposium on Languages, Applications and Technologies (SLATE 2020)},
  pages =	{3:1--3:10},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-165-8},
  ISSN =	{2190-6807},
  year =	{2020},
  volume =	{83},
  editor =	{Sim\~{o}es, Alberto and Henriques, Pedro Rangel and Queir\'{o}s, Ricardo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2020.3},
  URN =		{urn:nbn:de:0030-drops-130164},
  doi =		{10.4230/OASIcs.SLATE.2020.3},
  annote =	{Keywords: Emerging words, Twitter, Portuguese language}
}

Document

DOI: 10.4230/OASIcs.SLATE.2020.7

Towards the Identification of Fake News in Portuguese

Authors: João Rodrigues, Ricardo Ribeiro, and Fernando Batista

Published in: OASIcs, Volume 83, 9th Symposium on Languages, Applications and Technologies (SLATE 2020)

Abstract

All over the world, many initiatives have been taken to fight fake news. Governments (e.g., France, Germany, United Kingdom and Spain), each in their own way, have started to take action regarding legal accountability for those who manufacture or propagate fake news. Different media outlets have also taken a multitude of initiatives to deal with this phenomenon, such as increasing the discipline, accuracy and transparency of the publications they produce internally. Some structural changes have recently been made in these companies and entities in order to better evaluate news in general. As such, many teams were built entirely to fight fake news: the so-called "fact-checkers". These teams have been adopting different techniques to do so, from the typical use of journalists to find out the truth behind a controversial statement, to data scientists who apply cutting-edge techniques such as text mining and machine learning to support the journalists' decisions. Many of these entities, which aim to maintain or improve their reputation, started to focus on high standards for quality and reliable information, which led to the creation of official and dedicated departments for fact-checking. In this review paper, we not only highlight relevant contributions and efforts in the current status quo of fake news identification and classification, but also contextualize the state of affairs for the Portuguese language within the current state of the art.

Cite as

João Rodrigues, Ricardo Ribeiro, and Fernando Batista. Towards the Identification of Fake News in Portuguese. In 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Open Access Series in Informatics (OASIcs), Volume 83, pp. 7:1-7:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{rodrigues_et_al:OASIcs.SLATE.2020.7,
  author =	{Rodrigues, Jo\~{a}o and Ribeiro, Ricardo and Batista, Fernando},
  title =	{{Towards the Identification of Fake News in Portuguese}},
  booktitle =	{9th Symposium on Languages, Applications and Technologies (SLATE 2020)},
  pages =	{7:1--7:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-165-8},
  ISSN =	{2190-6807},
  year =	{2020},
  volume =	{83},
  editor =	{Sim\~{o}es, Alberto and Henriques, Pedro Rangel and Queir\'{o}s, Ricardo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2020.7},
  URN =		{urn:nbn:de:0030-drops-130207},
  doi =		{10.4230/OASIcs.SLATE.2020.7},
  annote =	{Keywords: Fake News, Portuguese Language, Fact-checking}
}

Document

Short Paper

DOI: 10.4230/OASIcs.SLATE.2020.12

Different Lexicon-Based Approaches to Emotion Identification in Portuguese Tweets (Short Paper)

Authors: Soraia Filipe, Fernando Batista, and Ricardo Ribeiro

Published in: OASIcs, Volume 83, 9th Symposium on Languages, Applications and Technologies (SLATE 2020)

Abstract

This paper presents the existing literature on the identification of emotions and describes various lexicon-based approaches and translation strategies to identify emotions in Portuguese tweets. A dataset of tweets was manually annotated to evaluate our classifier and also to assess the difficulty of the task. A lexicon-based approach was used to classify the presence or absence of eight different emotions in a tweet. Different strategies were applied to refine and improve an existing and widely used lexicon, by means of automatic machine translation and aligned word embeddings. We tested six different classification approaches, exploring different ways of directly applying resources available for English by means of different translation strategies. The results suggest that better performance can be obtained both by improving a lexicon and by directly translating tweets into English and then applying an existing English lexicon.
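
A lexicon-based classifier of this kind marks an emotion as present when at least one token of the tweet is associated with that emotion in the lexicon. The sketch below assumes a word-to-emotions lexicon covering Plutchik's eight basic emotions, the set used by, for example, the NRC Emotion Lexicon. The abstract does not name the lexicon. The sketch also takes an optional `translate` function to mimic the strategy of translating first and then applying an English lexicon. Both the toy lexicon and the stand-in translator are hypothetical.

```python
import re
from typing import Callable, Dict, Set

EMOTIONS = ("anger", "anticipation", "disgust", "fear", "joy", "sadness", "surprise", "trust")


def detect_emotions(
    tweet: str,
    lexicon: Dict[str, Set[str]],
    translate: Callable[[str], str] = lambda text: text,
) -> Dict[str, bool]:
    """Presence/absence of each emotion, based on lexicon lookups of the tweet's tokens."""
    text = translate(tweet)  # identity by default; plug in an MT system to translate first
    present = {emotion: False for emotion in EMOTIONS}
    for token in re.findall(r"\w+", text.lower()):
        for emotion in lexicon.get(token, ()):
            if emotion in present:
                present[emotion] = True
    return present


if __name__ == "__main__":
    toy_lexicon = {"happy": {"joy", "trust"}, "afraid": {"fear"}}
    fake_translate = lambda s: s.replace("feliz", "happy")  # hypothetical MT stand-in
    result = detect_emotions("Estou muito feliz", toy_lexicon, fake_translate)
    print(sorted(e for e, on in result.items() if on))  # ['joy', 'trust']
```

Passing the translator as a parameter keeps the two strategies comparable. The same lookup code runs either on a Portuguese lexicon with the identity function or on an English lexicon after translation.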

Cite as

Soraia Filipe, Fernando Batista, and Ricardo Ribeiro. Different Lexicon-Based Approaches to Emotion Identification in Portuguese Tweets (Short Paper). In 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Open Access Series in Informatics (OASIcs), Volume 83, pp. 12:1-12:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{filipe_et_al:OASIcs.SLATE.2020.12,
  author =	{Filipe, Soraia and Batista, Fernando and Ribeiro, Ricardo},
  title =	{{Different Lexicon-Based Approaches to Emotion Identification in Portuguese Tweets}},
  booktitle =	{9th Symposium on Languages, Applications and Technologies (SLATE 2020)},
  pages =	{12:1--12:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-165-8},
  ISSN =	{2190-6807},
  year =	{2020},
  volume =	{83},
  editor =	{Sim\~{o}es, Alberto and Henriques, Pedro Rangel and Queir\'{o}s, Ricardo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2020.12},
  URN =		{urn:nbn:de:0030-drops-130252},
  doi =		{10.4230/OASIcs.SLATE.2020.12},
  annote =	{Keywords: Emotion detection, tweets, Portuguese Language, Emotion lexicon}
}

Document

Complete Volume

DOI: 10.4230/OASIcs.SLATE.2019

OASIcs, Volume 74, SLATE'19, Complete Volume

Authors: Ricardo Rodrigues, Jan Janoušek, Luís Ferreira, Luísa Coheur, Fernando Batista, and Hugo Gonçalo Oliveira

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

OASIcs, Volume 74, SLATE'19, Complete Volume

Cite as

8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@Proceedings{rodrigues_et_al:OASIcs.SLATE.2019,
  title =	{{OASIcs, Volume 74, SLATE'19, Complete Volume}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019},
  URN =		{urn:nbn:de:0030-drops-109008},
  doi =		{10.4230/OASIcs.SLATE.2019},
  annote =	{Keywords: Computing methodologies, Natural language processing, Software and its engineering, Compilers; Information systems, World Wide Web}
}

Document

Front Matter

DOI: 10.4230/OASIcs.SLATE.2019.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Ricardo Rodrigues, Jan Janoušek, Luís Ferreira, Luísa Coheur, Fernando Batista, and Hugo Gonçalo Oliveira

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 0:i-0:xviii, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{rodrigues_et_al:OASIcs.SLATE.2019.0,
  author =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{0:i--0:xviii},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.0},
  URN =		{urn:nbn:de:0030-drops-108679},
  doi =		{10.4230/OASIcs.SLATE.2019.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

DOI: 10.4230/OASIcs.SLATE.2019.1

Graph-of-Entity: A Model for Combined Data Representation and Retrieval

Authors: José Devezas, Carla Lopes, and Sérgio Nunes

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

Managing large volumes of digital documents along with the information they contain, or are associated with, can be challenging. As systems become more intelligent, it increasingly makes sense to power retrieval through all available data, where every lead makes it easier to reach relevant documents or entities. Modern search is heavily powered by structured knowledge, but users still query using keywords or, at best, telegraphic natural language. As search becomes increasingly dependent on the integration of text and knowledge, novel approaches for a unified representation of combined data present the opportunity to unlock new ranking strategies. We tackle entity-oriented search using graph-based approaches for representation and retrieval. In particular, we propose the graph-of-entity, a novel approach for indexing combined data, where terms, entities and their relations are jointly represented. We compare the graph-of-entity with the graph-of-word, a text-only model, and find that, overall, it does not yet achieve better performance, despite obtaining a higher precision. Our assessment was based on a small subset of the INEX 2009 Wikipedia Collection, created from a sample of 10 topics and their respective judged documents. The offline evaluation presented here complements the online evaluation from our participation in the TREC 2017 OpenSearch track, where we assessed graph-of-entity through team-draft interleaving.
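
Team-draft interleaving, used in the online evaluation mentioned above, merges the rankings of two systems the way two captains pick players. The team with fewer picks adds its highest-ranked document not yet shown, and a coin flip decides when both teams have the same number of picks. Each click is credited to the team that contributed the clicked document. The following is a minimal sketch of that standard procedure. The document IDs, the click set and the seeded random generator are illustrative assumptions, and the TREC OpenSearch infrastructure is not modelled.

```python
import random
from typing import List, Optional, Sequence, Set, Tuple


def team_draft_interleave(
    ranking_a: Sequence[str],
    ranking_b: Sequence[str],
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], Set[str], Set[str]]:
    """Merge two rankings with team-draft interleaving; returns (shown, team_a, team_b)."""
    rng = rng or random.Random()
    shown: List[str] = []
    team_a: Set[str] = set()
    team_b: Set[str] = set()

    def next_unshown(ranking: Sequence[str]) -> Optional[str]:
        # Highest-ranked document of this system that is not yet in the merged list.
        for doc in ranking:
            if doc not in team_a and doc not in team_b:
                return doc
        return None

    while True:
        cand_a, cand_b = next_unshown(ranking_a), next_unshown(ranking_b)
        if cand_a is None or cand_b is None:
            break  # stop once either ranking is exhausted
        # The team with fewer picks goes next; a coin flip breaks ties.
        a_turn = len(team_a) < len(team_b) or (
            len(team_a) == len(team_b) and rng.random() < 0.5
        )
        if a_turn:
            team_a.add(cand_a)
            shown.append(cand_a)
        else:
            team_b.add(cand_b)
            shown.append(cand_b)
    return shown, team_a, team_b


def credit_winner(clicked: Set[str], team_a: Set[str], team_b: Set[str]) -> Optional[str]:
    """Credit each click to the team that contributed the document; None means a tie."""
    a, b = len(clicked & team_a), len(clicked & team_b)
    if a > b:
        return "A"
    if b > a:
        return "B"
    return None


if __name__ == "__main__":
    shown, team_a, team_b = team_draft_interleave(
        ["d1", "d2", "d3"], ["d2", "d4", "d1"], random.Random(42)
    )
    print(shown, credit_winner({"d4"}, team_a, team_b))
```

Because the order of picks is randomised per query, a system wins only if its contributed documents attract more clicks across many impressions. A single comparison, as in the usage line, says little by itself.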

Cite as

José Devezas, Carla Lopes, and Sérgio Nunes. Graph-of-Entity: A Model for Combined Data Representation and Retrieval. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 1:1-1:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{devezas_et_al:OASIcs.SLATE.2019.1,
  author =	{Devezas, Jos\'{e} and Lopes, Carla and Nunes, S\'{e}rgio},
  title =	{{Graph-of-Entity: A Model for Combined Data Representation and Retrieval}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{1:1--1:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.1},
  URN =		{urn:nbn:de:0030-drops-108686},
  doi =		{10.4230/OASIcs.SLATE.2019.1},
  annote =	{Keywords: Entity-oriented search, graph-based models, collection-based graph}
}

Document

DOI: 10.4230/OASIcs.SLATE.2019.2

Using Lucene for Developing a Question-Answering Agent in Portuguese

Authors: Hugo Gonçalo Oliveira, Ricardo Filipe, Ricardo Rodrigues, and Ana Alves

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

Given the limitations of available platforms for creating conversational agents, and that a question-answering agent suffices in many scenarios, we take advantage of the Information Retrieval library Lucene for developing such an agent for Portuguese. The solution described answers natural language questions based on an indexed list of FAQs. Its adaptation to different domains is a matter of changing the underlying list. Different configurations of this solution, mostly on the language analysis level, resulted in different search strategies, which were tested for answering questions about the economic activity in Portugal. In addition to comparing the different search strategies, we concluded that, towards better answers, it is fruitful to combine the results of different strategies with a voting method.
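The abstract reports that combining the results of different search strategies with a voting method leads to better answers, but does not detail the scheme. A minimal sketch, assuming simple plurality voting over each strategy's top-ranked FAQ answer (the function name `majority_vote` and the FAQ identifiers are hypothetical, and the paper's actual voting method may differ):

```python
from collections import Counter
from typing import Optional


def majority_vote(top_answers: list[Optional[str]]) -> Optional[str]:
    """Return the answer proposed by the most search strategies.

    `top_answers` holds the best-ranked FAQ answer of each strategy, or None
    when a strategy retrieved nothing. Ties go to the answer encountered
    first, i.e. the one from the strategy listed earliest, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    votes = Counter(a for a in top_answers if a is not None)
    if not votes:
        return None  # no strategy produced an answer
    answer, _count = votes.most_common(1)[0]
    return answer


if __name__ == "__main__":
    print(majority_vote(["faq_12", "faq_7", "faq_12", None]))
```

Plurality voting is the simplest option; a rank-aware scheme such as a Borda count over each strategy's full result list would be a natural alternative when strategies often disagree on the top answer.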

Cite as

Hugo Gonçalo Oliveira, Ricardo Filipe, Ricardo Rodrigues, and Ana Alves. Using Lucene for Developing a Question-Answering Agent in Portuguese. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 2:1-2:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{goncalooliveira_et_al:OASIcs.SLATE.2019.2,
  author =	{Gon\c{c}alo Oliveira, Hugo and Filipe, Ricardo and Rodrigues, Ricardo and Alves, Ana},
  title =	{{Using Lucene for Developing a Question-Answering Agent in Portuguese}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{2:1--2:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.2},
  URN =		{urn:nbn:de:0030-drops-108692},
  doi =		{10.4230/OASIcs.SLATE.2019.2},
  annote =	{Keywords: information retrieval, question answering, natural language interface, natural language processing, natural language understanding}
}

Document

DOI: 10.4230/OASIcs.SLATE.2019.3

Tracing Naming Semantics in Unit Tests of Popular Github Android Projects

Authors: Matej Madeja and Jaroslav Porubän

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

The tests are so closely linked to the source code that we consider them up-to-date documentation. Developers are aware of recommended naming conventions and other best practices that should be used to write tests. In this paper we focus on how developers test in practice and what conventions they use. For the analysis, 5 very popular Android projects from Github were selected. The results show that 49% of tests contain the full, and 76% of tests a partial, unit under test (UUT) method name in their name. Further, it was observed that a UUT was only rarely tested by multiple test classes, and then only in cases when the tester wanted to distinguish the way he or she worked with the tested object. The analysis shows that the word "test" in the test title is not a reliable metric for identifying the test. Apart from assertions, the developers use statements like verify, try-catch and throw exception to verify the correctness of UUT functionality. It was also found that the test titles contained keywords which could lead to the identification of the UUT, the use case of the test, or the data used for the test. Finally, the words in the test title were very often found in the test body and, to a lesser extent, in the UUT body, which indicates the use of similar vocabulary in tests and UUT.
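The abstract distinguishes tests whose name contains the full UUT method name from those containing only part of it. One way to operationalise that distinction is sketched below; the identifier-splitting rules, the definition of a partial match (sharing at least one word), and the function names `split_identifier` and `uut_name_match` are assumptions made for illustration, not the criteria used in the paper.

```python
import re


def split_identifier(name: str) -> list[str]:
    """Split a camelCase or snake_case identifier into lower-case words.

    Acronyms are kept together, e.g. "parseURLString" -> parse, url, string.
    """
    words: list[str] = []
    for part in name.split("_"):
        # Acronym run | capitalised or lower-case word | digit run.
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [w.lower() for w in words]


def uut_name_match(test_name: str, uut_name: str) -> str:
    """Classify how a test method name refers to the UUT method name.

    'full'    -> all UUT words appear consecutively in the test name
    'partial' -> the names share at least one word (assumed definition)
    'none'    -> no shared words
    """
    test_words = split_identifier(test_name)
    uut_words = split_identifier(uut_name)
    n = len(uut_words)
    if n and any(test_words[i:i + n] == uut_words
                 for i in range(len(test_words) - n + 1)):
        return "full"
    if set(uut_words) & set(test_words):
        return "partial"
    return "none"


if __name__ == "__main__":
    for test in ["testGetUserName", "testUserNameIsTrimmed", "shouldRejectEmptyInput"]:
        print(test, uut_name_match(test, "getUserName"))
```

With a UUT method `getUserName`, the test `testGetUserName` is classified as a full match, `testUserNameIsTrimmed` as a partial match, and `shouldRejectEmptyInput` as no match.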

Cite as

Matej Madeja and Jaroslav Porubän. Tracing Naming Semantics in Unit Tests of Popular Github Android Projects. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 3:1-3:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{madeja_et_al:OASIcs.SLATE.2019.3,
  author =	{Madeja, Matej and Porub\"{a}n, Jaroslav},
  title =	{{Tracing Naming Semantics in Unit Tests of Popular Github Android Projects}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{3:1--3:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.3},
  URN =		{urn:nbn:de:0030-drops-108705},
  doi =		{10.4230/OASIcs.SLATE.2019.3},
  annote =	{Keywords: unit tests, android, real testing practices, unit tests, program comprehension}
}

Document

DOI: 10.4230/OASIcs.SLATE.2019.4

Robust Phoneme Recognition with Little Data

Authors: Christopher Dane Shulby, Martha Dais Ferreira, Rodrigo F. de Mello, and Sandra Maria Aluisio

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

A common belief in the community is that deep learning requires large datasets to be effective. We show that with careful parameter selection, deep feature extraction can be applied even to small datasets. We also explore exactly how much data is necessary to guarantee learning by convergence analysis and calculating the shattering coefficient for the algorithms used. Another problem is that state-of-the-art results are rarely reproducible because they use proprietary datasets, pretrained networks and/or weight initializations from other larger networks. We present a two-fold novelty for this situation where a carefully designed CNN architecture, together with a knowledge-driven classifier achieves nearly state-of-the-art phoneme recognition results with absolutely no pretraining or external weight initialization. We also beat the best replication study of the state of the art with a 28% FER. More importantly, we are able to achieve transparent, reproducible frame-level accuracy and, additionally, perform a convergence analysis to show the generalization capacity of the model providing statistical evidence that our results are not obtained by chance. Furthermore, we show how algorithms with strong learning guarantees can not only benefit from raw data extraction but contribute with more robust results.

Cite as

Christopher Dane Shulby, Martha Dais Ferreira, Rodrigo F. de Mello, and Sandra Maria Aluisio. Robust Phoneme Recognition with Little Data. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 4:1-4:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{shulby_et_al:OASIcs.SLATE.2019.4,
  author =	{Shulby, Christopher Dane and Ferreira, Martha Dais and de Mello, Rodrigo F. and Aluisio, Sandra Maria},
  title =	{{Robust Phoneme Recognition with Little Data}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{4:1--4:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.4},
  URN =		{urn:nbn:de:0030-drops-108715},
  doi =		{10.4230/OASIcs.SLATE.2019.4},
  annote =	{Keywords: feature extraction, acoustic modeling, phoneme recognition, statistical learning theory}
}