10 Search Results for "Pereira, Fernando"


Document
Comparing Different Approaches for Detecting Hate Speech in Online Portuguese Comments

Authors: Bernardo Cunha Matos, Raquel Bento Santos, Paula Carvalho, Ricardo Ribeiro, and Fernando Batista

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)


Abstract
Online Hate Speech (OHS) has been growing dramatically on social media, which has motivated researchers to develop a diversity of methods for its automated detection. However, the detection of OHS in Portuguese is still little studied. To fill this gap, we explored different models that proved to be successful in the literature to address this task. In particular, we have explored transfer learning approaches, based on existing BERT-like pre-trained models. The performed experiments were based on CO-HATE, a corpus of YouTube comments posted by the Portuguese online community that was manually labeled by different annotators. Among other categories, those comments were labeled regarding the presence of hate speech and the type of hate speech, specifically overt and covert hate speech. We have assessed the impact of using annotations from different annotators on the performance of such models. In addition, we have analyzed the impact of distinguishing overt and and covert hate speech. The results achieved show the importance of considering the annotator’s profile in the development of hate speech detection models. Regarding the hate speech type, the results obtained do not allow to make any conclusion on what type is easier to detect. Finally, we show that pre-processing does not seem to have a significant impact on the performance of this specific task.

Cite as

Bernardo Cunha Matos, Raquel Bento Santos, Paula Carvalho, Ricardo Ribeiro, and Fernando Batista. Comparing Different Approaches for Detecting Hate Speech in Online Portuguese Comments. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 10:1-10:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{matos_et_al:OASIcs.SLATE.2022.10,
  author =	{Matos, Bernardo Cunha and Santos, Raquel Bento and Carvalho, Paula and Ribeiro, Ricardo and Batista, Fernando},
  title =	{{Comparing Different Approaches for Detecting Hate Speech in Online Portuguese Comments}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{10:1--10:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.10},
  URN =		{urn:nbn:de:0030-drops-167560},
  doi =		{10.4230/OASIcs.SLATE.2022.10},
  annote =	{Keywords: Hate Speech, Text Classification, Transfer Learning, Supervised Learning, Deep Learning}
}
Document
Semi-Supervised Annotation of Portuguese Hate Speech Across Social Media Domains

Authors: Raquel Bento Santos, Bernardo Cunha Matos, Paula Carvalho, Fernando Batista, and Ricardo Ribeiro

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)


Abstract
With the increasing spread of hate speech (HS) on social media, it becomes urgent to develop models that can help detecting it automatically. Typically, such models require large-scale annotated corpora, which are still scarce in languages such as Portuguese. However, creating manually annotated corpora is a very expensive and time-consuming task. To address this problem, we propose an ensemble of two semi-supervised models that can be used to automatically create a corpus representative of online hate speech in Portuguese. The first model combines Generative Adversarial Networks and a BERT-based model. The second model is based on label propagation, and consists of propagating labels from existing annotated corpora to the unlabeled data, by exploring the notion of similarity. We have explored the annotations of three existing corpora (CO-HATE, ToLR-BR, and HPHS) in order to automatically annotate FIGHT, a corpus composed of geolocated tweets produced in the Portuguese territory. Through the process of selecting the best model and the corresponding setup, we have tested different pre-trained embeddings, performed experiments using different training subsets, labeled by different annotators with different perspectives, and performed several experiments with active learning. Furthermore, this work explores back translation as a mean to automatically generate additional hate speech samples. The best results were achieved by combining all the labeled datasets, obtaining 0.664 F1-score for the Hate Speech class in FIGHT.

Cite as

Raquel Bento Santos, Bernardo Cunha Matos, Paula Carvalho, Fernando Batista, and Ricardo Ribeiro. Semi-Supervised Annotation of Portuguese Hate Speech Across Social Media Domains. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 11:1-11:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{santos_et_al:OASIcs.SLATE.2022.11,
  author =	{Santos, Raquel Bento and Matos, Bernardo Cunha and Carvalho, Paula and Batista, Fernando and Ribeiro, Ricardo},
  title =	{{Semi-Supervised Annotation of Portuguese Hate Speech Across Social Media Domains}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{11:1--11:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.11},
  URN =		{urn:nbn:de:0030-drops-167570},
  doi =		{10.4230/OASIcs.SLATE.2022.11},
  annote =	{Keywords: Hate Speech, Semi-Supervised Learning, Semi-Automatic Annotation}
}
Document
Automatic Construction of Knowledge Graphs from Text and Structured Data: A Preliminary Literature Review

Authors: Maraim Masoud, Bianca Pereira, John McCrae, and Paul Buitelaar

Published in: OASIcs, Volume 93, 3rd Conference on Language, Data and Knowledge (LDK 2021)


Abstract
Knowledge graphs have been shown to be an important data structure for many applications, including chatbot development, data integration, and semantic search. In the enterprise domain, such graphs need to be constructed based on both structured (e.g. databases) and unstructured (e.g. textual) internal data sources; preferentially using automatic approaches due to the costs associated with manual construction of knowledge graphs. However, despite the growing body of research that leverages both structured and textual data sources in the context of automatic knowledge graph construction, the research community has centered on either one type of source or the other. In this paper, we conduct a preliminary literature review to investigate approaches that can be used for the integration of textual and structured data sources in the process of automatic knowledge graph construction. We highlight the solutions currently available for use within enterprises and point areas that would benefit from further research.

Cite as

Maraim Masoud, Bianca Pereira, John McCrae, and Paul Buitelaar. Automatic Construction of Knowledge Graphs from Text and Structured Data: A Preliminary Literature Review. In 3rd Conference on Language, Data and Knowledge (LDK 2021). Open Access Series in Informatics (OASIcs), Volume 93, pp. 19:1-19:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{masoud_et_al:OASIcs.LDK.2021.19,
  author =	{Masoud, Maraim and Pereira, Bianca and McCrae, John and Buitelaar, Paul},
  title =	{{Automatic Construction of Knowledge Graphs from Text and Structured Data: A Preliminary Literature Review}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{19:1--19:9},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.LDK.2021.19},
  URN =		{urn:nbn:de:0030-drops-145556},
  doi =		{10.4230/OASIcs.LDK.2021.19},
  annote =	{Keywords: Knowledge Graph Construction, Enterprise Knowledge Graph}
}
Document
Semantic Search of Mobile Applications Using Word Embeddings

Authors: João Coelho, António Neto, Miguel Tavares, Carlos Coutinho, Ricardo Ribeiro, and Fernando Batista

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)


Abstract
This paper proposes a set of approaches for the semantic search of mobile applications, based on their name and on the unstructured textual information contained in their description. The proposed approaches make use of word-level, character-level, and contextual word-embeddings that have been trained or fine-tuned using a dataset of about 500 thousand mobile apps, collected in the scope of this work. The proposed approaches have been evaluated using a public dataset that includes information about 43 thousand applications, and 56 manually annotated non-exact queries. Our results show that both character-level embeddings trained on our data, and fine-tuned RoBERTa models surpass the performance of the other existing retrieval strategies reported in the literature.

Cite as

João Coelho, António Neto, Miguel Tavares, Carlos Coutinho, Ricardo Ribeiro, and Fernando Batista. Semantic Search of Mobile Applications Using Word Embeddings. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 12:1-12:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{coelho_et_al:OASIcs.SLATE.2021.12,
  author =	{Coelho, Jo\~{a}o and Neto, Ant\'{o}nio and Tavares, Miguel and Coutinho, Carlos and Ribeiro, Ricardo and Batista, Fernando},
  title =	{{Semantic Search of Mobile Applications Using Word Embeddings}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{12:1--12:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.12},
  URN =		{urn:nbn:de:0030-drops-144292},
  doi =		{10.4230/OASIcs.SLATE.2021.12},
  annote =	{Keywords: Semantic Search, Word Embeddings, Elasticsearch, Mobile Applications}
}
Document
Sentiment Analysis of Portuguese Economic News

Authors: Cátia Tavares, Ricardo Ribeiro, and Fernando Batista

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)


Abstract
This paper proposes a rule-based method for automatic polarity detection over economic news texts, which proved suitable for detecting the sentiment in Portuguese economic news. The data used in our experiments consists of 400 manually annotated sentences extracted from economic news, used for evaluation, and about 90 thousand Portuguese economic news, extracted from two well-known Portuguese newspapers, covering the period from 2010 to 2020, that have been used for training our systems. In order to perform sentiment analysis of economic news, we have also tested the adaptation of existing pre-trained modules, and also performed experiments with a set of Machine Learning approaches, and self-training. Experimental results show that our rule-based approach, that uses manually written rules related to the economic context, achieves the best results for automatically detecting the polarity of economic news, largely surpassing the other approaches.

Cite as

Cátia Tavares, Ricardo Ribeiro, and Fernando Batista. Sentiment Analysis of Portuguese Economic News. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 17:1-17:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{tavares_et_al:OASIcs.SLATE.2021.17,
  author =	{Tavares, C\'{a}tia and Ribeiro, Ricardo and Batista, Fernando},
  title =	{{Sentiment Analysis of Portuguese Economic News}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{17:1--17:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.17},
  URN =		{urn:nbn:de:0030-drops-144347},
  doi =		{10.4230/OASIcs.SLATE.2021.17},
  annote =	{Keywords: Sentiment Analysis, Economic News, Portuguese Language}
}
Document
Scaling up a Programmers' Profile Tool

Authors: Martinho Aragão, Maria João Varanda Pereira, and Pedro Rangel Henriques

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)


Abstract
The style of programming, the proficiency on the programming language, the conciseness of the solution, the use of comments and so on, allow comparison of programmers through static analysis of their code. The Programmer Profiler Tool, which has been commonly named PP Tool, is an open source profiling tool for Java language where the programmer’s ability can be classified in one out of five possible profiles and the distinction among them falls upon the levels of both skill and readability. Taking a set of correct solutions the comparison between solutions for the same problems is fundamental to evaluate proficiency on the analysed criteria. As such, there was a need to tune the tool in order to handle, simultaneously, with a bigger amount of programs and with a wider scope of solutions. By scaling up PP Tool it will be possible to apply it in a far wider scope of situations as it will be able to cope with programmers from different geographies, with or without formal education, between 1 and 20 years of experience amongst other factors. For that, a set of features were implemented and tested and are described in this paper.

Cite as

Martinho Aragão, Maria João Varanda Pereira, and Pedro Rangel Henriques. Scaling up a Programmers' Profile Tool. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 11:1-11:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{aragao_et_al:OASIcs.SLATE.2019.11,
  author =	{Arag\~{a}o, Martinho and Pereira, Maria Jo\~{a}o Varanda and Henriques, Pedro Rangel},
  title =	{{Scaling up a Programmers' Profile Tool}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{11:1--11:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.11},
  URN =		{urn:nbn:de:0030-drops-108781},
  doi =		{10.4230/OASIcs.SLATE.2019.11},
  annote =	{Keywords: Programmers Profiling, Code Analysis, Programming Skills, Code Readability}
}
Document
Quarmic: A Data-Driven Web Development Framework

Authors: Pedro Miguel Pereira Cunha and José Paulo Leal

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)


Abstract
Quarmic is a web framework for rapid prototyping of web applications. Its main goal is to facilitate the development of web applications by providing a high level of abstraction that hides Web communication complexities. This framework allows developers to build scalable applications capable of handling data communication in different models, data persistence and authentication, requiring them just to use simple annotations. Quarmic’s approach consists of the replication of the shared object among clients and server in order to communicate through its methods execution. Where the annotations, namely decorators, are used to indicate the concern (model or view) that each method addresses and to implement the framework’s inversion of control. By indicating the method concern, it enables the separation of its execution across the clients (responsible for the view) and the server (responsible for the model) which facilitates the state management and code maintenance.

Cite as

Pedro Miguel Pereira Cunha and José Paulo Leal. Quarmic: A Data-Driven Web Development Framework. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 19:1-19:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{cunha_et_al:OASIcs.SLATE.2019.19,
  author =	{Cunha, Pedro Miguel Pereira and Leal, Jos\'{e} Paulo},
  title =	{{Quarmic: A Data-Driven Web Development Framework}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{19:1--19:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.19},
  URN =		{urn:nbn:de:0030-drops-108869},
  doi =		{10.4230/OASIcs.SLATE.2019.19},
  annote =	{Keywords: web development, framework, data-driven}
}
Document
Yedalog: Exploring Knowledge at Scale

Authors: Brian Chin, Daniel von Dincklage, Vuk Ercegovac, Peter Hawkins, Mark S. Miller, Franz Och, Christopher Olston, and Fernando Pereira

Published in: LIPIcs, Volume 32, 1st Summit on Advances in Programming Languages (SNAPL 2015)


Abstract
With huge progress on data processing frameworks, human programmers are frequently the bottleneck when analyzing large repositories of data. We introduce Yedalog, a declarative programming language that allows programmers to mix data-parallel pipelines and computation seamlessly in a single language. By contrast, most existing tools for data-parallel computation embed a sublanguage of data-parallel pipelines in a general-purpose language, or vice versa. Yedalog extends Datalog, incorporating not only computational features from logic programming, but also features for working with data structured as nested records. Yedalog programs can run both on a single machine, and distributed across a cluster in batch and interactive modes, allowing programmers to mix different modes of execution easily.

Cite as

Brian Chin, Daniel von Dincklage, Vuk Ercegovac, Peter Hawkins, Mark S. Miller, Franz Och, Christopher Olston, and Fernando Pereira. Yedalog: Exploring Knowledge at Scale. In 1st Summit on Advances in Programming Languages (SNAPL 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 32, pp. 63-78, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)


Copy BibTex To Clipboard

@InProceedings{chin_et_al:LIPIcs.SNAPL.2015.63,
  author =	{Chin, Brian and von Dincklage, Daniel and Ercegovac, Vuk and Hawkins, Peter and Miller, Mark S. and Och, Franz and Olston, Christopher and Pereira, Fernando},
  title =	{{Yedalog: Exploring Knowledge at Scale}},
  booktitle =	{1st Summit on Advances in Programming Languages (SNAPL 2015)},
  pages =	{63--78},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-80-4},
  ISSN =	{1868-8969},
  year =	{2015},
  volume =	{32},
  editor =	{Ball, Thomas and Bodík, Rastislav and Krishnamurthi, Shriram and Lerner, Benjamin S. and Morriset, Greg},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.SNAPL.2015.63},
  URN =		{urn:nbn:de:0030-drops-50172},
  doi =		{10.4230/LIPIcs.SNAPL.2015.63},
  annote =	{Keywords: Datalog, MapReduce}
}
Document
Detecting a Tweet’s Topic within a Large Number of Portuguese Twitter Trends

Authors: Hugo Rosa, João Paulo Carvalho, and Fernando Batista

Published in: OASIcs, Volume 38, 3rd Symposium on Languages, Applications and Technologies (2014)


Abstract
In this paper we propose to approach the subject of Twitter Topic Detection when in the presence of a large number of trending topics. We use a new technique, called Twitter Topic Fuzzy Fingerprints, and compare it with two popular text classification techniques, Support Vector Machines (SVM) and k-Nearest Neighbours (kNN). Preliminary results show that it outperforms the other two techniques, while still being much faster, which is an essential feature when processing large volumes of streaming data. We focused on a data set of Portuguese language tweets and the respective top trends as indicated by Twitter.

Cite as

Hugo Rosa, João Paulo Carvalho, and Fernando Batista. Detecting a Tweet’s Topic within a Large Number of Portuguese Twitter Trends. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 185-199, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{rosa_et_al:OASIcs.SLATE.2014.185,
  author =	{Rosa, Hugo and Carvalho, Jo\~{a}o Paulo and Batista, Fernando},
  title =	{{Detecting a Tweet’s Topic within a Large Number of Portuguese Twitter Trends}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{185--199},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.185},
  URN =		{urn:nbn:de:0030-drops-45696},
  doi =		{10.4230/OASIcs.SLATE.2014.185},
  annote =	{Keywords: topic detection, social networks data mining, Twitter, Portuguese language}
}
Document
Expanding a Database of Portuguese Tweets

Authors: Gaspar Brogueira, Fernando Batista, João Paulo Carvalho, and Helena Moniz

Published in: OASIcs, Volume 38, 3rd Symposium on Languages, Applications and Technologies (2014)


Abstract
This paper describes an existing database of geolocated tweets that were produced in Portuguese regions and proposes an approach to further expand it. The existing database covers eight consecutive days of collected tweets, totaling about 300 thousand tweets, produced by about 11 thousand different users. A detailed analysis on the content of the messages suggests a predominance of young authors that use Twitter as a way of reaching their colleagues with their feelings, ideas and comments. In order to further characterize this community of young people, we propose a method for retrieving additional tweets produced by the same set of authors already in the database. Our goal is to further extend the knowledge about each user of this community, making it possible to automatically characterize each user by the content he/she produces, cluster users and open other possibilities in the scope of social analysis.

Cite as

Gaspar Brogueira, Fernando Batista, João Paulo Carvalho, and Helena Moniz. Expanding a Database of Portuguese Tweets. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 275-282, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{brogueira_et_al:OASIcs.SLATE.2014.275,
  author =	{Brogueira, Gaspar and Batista, Fernando and Carvalho, Jo\~{a}o Paulo and Moniz, Helena},
  title =	{{Expanding a Database of Portuguese Tweets}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{275--282},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.275},
  URN =		{urn:nbn:de:0030-drops-45763},
  doi =		{10.4230/OASIcs.SLATE.2014.275},
  annote =	{Keywords: Twitter, corpus of Portuguese tweets, Twitter API, natural language processing, text analysis}
}
  • Refine by Author
  • 6 Batista, Fernando
  • 4 Ribeiro, Ricardo
  • 2 Carvalho, João Paulo
  • 2 Carvalho, Paula
  • 2 Matos, Bernardo Cunha
  • Show More...

  • Refine by Classification
  • 2 Computing methodologies → Machine learning
  • 2 Computing methodologies → Transfer learning
  • 2 Information systems → Language models
  • 2 Social and professional topics → Hate speech
  • 2 Software and its engineering → Application specific development environments
  • Show More...

  • Refine by Keyword
  • 2 Hate Speech
  • 2 Twitter
  • 1 Code Analysis
  • 1 Code Readability
  • 1 Datalog
  • Show More...

  • Refine by Type
  • 10 document

  • Refine by Publication Year
  • 3 2021
  • 2 2014
  • 2 2019
  • 2 2022
  • 1 2015

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail