Search Results

Documents authored by Moniz, Helena


Document
Detection of Emerging Words in Portuguese Tweets

Authors: Afonso Pinto, Helena Moniz, and Fernando Batista

Published in: OASIcs, Volume 83, 9th Symposium on Languages, Applications and Technologies (SLATE 2020)


Abstract
This paper tackles the problem of detecting emerging words on a language, based on social networks content. It proposes an approach for detecting new words on Twitter, and reports the achieved results for a collection of 8 million Portuguese tweets. This study uses geolocated tweets, collected between January 2018 and June 2019, and written in the Portuguese territory. The first six months of the data were used to define an initial vocabulary on known words, and the following 12 months were used for identifying new words, thus testing our approach. The set of resulting words were manually analyzed, revealing a number of distinct events, and suggesting that Twitter may be a valuable resource for researching neology, and the dynamics of a language.

Cite as

Afonso Pinto, Helena Moniz, and Fernando Batista. Detection of Emerging Words in Portuguese Tweets. In 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Open Access Series in Informatics (OASIcs), Volume 83, pp. 3:1-3:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{pinto_et_al:OASIcs.SLATE.2020.3,
  author =	{Pinto, Afonso and Moniz, Helena and Batista, Fernando},
  title =	{{Detection of Emerging Words in Portuguese Tweets}},
  booktitle =	{9th Symposium on Languages, Applications and Technologies (SLATE 2020)},
  pages =	{3:1--3:10},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-165-8},
  ISSN =	{2190-6807},
  year =	{2020},
  volume =	{83},
  editor =	{Sim\~{o}es, Alberto and Henriques, Pedro Rangel and Queir\'{o}s, Ricardo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2020.3},
  URN =		{urn:nbn:de:0030-drops-130164},
  doi =		{10.4230/OASIcs.SLATE.2020.3},
  annote =	{Keywords: Emerging words, Twitter, Portuguese language}
}
Document
Expanding a Database of Portuguese Tweets

Authors: Gaspar Brogueira, Fernando Batista, João Paulo Carvalho, and Helena Moniz

Published in: OASIcs, Volume 38, 3rd Symposium on Languages, Applications and Technologies (2014)


Abstract
This paper describes an existing database of geolocated tweets that were produced in Portuguese regions and proposes an approach to further expand it. The existing database covers eight consecutive days of collected tweets, totaling about 300 thousand tweets, produced by about 11 thousand different users. A detailed analysis on the content of the messages suggests a predominance of young authors that use Twitter as a way of reaching their colleagues with their feelings, ideas and comments. In order to further characterize this community of young people, we propose a method for retrieving additional tweets produced by the same set of authors already in the database. Our goal is to further extend the knowledge about each user of this community, making it possible to automatically characterize each user by the content he/she produces, cluster users and open other possibilities in the scope of social analysis.

Cite as

Gaspar Brogueira, Fernando Batista, João Paulo Carvalho, and Helena Moniz. Expanding a Database of Portuguese Tweets. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 275-282, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{brogueira_et_al:OASIcs.SLATE.2014.275,
  author =	{Brogueira, Gaspar and Batista, Fernando and Carvalho, Jo\~{a}o Paulo and Moniz, Helena},
  title =	{{Expanding a Database of Portuguese Tweets}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{275--282},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.275},
  URN =		{urn:nbn:de:0030-drops-45763},
  doi =		{10.4230/OASIcs.SLATE.2014.275},
  annote =	{Keywords: Twitter, corpus of Portuguese tweets, Twitter API, natural language processing, text analysis}
}
Document
Comparing Different Methods for Disfluency Structure Detection

Authors: Henrique Medeiros, Fernando Batista, Helena Moniz, Isabel Trancoso, and Luis Nunes

Published in: OASIcs, Volume 29, 2nd Symposium on Languages, Applications and Technologies (2013)


Abstract
This paper presents a number of experiments focusing on assessing the performance of different machine learning methods on the identification of disfluencies and their distinct structural regions over speech data. Several machine learning methods have been applied, namely Naive Bayes, Logistic Regression, Classification and Regression Trees (CARTs), J48 and Multilayer Perceptron. Our experiments show that CARTs outperform the other methods on the identification of the distinct structural disfluent regions. Reported experiments are based on audio segmentation and prosodic features, calculated from a corpus of university lectures in European Portuguese, containing about 32h of speech and about 7.7% of disfluencies. The set of features automatically extracted from the forced alignment corpus proved to be discriminant of the regions contained in the production of a disfluency. This work shows that using fully automatic prosodic features, disfluency structural regions can be reliably identified using CARTs, where the best results achieved correspond to 81.5% precision, 27.6% recall, and 41.2% F-measure. The best results concern the detection of the interregnum, followed by the detection of the interruption point.

Cite as

Henrique Medeiros, Fernando Batista, Helena Moniz, Isabel Trancoso, and Luis Nunes. Comparing Different Methods for Disfluency Structure Detection. In 2nd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 29, pp. 259-269, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


Copy BibTex To Clipboard

@InProceedings{medeiros_et_al:OASIcs.SLATE.2013.259,
  author =	{Medeiros, Henrique and Batista, Fernando and Moniz, Helena and Trancoso, Isabel and Nunes, Luis},
  title =	{{Comparing Different Methods for Disfluency Structure Detection}},
  booktitle =	{2nd Symposium on Languages, Applications and Technologies},
  pages =	{259--269},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-52-1},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{29},
  editor =	{Leal, Jos\'{e} Paulo and Rocha, Ricardo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2013.259},
  URN =		{urn:nbn:de:0030-drops-40420},
  doi =		{10.4230/OASIcs.SLATE.2013.259},
  annote =	{Keywords: Machine learning, speech processing, prosodic features, automatic detection of disfluencies}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail