Detection of Emerging Words in Portuguese Tweets

Authors: Afonso Pinto, Helena Moniz, Fernando Batista


Author Details

Afonso Pinto
  • ISCTE - Instituto Universitário de Lisboa, Portugal
Helena Moniz
  • CLUL/FLUL, Universidade de Lisboa, Portugal
  • INESC-ID, Lisboa, Portugal
  • UNBABEL, Lisboa, Portugal
Fernando Batista
  • ISCTE - Instituto Universitário de Lisboa, Portugal
  • INESC-ID, Lisboa, Portugal

Cite As

Afonso Pinto, Helena Moniz, and Fernando Batista. Detection of Emerging Words in Portuguese Tweets. In 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Open Access Series in Informatics (OASIcs), Volume 83, pp. 3:1-3:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Abstract

This paper tackles the problem of detecting emerging words in a language, based on social network content. It proposes an approach for detecting new words on Twitter and reports the results achieved on a collection of 8 million Portuguese tweets. The study uses geolocated tweets collected between January 2018 and June 2019 and written within Portuguese territory. The first six months of data were used to define an initial vocabulary of known words, and the following 12 months were used to identify new words, thus testing the approach. The resulting set of words was manually analyzed, revealing a number of distinct events and suggesting that Twitter may be a valuable resource for researching neology and the dynamics of a language.
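The split described in the abstract (an initial period that defines the known vocabulary, followed by a period scanned for unseen words) can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the `min_count` threshold, the `(date, text)` input format, and the sample tweets are all assumptions made for illustration only.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical tokenizer: lowercase alphabetic tokens, including accented
# Portuguese letters. The normalisation used in the paper is not stated here.
TOKEN_RE = re.compile(r"[a-záàâãéêíóôõúüç]+")


def tokenize(text):
    """Split a tweet into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def emerging_words(tweets, cutoff, min_count=2):
    """Return {word: count} for words first seen on or after `cutoff`.

    tweets    -- iterable of (date, text) pairs (assumed input format)
    cutoff    -- tweets dated before this day define the known vocabulary
    min_count -- assumed frequency threshold for keeping a candidate
    """
    tweets = list(tweets)

    # Pass 1: build the known vocabulary from the initial period.
    known = set()
    for day, text in tweets:
        if day < cutoff:
            known.update(tokenize(text))

    # Pass 2: count words from the later period that are not in the vocabulary.
    candidates = Counter()
    for day, text in tweets:
        if day >= cutoff:
            candidates.update(w for w in tokenize(text) if w not in known)

    return {w: n for w, n in candidates.items() if n >= min_count}


# Invented sample data; the cutoff mirrors a six-month initial period.
sample = [
    (date(2018, 2, 1), "Bom dia Lisboa"),
    (date(2018, 9, 3), "bom dia, que geringonça"),
    (date(2019, 1, 9), "a geringonça outra vez"),
    (date(2019, 3, 1), "lisboa hoje"),
]
print(emerging_words(sample, cutoff=date(2018, 7, 1)))  # {'geringonça': 2}
```

A frequency threshold like `min_count` is one simple way to suppress one-off typos; the candidates it returns would still need the kind of manual analysis the abstract mentions.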

Subject Classification

ACM Subject Classification
  • Computing methodologies → Natural language processing

Keywords
  • Emerging words
  • Twitter
  • Portuguese language



