Exploring Different Methods for Solving Analogies with Portuguese Word Embeddings

Authors: Tiago Sousa, Hugo Gonçalo Oliveira, Ana Alves



File

OASIcs.SLATE.2020.9.pdf
  • Filesize: 420 kB
  • 14 pages

Author Details

Tiago Sousa
  • ISEC, Polytechnic Institute of Coimbra, Portugal
Hugo Gonçalo Oliveira
  • CISUC, Department of Informatics Engineering, University of Coimbra, Portugal
Ana Alves
  • CISUC, University of Coimbra, Portugal
  • ISEC, Polytechnic Institute of Coimbra, Portugal

Cite As

Tiago Sousa, Hugo Gonçalo Oliveira, and Ana Alves. Exploring Different Methods for Solving Analogies with Portuguese Word Embeddings. In 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Open Access Series in Informatics (OASIcs), Volume 83, pp. 9:1-9:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/OASIcs.SLATE.2020.9

Abstract

A common way of assessing static word embeddings is to use them for solving analogies of the kind "what is to king as woman is to man?". For this purpose, the vector offset method (king - man + woman = queen), also known as 3CosAdd, has been effectively used for solving analogies and assessing different models of word embeddings in different languages. However, some researchers have pointed out that this method is not the most effective for this purpose. Following this, we tested alternative analogy-solving methods (3CosMul, 3CosAvg, LRCos) on Portuguese word embeddings and confirmed that claim. Specifically, those methods are used to answer the Portuguese version of the Google Analogy Test, dubbed LX-4WAnalogies, which covers syntactic and semantic analogies of different kinds. We discuss the accuracy of the different methods applied to different embedding models and draw some conclusions. Indeed, all the alternative methods outperform 3CosAdd, and the best performance is consistently achieved by LRCos with GloVe embeddings.
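To make the two simplest methods named in the abstract concrete, the following is a minimal sketch of 3CosAdd and 3CosMul in the cosine-based formulation given by Levy and Goldberg (2014). It is not the authors' implementation. The tiny 3-dimensional vectors, the `EMB` dictionary, and the `cos` and `solve` helpers are all hypothetical and exist only for illustration; real experiments would load trained Portuguese embeddings instead. 3CosAvg and LRCos need a whole set of example pairs for a relation and are not shown.

```python
import math

# Made-up toy vectors, NOT taken from any trained model.
# The dimensions loosely stand for: royalty, gender, "person-ness".
EMB = {
    "homem":  [0.0,  1.0, 1.0],   # man
    "mulher": [0.0, -1.0, 1.0],   # woman
    "rei":    [1.0,  1.0, 1.0],   # king
    "rainha": [1.0, -1.0, 1.0],   # queen
    "menino": [0.0,  1.0, 0.5],   # boy (distractor)
    "menina": [0.0, -1.0, 0.5],   # girl (distractor)
}


def cos(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def solve(a, a_star, b, method="3CosAdd", eps=1e-3):
    """Answer 'a is to a_star as b is to ?' over the toy vocabulary.

    The three question words are excluded from the candidates,
    as is common practice in analogy evaluation.
    """
    best_word, best_score = None, -math.inf
    for word, vec in EMB.items():
        if word in (a, a_star, b):
            continue
        sim_b = cos(vec, EMB[b])
        sim_a = cos(vec, EMB[a])
        sim_a_star = cos(vec, EMB[a_star])
        if method == "3CosAdd":
            # Additive combination of the three cosine similarities.
            score = sim_b - sim_a + sim_a_star
        elif method == "3CosMul":
            # Shift cosines from [-1, 1] to [0, 1] so the ratio is well defined;
            # eps avoids division by zero.
            sb, sa, sas = ((s + 1) / 2 for s in (sim_b, sim_a, sim_a_star))
            score = (sb * sas) / (sa + eps)
        else:
            raise ValueError(f"unknown method: {method}")
        if score > best_score:
            best_word, best_score = word, score
    return best_word


if __name__ == "__main__":
    for method in ("3CosAdd", "3CosMul"):
        print(method, solve("homem", "rei", "mulher", method))
```

On these hand-built vectors both methods pick "rainha" for "homem : rei :: mulher : ?". That only shows the mechanics; it says nothing about the relative accuracy reported in the paper.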

Subject Classification

ACM Subject Classification
  • Computing methodologies → Lexical semantics
Keywords
  • analogies
  • word embeddings
  • semantic relations
  • syntactic relations
  • Portuguese
