Comparing and Combining Portuguese Lexical-Semantic Knowledge Bases

Author: Hugo Gonçalo Oliveira



Cite As

Hugo Gonçalo Oliveira. Comparing and Combining Portuguese Lexical-Semantic Knowledge Bases. In 6th Symposium on Languages, Applications and Technologies (SLATE 2017). Open Access Series in Informatics (OASIcs), Volume 56, pp. 16:1-16:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/OASIcs.SLATE.2017.16

Abstract

There are currently several lexical-semantic knowledge bases (LKBs) for Portuguese, developed by different teams and following different approaches. In this paper, the open Portuguese LKBs are briefly analysed, with a focus on their size and overlapping contents, and new LKBs are created from their redundant information. Existing and new LKBs are then exploited in semantic analysis tasks, and their performance is compared. Results confirm that, instead of selecting a single LKB, it is worth combining all the open Portuguese LKBs.
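
The abstract does not spell out how the redundancy-based LKBs are built. As a minimal sketch, assuming each LKB can be reduced to a collection of relation triples and that "redundant" means a triple appears in at least a chosen number of LKBs, the combination could look as follows. The function name, the `min_sources` threshold, and the toy triples are hypothetical and are not taken from the paper.

```python
from collections import Counter


def combine_by_redundancy(lkbs, min_sources=2):
    """Return the triples that occur in at least `min_sources` of the given LKBs.

    `lkbs` maps an LKB name to an iterable of (term, relation, term) triples.
    """
    counts = Counter()
    for triples in lkbs.values():
        # Deduplicate within one LKB so that each source votes at most once.
        counts.update(set(triples))
    return {triple for triple, n in counts.items() if n >= min_sources}


# Toy data, invented for illustration only.
toy_lkbs = {
    "lkb_a": [("carro", "synonym-of", "automóvel"),
              ("animal", "hypernym-of", "cão")],
    "lkb_b": [("carro", "synonym-of", "automóvel")],
    "lkb_c": [("carro", "synonym-of", "automóvel"),
              ("animal", "hypernym-of", "cão")],
}

redundant_2 = combine_by_redundancy(toy_lkbs, min_sources=2)
redundant_3 = combine_by_redundancy(toy_lkbs, min_sources=3)
```

Raising `min_sources` trades coverage for agreement between sources: with the toy data, both triples survive a threshold of 2, while only the synonymy triple is shared by all three LKBs.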

Keywords

  • Lexical Knowledge Bases
  • Portuguese
  • WordNet
  • Redundancy
  • Semantic Similarity
