Comparing and Combining Portuguese Lexical-Semantic Knowledge Bases

Gonçalo Oliveira, Hugo

doi:10.4230/OASIcs.SLATE.2017.16

Abstract

There are currently several lexical-semantic knowledge bases (LKBs) for Portuguese, developed by different teams and following different approaches. In this paper, the open Portuguese LKBs are briefly analysed, with a focus on their size and overlapping contents, and new LKBs are created from their redundant information. Existing and new LKBs are then exploited in semantic analysis tasks, and their performance is compared. Results confirm that, instead of selecting a single LKB, it is worth combining all the open Portuguese LKBs.

