License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.LDK.2021.21
URN: urn:nbn:de:0030-drops-145578
URL: https://drops.dagstuhl.de/opus/volltexte/2021/14557/


Gonçalo Oliveira, Hugo; Aguiar, Fredson Silva de Souza; Rademaker, Alexandre

On the Utility of Word Embeddings for Enriching OpenWordNet-PT



Abstract

The maintenance of wordnets and lexical knowledge bases typically relies on time-consuming manual effort. To mitigate this issue, we propose exploiting models of distributional semantics, namely word embeddings learned from corpora, to automatically identify relation instances that are missing from a wordnet. Analogy-solving methods are first used to learn a set of relations from analogy tests focused on each relation. Despite their low accuracy, we noted that a portion of the top-ranked answers are good suggestions of relation instances that could be included in the wordnet. This procedure is applied to the enrichment of OpenWordNet-PT, a public Portuguese wordnet. Relations are learned from data acquired from this resource, and illustrative examples are provided. Results are promising for accelerating the identification of missing relation instances: we estimate that about 17% of the potential suggestions are good, a proportion that almost doubles if some suggestions are automatically invalidated.
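To make the analogy-solving idea concrete, the sketch below uses 3CosAdd (the vector-offset method), one standard way of answering a:b :: a*:? with word embeddings. The abstract does not name the exact analogy-solving method at this point, so 3CosAdd is an illustrative choice, not necessarily the authors' method. The 4-dimensional vectors are hand-made toy values (hypothetical, not trained embeddings), and the helper names `three_cos_add` and `suggest_missing` are hypothetical. As an assumed simplification, `suggest_missing` drops pairs already recorded in the wordnet, since those are not missing instances; this is not claimed to be the paper's automatic invalidation criterion.

```python
import math

# Hypothetical toy 4-d vectors, hand-made for illustration only.
# Real use would load embeddings learned from a Portuguese corpus.
EMBEDDINGS = {
    "cão": [0.0, 1.0, 0.0, 0.1],
    "animal": [1.0, 1.0, 0.0, 0.0],
    "gato": [0.0, 1.0, 0.0, 0.2],
    "carvalho": [0.0, 0.0, 1.0, 0.1],
    "árvore": [1.0, 0.0, 1.0, 0.0],
    "rosa": [0.0, 0.0, 0.5, 1.0],
    "flor": [1.0, 0.0, 0.5, 1.0],
}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def three_cos_add(a, b, a_star, emb, top_n=3):
    """Solve a:b :: a_star:? using the offset vector b - a + a_star.

    Returns up to top_n (word, score) pairs, best first, excluding the
    three query words themselves.
    """
    target = [vb - va + vs for va, vb, vs in zip(emb[a], emb[b], emb[a_star])]
    scored = [
        (word, cosine(vec, target))
        for word, vec in emb.items()
        if word not in (a, b, a_star)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def suggest_missing(a, b, a_star, emb, known_pairs, top_n=3):
    """Turn top analogy answers into candidate (a_star, answer, score) instances.

    known_pairs is a hypothetical set of (word, related_word) pairs already
    in the wordnet; they are dropped because they are not missing instances.
    """
    return [
        (a_star, word, score)
        for word, score in three_cos_add(a, b, a_star, emb, top_n)
        if (a_star, word) not in known_pairs
    ]


if __name__ == "__main__":
    # Hypernymy-style analogy: cão:animal :: carvalho:?
    for word, score in three_cos_add("cão", "animal", "carvalho", EMBEDDINGS):
        print(f"{word}\t{score:.3f}")
```

With these toy vectors, "árvore" ranks first, followed by "flor" and "rosa". The lower-ranked answers mirror the abstract's observation: only some of the top-ranked answers are good candidates, so a human still has to validate the suggestions before they enter the wordnet.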

BibTeX entry

@InProceedings{goncalooliveira_et_al:OASIcs.LDK.2021.21,
  author =	{Gon\c{c}alo Oliveira, Hugo and Aguiar, Fredson Silva de Souza and Rademaker, Alexandre},
  title =	{{On the Utility of Word Embeddings for Enriching OpenWordNet-PT}},
  booktitle =	{3rd Conference on Language, Data and Knowledge (LDK 2021)},
  pages =	{21:1--21:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-199-3},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{93},
  editor =	{Gromann, Dagmar and S\'{e}rasset, Gilles and Declerck, Thierry and McCrae, John P. and Gracia, Jorge and Bosque-Gil, Julia and Bobillo, Fernando and Heinisch, Barbara},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2021/14557},
  URN =		{urn:nbn:de:0030-drops-145578},
  doi =		{10.4230/OASIcs.LDK.2021.21},
  annote =	{Keywords: word embeddings, lexical resources, wordnet, analogy tests}
}

Keywords: word embeddings, lexical resources, wordnet, analogy tests
Collection: 3rd Conference on Language, Data and Knowledge (LDK 2021)
Issue Date: 2021
Date of publication: 30.08.2021

