Acquiring Domain-Specific Knowledge for WordNet from a Terminological Database

Authors: Alberto Simões, Xavier Gómez Guinovart




Author Details

Alberto Simões
  • 2Ai - Polytechnic Institute of Cávado and Ave, 4750-810 Barcelos, Portugal
Xavier Gómez Guinovart
  • SLI-TALG, Universidade de Vigo, Vigo, Galiza

Cite As

Alberto Simões and Xavier Gómez Guinovart. Acquiring Domain-Specific Knowledge for WordNet from a Terminological Database. In 8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 6:1-6:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/OASIcs.SLATE.2019.6

Abstract

In this research we explore a terminological database (Termoteca) in order to expand the Portuguese and Galician wordnets (PULO and Galnet) with new synset variants (word forms for a concept), usage examples for the variants, and synset glosses or definitions. The methodology is based on aligning WordNet concepts (synsets) with the concepts described in Termoteca (terminological records), taking into account the lexical forms in both resources, their morphological category and their knowledge domains. The information provided by the WordNet Domains Hierarchy and the Termoteca field domains is used to reduce the incidence of polysemy and homography in the results. The results confirm our hypothesis that the combined use of the semantic domain information included in both resources makes it possible to minimise the problem of lexical ambiguity and to obtain a very acceptable precision in terminological information extraction tasks: precision is above 89% when the Galnet synset and the Termoteca record share at least one lexical form in two or more different languages.
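
The alignment criteria named in the abstract (shared lexical forms, matching morphological category, compatible domains) can be illustrated with a minimal sketch. It is not the authors' implementation: the `Concept` class, the `align` function, the identifiers and the sample data are all hypothetical, and the sketch assumes that Termoteca field domains have already been mapped to the same labels as the WordNet Domains Hierarchy.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A synset or a terminological record, reduced to the fields used for matching."""
    ident: str
    pos: str            # morphological category, e.g. "n" or "v"
    domains: frozenset  # domain labels, assumed already mapped to a shared label set
    forms: dict = field(default_factory=dict)  # language code -> set of lexical forms


def shared_languages(synset, record):
    """Languages in which the two concepts share at least one lexical form."""
    return {
        lang
        for lang, forms in synset.forms.items()
        if forms & record.forms.get(lang, set())
    }


def align(synsets, records, min_languages=2):
    """Return (synset id, record id, shared languages) for each candidate pair."""
    candidates = []
    for synset in synsets:
        for record in records:
            if synset.pos != record.pos:
                continue  # different morphological category
            if not synset.domains & record.domains:
                continue  # no domain in common: likely polysemy or homography
            langs = shared_languages(synset, record)
            if len(langs) >= min_languages:
                candidates.append((synset.ident, record.ident, sorted(langs)))
    return candidates


# Hypothetical data: one synset and two records that share the same word forms.
synsets = [
    Concept("syn-1", "n", frozenset({"biology"}),
            {"en": {"cell"}, "gl": {"célula"}}),
]
records = [
    Concept("rec-A", "n", frozenset({"biology"}),
            {"en": {"cell"}, "gl": {"célula"}, "pt": {"célula"}}),
    Concept("rec-B", "n", frozenset({"telecommunications"}),
            {"en": {"cell"}, "gl": {"célula"}}),
]

result = align(synsets, records)
print(result)  # [('syn-1', 'rec-A', ['en', 'gl'])]
```

In this toy data both records share the forms "cell" and "célula" with the synset, but only `rec-A` survives the domain check. The `min_languages=2` default mirrors the condition under which the abstract reports precision above 89%; the Portuguese form in `rec-A` is the kind of variant such a match could then propose for PULO.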

Subject Classification

ACM Subject Classification
  • Computing methodologies → Language resources
  • Computing methodologies → Natural language processing
Keywords
  • WordNet
  • Terminology
  • Lexical Resources
  • Natural Language Processing

