Extending the Galician Wordnet Using a Multilingual Bible Through Lexical Alignment and Semantic Annotation

Authors: Alberto Simões, Xavier Gómez Guinovart




Author Details

Alberto Simões
  • Applied Artificial Intelligence Lab (2Ai Lab), Instituto Politécnico do Cávado e do Ave, Barcelos, Portugal
Xavier Gómez Guinovart
  • Galician Language Technology and Applications (TALG Group), Universidade de Vigo, Galiza, Spain

Cite As

Alberto Simões and Xavier Gómez Guinovart. Extending the Galician Wordnet Using a Multilingual Bible Through Lexical Alignment and Semantic Annotation. In 7th Symposium on Languages, Applications and Technologies (SLATE 2018). Open Access Series in Informatics (OASIcs), Volume 62, pp. 14:1-14:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/OASIcs.SLATE.2018.14

Abstract

In this paper we describe the methodology and evaluation of the expansion of Galnet, the Galician wordnet, using a multilingual Bible through lexical alignment and semantic annotation. For this experiment we used the Galician, Portuguese, Spanish, Catalan and English versions of the Bible, which were annotated with part-of-speech tags and WordNet senses using FreeLing. The resulting synsets were aligned, and new variants for the Galician language were extracted. In a manual evaluation, the approach reached 96.8% accuracy.
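
The abstract only outlines the pipeline (annotate, align synsets, extract Galician variants). As a rough illustration of what extracting candidates from sense-aligned verses could look like, here is a minimal Python sketch. It is not the authors' implementation: the data layout, the function names (`agreed_synsets`, `propose_variants`), the thresholds, and the synset identifier are all assumptions made for this example, and FreeLing itself is not called. The sketch assumes each verse already carries synset identifiers for the pivot languages and plain lemmas for Galician.

```python
from collections import Counter, defaultdict

PIVOTS = ("pt", "es", "ca", "en")  # pivot languages named in the abstract


def agreed_synsets(verse, pivots=PIVOTS, min_support=3):
    """Return synsets annotated in at least `min_support` pivot languages of one verse."""
    support = Counter()
    for lang in pivots:
        # Use a set so a synset repeated inside one language counts once.
        support.update(set(verse.get(lang, [])))
    return {syn for syn, n in support.items() if n >= min_support}


def propose_variants(verses, known, target="gl", min_support=3, min_count=2):
    """Pair each agreed synset with the target lemma it co-occurs with most often.

    `known` maps synset id -> set of lemmas already present in the target wordnet
    (hypothetical structure); such pairs are not proposed again.
    """
    cooc = defaultdict(Counter)
    for verse in verses:
        lemmas = set(verse.get(target, []))
        for syn in agreed_synsets(verse, PIVOTS, min_support):
            cooc[syn].update(lemmas)

    proposals = {}
    for syn, counts in cooc.items():
        if not counts:
            continue  # synset never co-occurred with any target lemma
        lemma, n = counts.most_common(1)[0]
        if n >= min_count and lemma not in known.get(syn, set()):
            proposals[syn] = lemma
    return proposals


# Toy input: the synset id "syn-water" and the verse contents are invented.
toy_verses = [
    {"pt": ["syn-water"], "es": ["syn-water"], "ca": ["syn-water"],
     "en": ["syn-water"], "gl": ["auga", "beber"]},
    {"pt": ["syn-water"], "es": ["syn-water"], "ca": [],
     "en": ["syn-water"], "gl": ["auga", "río"]},
]
toy_proposals = propose_variants(toy_verses, known={})
```

In this toy run the only lemma that co-occurs with the agreed synset in both verses is `auga`, so it is the single proposal. Requiring agreement among several pivot languages is one plausible way to trade recall for precision; any real threshold would have to be tuned against manual evaluation such as the one the abstract reports.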

Subject Classification

ACM Subject Classification
  • Computing methodologies → Language resources
  • Computing methodologies → Lexical semantics
Keywords
  • WordNet
  • lexical acquisition
  • parallel corpora
  • semantic annotation

