Cross-Dictionary Linking at Sense Level with a Double-Layer Classifier

Authors Roser Saurí, Louis Mahon, Irene Russo, Mironas Bitinis



File

OASIcs.LDK.2019.20.pdf
  • Filesize: 0.58 MB
  • 16 pages

Author Details

Roser Saurí
  • Dictionaries Technology Group, Oxford University Press, UK
Louis Mahon
  • Dictionaries Technology Group, Oxford University Press, UK
  • Oxford University, UK
Irene Russo
  • Dictionaries Technology Group, Oxford University Press, UK
  • ILC A. Zampolli - CNR, Pisa, Italy
Mironas Bitinis
  • Dictionaries Technology Group, Oxford University Press, UK

Acknowledgements

We are very grateful to Charlotte Buxton and Rebecca Juganaru, the expert lexicographers who have contributed all the dictionary knowledge we were lacking and have helped with manual annotations. In addition, we want to express our thanks to Richard Shapiro, Will Hunter and Sophie Wood for their great support in different aspects of the project. All errors and mistakes are the responsibility of the authors.

Cite As

Roser Saurí, Louis Mahon, Irene Russo, and Mironas Bitinis. Cross-Dictionary Linking at Sense Level with a Double-Layer Classifier. In 2nd Conference on Language, Data and Knowledge (LDK 2019). Open Access Series in Informatics (OASIcs), Volume 70, pp. 20:1-20:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/OASIcs.LDK.2019.20

Abstract

We present a system for linking dictionaries at the sense level, which is part of a wider programme aiming to extend current lexical resources and to create new ones by automatic means. One of the main challenges of the sense linking task is the existence of non one-to-one mappings among senses. Our system handles this issue by addressing the task as a binary classification problem using standard Machine Learning methods, where each sense pair is classified independently from the others. In addition, it implements a second, statistically-based classification layer to also model the dependence existing among sense pairs, namely, the fact that a sense in one dictionary that is already linked to a sense in the other dictionary has a lower probability of being linked to a further sense. The resulting double-layer classifier achieves global Precision and Recall scores of 0.91 and 0.80, respectively.
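
As a rough illustration of the two-layer idea described in the abstract, the following Python sketch first takes independent per-pair link probabilities and then re-scores them so that a sense that is already linked becomes less likely to receive a further link. It is not the authors' implementation: the first-layer probabilities are hard-coded hypothetical values standing in for the output of a trained binary classifier, and the greedy discounting rule, the `threshold` and `penalty` parameters, and the function name `link_senses` are all assumptions made for this example.

```python
from collections import defaultdict


def link_senses(pair_scores, threshold=0.5, penalty=0.6):
    """Select sense links from first-layer scores (hypothetical second layer).

    pair_scores maps (sense_a, sense_b) to the probability assigned by an
    independent binary classifier. The discounting scheme below is an
    assumption for illustration, not the statistical model of the paper.
    """
    links = []
    links_a = defaultdict(int)  # links already accepted per sense in dictionary A
    links_b = defaultdict(int)  # links already accepted per sense in dictionary B

    # Visit candidate pairs from most to least confident.
    for (a, b), p in sorted(pair_scores.items(), key=lambda kv: -kv[1]):
        # Discount the score once for every link either sense already has.
        adjusted = p * penalty ** (links_a[a] + links_b[b])
        if adjusted >= threshold:
            links.append((a, b))
            links_a[a] += 1
            links_b[b] += 1
    return links


# Hypothetical first-layer probabilities for five candidate sense pairs.
scores = {
    ("a1", "b1"): 0.95,
    ("a1", "b2"): 0.90,
    ("a1", "b3"): 0.60,
    ("a2", "b3"): 0.70,
    ("a2", "b1"): 0.55,
}

result = link_senses(scores)
print(result)
```

With these made-up scores, thresholding each pair independently at 0.5 would accept all five pairs. The discounted pass keeps three links, still allowing the non one-to-one case in which `a1` maps to both `b1` and `b2`, while rejecting the weaker additional links.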

Subject Classification

ACM Subject Classification
  • Computing methodologies → Lexical semantics
  • Computing methodologies → Language resources
  • Computing methodologies → Supervised learning by classification
Keywords
  • Word sense linking
  • word sense mapping
  • lexical translation
  • lexical resources
  • language data construction
  • multilingual data
  • data integration across languages
