Comparing and Benchmarking Semantic Measures Using SMComp

Authors Teresa Costa, José Paulo Leal






Cite As

Teresa Costa and José Paulo Leal. Comparing and Benchmarking Semantic Measures Using SMComp. In 5th Symposium on Languages, Applications and Technologies (SLATE'16). Open Access Series in Informatics (OASIcs), Volume 51, pp. 4:1-4:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/OASIcs.SLATE.2016.4

Abstract

The goal of semantic measures is to compare pairs of concepts, words, sentences or named entities. They are categorized by what they measure: a measure that considers only taxonomic relationships is a similarity measure, whereas one that considers all types of relationships is a relatedness measure.
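
The distinction can be illustrated with a minimal sketch, independent of SMComp and written in Python for brevity. The toy graph, the relation names and the 1 / (1 + length) scoring below are assumptions made for illustration only; they are not taken from WordNet or from the paper. Restricting the search to taxonomic ("is-a") edges gives a similarity-style score, while allowing every edge type gives a relatedness-style score.

```python
from collections import deque

# Hypothetical toy graph as (source, relation, target) triples; not real WordNet data.
EDGES = [
    ("car", "is-a", "vehicle"),
    ("bicycle", "is-a", "vehicle"),
    ("wheel", "part-of", "car"),
    ("wheel", "part-of", "bicycle"),
    ("vehicle", "is-a", "artifact"),
]


def shortest_path_length(source, target, allowed_relations=None):
    """Breadth-first search over undirected edges; returns None if unreachable.

    allowed_relations=None means every relation type may be traversed.
    """
    neighbours = {}
    for a, relation, b in EDGES:
        if allowed_relations is None or relation in allowed_relations:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_score(source, target, allowed_relations=None):
    """Assumed scoring: 1 / (1 + path length), or 0.0 when no path exists."""
    length = shortest_path_length(source, target, allowed_relations)
    return 0.0 if length is None else 1.0 / (1.0 + length)


# Similarity-style: only taxonomic edges are followed.
similarity = path_score("car", "wheel", allowed_relations={"is-a"})
# Relatedness-style: all edge types are followed.
relatedness = path_score("car", "wheel")
print(similarity, relatedness)
```

In this toy graph "car" and "wheel" are connected only by a part-of edge, so the taxonomy-only score finds no path, whereas the all-relations score does.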

The evaluation of these measures usually relies on semantic gold standards. These datasets, which consist of word pairs with ratings assigned by human annotators, are used to assess how well a semantic measure performs.
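
One common way to carry out such an assessment is to correlate the scores of a measure with the human ratings. The abstract does not state which statistic SMComp uses, so the Spearman rank correlation below is an assumption, and the ratings and scores are invented for illustration. The sketch is in plain Python and is not part of SMComp.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes neither list is constant (non-zero variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical gold-standard ratings and measure scores for four word pairs.
human_ratings = [3.9, 3.1, 0.4, 1.2]
measure_scores = [0.9, 0.5, 0.1, 0.3]
print(spearman(human_ratings, measure_scores))
```

A rank correlation only asks whether the measure orders the pairs as the annotators did, so the measure and the gold standard do not need to share a scale.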

A few frameworks provide tools to compute and analyze several well-known measures. This paper presents a novel tool, SMComp, a testbed designed for path-based semantic measures. In its current state, it is a domain-specific tool that uses three different versions of WordNet.

SMComp has two views: one to compute semantic measures for a pair of words and another to assess a semantic measure using a dataset. In the first view, it offers several measures described in the literature, as well as the possibility of creating a new measure by entering Java code snippets in the GUI. The other view offers a large set of semantic benchmarks to use in the assessment process, along with the option of uploading a custom dataset.

Keywords
  • Semantic similarity
  • semantic relatedness
  • testbed
  • web application
