Early Findings in Using LLMs to Assess Semantic Relations Strength (Short Paper)

Authors: André Fernandes dos Santos, José Paulo Leal



File

OASIcs.SLATE.2024.4.pdf
  • Filesize: 0.57 MB
  • 9 pages

Document Identifiers
  • DOI: 10.4230/OASIcs.SLATE.2024.4

Author Details

André Fernandes dos Santos
  • CRACS & INESC Tec LA / Faculty of Sciences, University of Porto, Portugal
José Paulo Leal
  • CRACS & INESC Tec LA / Faculty of Sciences, University of Porto, Portugal

Cite As

André Fernandes dos Santos and José Paulo Leal. Early Findings in Using LLMs to Assess Semantic Relations Strength (Short Paper). In 13th Symposium on Languages, Applications and Technologies (SLATE 2024). Open Access Series in Informatics (OASIcs), Volume 120, pp. 4:1-4:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/OASIcs.SLATE.2024.4

Abstract

Semantic measure (SM) algorithms allow software to mimic the human ability to assess the strength of the semantic relations between elements such as concepts, entities, words, or sentences. SM algorithms are typically evaluated by comparison against gold standard datasets built by human annotators. These datasets are composed of pairs of elements, each with an averaged numeric rating. Building such datasets usually requires asking human annotators to assign a numeric value to their perception of the strength of the semantic relation between two elements. Large language models (LLMs) have recently been used successfully to perform tasks which previously required human intervention, such as text summarization, essay writing, image description, image synthesis, and question answering. In this paper, we present ongoing research on the capabilities of LLMs for semantic relations assessment. We queried several LLMs to rate the strength of the relationship between pairs of elements from existing semantic measure evaluation datasets, and measured the correlation between the LLMs' ratings and the gold standard datasets. Furthermore, we performed additional experiments to evaluate which other factors can influence LLM performance on this task. We present and discuss the results obtained so far.
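
The procedure described in the abstract (prompting an LLM to rate the relatedness of element pairs and correlating its scores with human gold standard ratings) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the dataset file name and column names, the prompt wording, the 0-10 rating scale, the model name, and the choice of Spearman's rank correlation are all illustrative assumptions.

```python
# Hypothetical sketch: rate word pairs with an LLM and correlate with human judgments.
# Dataset path, column names, model name, and prompt wording are illustrative assumptions.
import csv
import re

from openai import OpenAI          # pip install openai
from scipy.stats import spearmanr  # pip install scipy

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def rate_pair(word1: str, word2: str) -> float:
    """Ask the model for a 0-10 relatedness rating and parse the first number in its reply."""
    prompt = (
        f"On a scale from 0 (completely unrelated) to 10 (identical in meaning), "
        f"how strongly related are '{word1}' and '{word2}'? Answer with a single number."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    match = re.search(r"\d+(\.\d+)?", response.choices[0].message.content)
    return float(match.group()) if match else 0.0


human_scores, llm_scores = [], []
with open("wordsim353.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):  # expected columns: word1, word2, human_score
        human_scores.append(float(row["human_score"]))
        llm_scores.append(rate_pair(row["word1"], row["word2"]))

# Rank correlation between the LLM ratings and the gold standard annotations.
rho, p_value = spearmanr(human_scores, llm_scores)
print(f"Spearman correlation: {rho:.3f} (p = {p_value:.3g})")
```

Rank correlation is a common choice for this kind of evaluation because it compares orderings rather than absolute values; the abstract does not state which correlation measure the authors used.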

Subject Classification

ACM Subject Classification
  • Computing methodologies → Language resources
Keywords
  • large language models
  • semantic measures
  • semantic datasets

