LIPIcs.COSIT.2024.31.pdf
Interest in applying Large Language Models (LLMs), which use natural language processing (NLP) to provide human-like responses to text-based questions, to geospatial tasks has grown rapidly. Research shows that LLMs can help generate software code and answer some types of geographic questions to varying degrees, even without fine-tuning. However, further research is required to explore which types of spatial questions they answer correctly, their ability to apply spatial reasoning, and the variability between models. In this paper we examine the ability of four LLMs (GPT-3.5, GPT-4, Llama 2, and Falcon 40B) to answer spatial questions that range from basic calculations to more advanced geographic concepts. The intent of this comparison is twofold. First, we demonstrate an extensible method for evaluating LLMs' limitations in supporting spatial data science through correct calculations and code generation. Second, we consider how these models can aid geospatial learning by providing text-based explanations of spatial concepts and operations. Our research shows common strengths on more basic types of questions and mixed results on questions involving more advanced spatial concepts. These results provide insights that may inform strategies for testing and fine-tuning these models to increase their understanding of key spatial concepts.
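To illustrate the "basic calculations" end of the question spectrum, a typical task of this kind is computing the great-circle distance between two coordinates. The following is a minimal sketch (not taken from the paper's evaluation suite; the function name and test coordinates are illustrative) of the haversine formula that an LLM might be asked to explain or generate:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points,
    assuming a spherical Earth with mean radius 6371 km."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    # Haversine formula
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Example: approximate distance from London to Paris (~344 km)
d = haversine_km(51.5074, -0.1278, 48.8566, 2.3522)
```

Verifying whether a model produces (and correctly explains) code like this, versus confusing degrees with radians or misordering coordinates, is one concrete way such "correct calculations and code generation" can be scored.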