Benchmarking Regression Models Under Spatial Heterogeneity

Authors: Nina Wiedemann, Henry Martin, René Westerholt



File

LIPIcs.GIScience.2023.11.pdf
  • Filesize: 1.86 MB
  • 15 pages

Author Details

Nina Wiedemann
  • Institute of Cartography and Geoinformation, ETH Zürich, Switzerland
Henry Martin
  • Institute of Cartography and Geoinformation, ETH Zürich, Switzerland
René Westerholt
  • Department of Spatial Planning, TU Dortmund University, Germany

Acknowledgements

We would like to thank Martin Raubal for the fruitful discussions about the project.

Cite As

Nina Wiedemann, Henry Martin, and René Westerholt. Benchmarking Regression Models Under Spatial Heterogeneity. In 12th International Conference on Geographic Information Science (GIScience 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 277, pp. 11:1-11:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.GIScience.2023.11

Abstract

Machine learning methods have recently found much application on spatial data, for example in weather forecasting, traffic prediction, and soil analysis. At the same time, methods from spatial statistics were developed over the past decades to explicitly account for spatial structuring in analytical and inference tasks. In the light of this duality of having both types of methods available, we explore the following question: Under what circumstances are local, spatially-explicit models preferable over machine learning models that do not incorporate spatial structure explicitly in their specification? Local models are typically used to capture spatial non-stationarity. Thus, we study the effect of strength and type of spatial heterogeneity, which may originate from non-stationarity of a process itself or from heterogeneous noise, on the performance of different linear and non-linear, local and global machine learning and regression models. The results suggest that it is necessary to assess the performance of linear local models on an independent hold-out dataset, since models may overfit under certain conditions. We further show that local models are advantageous in settings with small sample size and high degrees of spatial heterogeneity. Our findings allow deriving model selection criteria, which are validated in benchmarking experiments on five well-known spatial datasets.
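The hold-out comparison described in the abstract can be illustrated with a small synthetic experiment. The following is a minimal sketch and not the authors' experimental setup: the data-generating process, the Gaussian kernel, the bandwidth of 0.15, and the function names `simulate`, `fit_predict_global`, `fit_predict_local` and `holdout_rmse` are all assumptions made for illustration. It fits a global linear regression and a simple geographically weighted (local) linear regression on a training half and reports the root mean squared error of each on an independent hold-out half.

```python
import numpy as np


def simulate(n, heterogeneity, rng):
    """Synthetic data (assumed setup): the slope drifts along the first coordinate."""
    coords = rng.uniform(0.0, 1.0, size=(n, 2))
    x = rng.normal(size=n)
    beta = 1.0 + heterogeneity * (coords[:, 0] - 0.5)  # spatially varying slope
    y = beta * x + rng.normal(scale=0.1, size=n)
    return coords, x, y


def fit_predict_global(x_tr, y_tr, x_te):
    """Ordinary least squares with one intercept and one slope for the whole region."""
    design = np.column_stack([np.ones_like(x_tr), x_tr])
    coef, *_ = np.linalg.lstsq(design, y_tr, rcond=None)
    return coef[0] + coef[1] * x_te


def fit_predict_local(c_tr, x_tr, y_tr, c_te, x_te, bandwidth):
    """Local linear fit per test location, weighted by a Gaussian distance kernel."""
    design = np.column_stack([np.ones_like(x_tr), x_tr])
    preds = np.empty(len(x_te))
    for i, (c, xv) in enumerate(zip(c_te, x_te)):
        d2 = np.sum((c_tr - c) ** 2, axis=1)
        sw = np.sqrt(np.exp(-0.5 * d2 / bandwidth**2))
        # Weighted least squares: scale rows of the design and target by sqrt(weight).
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y_tr * sw, rcond=None)
        preds[i] = coef[0] + coef[1] * xv
    return preds


def holdout_rmse(heterogeneity, n=400, bandwidth=0.15, seed=0):
    """Return (global RMSE, local RMSE) measured on an independent hold-out half."""
    rng = np.random.default_rng(seed)
    coords, x, y = simulate(n, heterogeneity, rng)
    half = n // 2
    c_tr, x_tr, y_tr = coords[:half], x[:half], y[:half]
    c_te, x_te, y_te = coords[half:], x[half:], y[half:]

    pred_global = fit_predict_global(x_tr, y_tr, x_te)
    pred_local = fit_predict_local(c_tr, x_tr, y_tr, c_te, x_te, bandwidth)

    def rmse(pred):
        return float(np.sqrt(np.mean((pred - y_te) ** 2)))

    return rmse(pred_global), rmse(pred_local)


if __name__ == "__main__":
    for h in (0.0, 1.0, 4.0):
        rmse_global, rmse_local = holdout_rmse(h)
        print(f"heterogeneity={h}: global={rmse_global:.3f}, local={rmse_local:.3f}")
```

Evaluating on points that were not used for fitting is the important design choice here: a local model has many more effective parameters than a global one, so its in-sample error alone would not reveal overfitting.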

Subject Classification

ACM Subject Classification
  • Computing methodologies → Concurrent algorithms
Keywords
  • spatial machine learning
  • spatial non-stationarity
  • Geographically Weighted Regression
  • local models
  • geostatistics
