Benchmarking Regression Models Under Spatial Heterogeneity

Wiedemann, Nina; Martin, Henry; Westerholt, René

doi:10.4230/LIPIcs.GIScience.2023.11

Author Details

Nina Wiedemann

Institute of Cartography and Geoinformation, ETH Zürich, Switzerland

Henry Martin

Institute of Cartography and Geoinformation, ETH Zürich, Switzerland

René Westerholt

Department of Spatial Planning, TU Dortmund University, Germany

Cite As

Nina Wiedemann, Henry Martin, and René Westerholt. Benchmarking Regression Models Under Spatial Heterogeneity. In 12th International Conference on Geographic Information Science (GIScience 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 277, pp. 11:1-11:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.GIScience.2023.11

Abstract

Machine learning methods have recently been applied widely to spatial data, for example in weather forecasting, traffic prediction, and soil analysis. At the same time, methods from spatial statistics have been developed over the past decades to explicitly account for spatial structure in analytical and inference tasks. Given that both types of methods are available, we explore the following question: under what circumstances are local, spatially explicit models preferable to machine learning models that do not explicitly incorporate spatial structure in their specification? Local models are typically used to capture spatial non-stationarity. We therefore study how the strength and type of spatial heterogeneity, which may originate from non-stationarity of the process itself or from heterogeneous noise, affect the performance of different linear and non-linear, local and global machine learning and regression models. The results suggest that the performance of linear local models must be assessed on an independent hold-out dataset, since these models may overfit under certain conditions. We further show that local models are advantageous in settings with small sample sizes and high degrees of spatial heterogeneity. Our findings allow us to derive model selection criteria, which we validate in benchmarking experiments on five well-known spatial datasets.
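The abstract's point that local linear models should be judged on hold-out data can be illustrated with a small simulation. This is a minimal sketch, not the authors' benchmark: it assumes a hypothetical data-generating process with one covariate whose slope drifts linearly across the unit square, and it stands in for GWR with a Gaussian-kernel weighted least-squares fit at each prediction location using fixed, hand-picked bandwidths (real GWR software calibrates the bandwidth). All function names are hypothetical.

```python
import numpy as np


def simulate(n, heterogeneity, noise_sd, rng):
    """Hypothetical process: the slope of x on y drifts with the first coordinate."""
    coords = rng.uniform(0.0, 1.0, size=(n, 2))
    x = rng.normal(size=n)
    beta = 1.0 + heterogeneity * (coords[:, 0] - 0.5)  # spatially varying slope
    y = beta * x + rng.normal(scale=noise_sd, size=n)
    return coords, x, y


def design(x):
    # Intercept column plus the single covariate.
    return np.column_stack([np.ones_like(x), x])


def fit_global_ols(x, y):
    # One coefficient vector for the whole study area (ignores space).
    coef, *_ = np.linalg.lstsq(design(x), y, rcond=None)
    return coef


def predict_local(train_coords, train_x, train_y, query_coords, query_x, bandwidth):
    """GWR-style sketch: a separate Gaussian-kernel weighted OLS fit per query location."""
    X = design(train_x)
    preds = np.empty(len(query_x))
    for i, (c, xq) in enumerate(zip(query_coords, query_x)):
        d2 = np.sum((train_coords - c) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / bandwidth**2)
        sw = np.sqrt(w)  # weighted LS via rescaled rows
        coef, *_ = np.linalg.lstsq(X * sw[:, None], train_y * sw, rcond=None)
        preds[i] = coef[0] + coef[1] * xq
    return preds


def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def benchmark(seed=0):
    """Return {model: (train RMSE, hold-out RMSE)} for one simulated dataset."""
    rng = np.random.default_rng(seed)
    coords, x, y = simulate(400, heterogeneity=4.0, noise_sd=0.5, rng=rng)
    tr, te = slice(0, 300), slice(300, 400)  # points are i.i.d., so this is a random split
    results = {}
    coef = fit_global_ols(x[tr], y[tr])
    results["global_ols"] = (rmse(design(x[tr]) @ coef, y[tr]),
                             rmse(design(x[te]) @ coef, y[te]))
    for name, bw in [("local_bw_0.02", 0.02), ("local_bw_0.10", 0.10)]:
        fit_tr = predict_local(coords[tr], x[tr], y[tr], coords[tr], x[tr], bw)
        fit_te = predict_local(coords[tr], x[tr], y[tr], coords[te], x[te], bw)
        results[name] = (rmse(fit_tr, y[tr]), rmse(fit_te, y[te]))
    return results


if __name__ == "__main__":
    for name, (train_err, test_err) in benchmark().items():
        print(f"{name:>14}: train RMSE {train_err:.3f}, hold-out RMSE {test_err:.3f}")
```

Under these assumptions, the very narrow bandwidth lets each local fit nearly interpolate its own training point, so its in-sample RMSE should look far better than its hold-out RMSE. The moderate bandwidth should beat the global OLS fit on held-out data, because the slope genuinely varies in space and a single global coefficient cannot follow it.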

ACM Subject Classification

Computing methodologies → Concurrent algorithms

Keywords

spatial machine learning
spatial non-stationarity
Geographically Weighted Regression
local models
geostatistics
