Reducing False Discoveries in Statistically-Significant Regional-Colocation Mining: A Summary of Results

Authors Subhankar Ghosh, Jayant Gupta, Arun Sharma, Shuai An, Shashi Shekhar



Author Details

Subhankar Ghosh
  • Department of Computer Science & Engineering, University of Minnesota, Minneapolis, MN, USA
Jayant Gupta
  • Department of Computer Science & Engineering, University of Minnesota, Minneapolis, MN, USA
Arun Sharma
  • Department of Computer Science & Engineering, University of Minnesota, Minneapolis, MN, USA
Shuai An
  • Department of Economics, University of Minnesota, Minneapolis, MN, USA
Shashi Shekhar
  • Department of Computer Science & Engineering, University of Minnesota, Minneapolis, MN, USA

Acknowledgements

We also thank Kim Koffolt, Yash Travadi, and the Spatial Computing Research Group for valuable comments and refinements.

Cite As

Subhankar Ghosh, Jayant Gupta, Arun Sharma, Shuai An, and Shashi Shekhar. Reducing False Discoveries in Statistically-Significant Regional-Colocation Mining: A Summary of Results. In 12th International Conference on Geographic Information Science (GIScience 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 277, pp. 3:1-3:18, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.GIScience.2023.3

Abstract

Given a set S of spatial feature types, their feature instances, a study area, and a neighbor relationship, the goal is to find pairs <a region (r_{g}), a subset C of S> such that C is a statistically significant regional-colocation pattern in r_{g}. This problem is important for applications in various domains, including ecology, economics, and sociology. The problem is computationally challenging due to the exponential number of regional colocation patterns and candidate regions. Previously, we proposed a miner [Subhankar et al., 2022] that finds statistically significant regional colocation patterns. However, the numerous simultaneous statistical inferences raise the risk of false discoveries (also known as the multiple comparisons problem) and carry a high computational cost. We propose a novel algorithm, the multiple comparisons regional colocation miner (MultComp-RCM), which uses a Bonferroni correction. Theoretical analysis, experimental evaluation, and case study results show that the proposed method reduces both the false discovery rate and computational cost.
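
The Bonferroni correction named in the abstract can be illustrated with a minimal sketch. This is not the authors' MultComp-RCM implementation; it only shows the general idea of testing each of m simultaneous hypotheses against alpha / m instead of alpha. The function name `bonferroni_significant` and the p-values are hypothetical, chosen for illustration.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value as significant under a Bonferroni correction.

    With m simultaneous tests, each p-value is compared against
    alpha / m rather than alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical p-values for four candidate <region, pattern> pairs
# (illustrative numbers, not taken from the paper).
p_values = [0.001, 0.012, 0.03, 0.2]

uncorrected = [p <= 0.05 for p in p_values]
corrected = bonferroni_significant(p_values, alpha=0.05)

print("uncorrected:", uncorrected)
print("corrected:  ", corrected)
```

In this toy input, the third candidate (p = 0.03) passes the uncorrected 0.05 threshold but fails the corrected threshold of 0.0125, which is the kind of borderline discovery a multiple-comparisons correction is meant to screen out.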

Subject Classification

ACM Subject Classification
  • Information systems → Data mining
  • Computing methodologies → Spatial and physical reasoning
Keywords
  • Colocation pattern
  • Participation index
  • Multiple comparisons problem
  • Spatial heterogeneity
  • Statistical significance

References

  1. Sajib Barua and Jörg Sander. Mining statistically significant co-location and segregation patterns. IEEE TKDE, 26(5):1185-1199, 2013.
  2. Julian Besag and Peter J Diggle. Simple Monte Carlo tests for spatial pattern. Journal of the Royal Statistical Society: Series C (Applied Statistics), 26(3):327-333, 1977.
  3. Carlo Bonferroni. Teoria statistica delle classi e calcolo delle probabilita. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 8:3-62, 1936.
  4. M Celik et al. Zonal co-location pattern discovery with dynamic parameters. ICDM, 2007.
  5. Min Deng et al. Multi-level method for discovery of regional co-location patterns. IJGIS, 2017.
  6. Wouter Duivesteijn and Arno Knobbe. Exploiting false discoveries-statistical validation of patterns and quality measures in subgroup discovery. In 2011 IEEE 11th International Conference on Data Mining, pages 151-160. IEEE, 2011.
  7. Christoph F. Eick, Rachana Parmar, et al. Finding regional co-location patterns for sets of continuous variables in spatial datasets. In SIGSPATIAL, 2008.
  8. Subhankar et al. Towards geographically robust statistically significant regional colocation pattern detection. In Proceedings of the 5th ACM SIGSPATIAL GeoSIM, pages 11-20, 2022.
  9. Yan Li et al. Cscd: Towards spatially resolving the heterogeneous landscape of mxif oncology data. In Proceedings of the 10th ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, BigSpatial '22, pages 36-46, New York, NY, USA, 2022. ACM.
  10. Yan Huang et al. Discovering colocation patterns from spatial data sets: a general approach. IEEE TKDE, 16(12):1472-1485, 2004.
  11. Janine Illian, Antti Penttinen, et al. Statistical analysis and modelling of spatial point patterns, volume 70. John Wiley & Sons, 2008.
  12. Yan Li and Shashi Shekhar. Local co-location pattern detection: a summary of results. In GIScience. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.
  13. Guenter B Risse. "A long pull, a strong pull, and all together": San Francisco and bubonic plague, 1907-1908. Bulletin of the History of Medicine, 66(2):260-286, 1992.
  14. G Rupert Jr et al. Simultaneous statistical inference. Springer Science & Business Media, 2012.
  15. Arun Sharma, Jayant Gupta, and Subhankar Ghosh. Towards a tighter bound on possible-rendezvous areas: preliminary results. In Proceedings of the 30th International Conference on Advances in Geographic Information Systems, pages 1-11, 2022.
  16. Shashi Shekhar and Yan Huang. Discovering spatial co-location patterns: A summary of results. In Intl. symposium on spatial and temporal databases, pages 236-256. Springer, 2001.
  17. Song Wang et al. Regional co-locations of arbitrary shapes. In SSTD, 2013.
  18. Geoffrey I Webb. Discovering significant patterns. Machine learning, 68(1):1-33, 2007.
  19. David WS Wong. The modifiable areal unit problem (maup). In WorldMinds: geographical perspectives on 100 problems: commemorating the 100th anniversary of the association of American geographers 1904-2004, pages 571-575. Springer, 2004.