Towards Statistically Significant Taxonomy Aware Co-Location Pattern Detection (Short Paper)

Authors: Subhankar Ghosh, Arun Sharma, Jayant Gupta, Shashi Shekhar




File

LIPIcs.COSIT.2024.25.pdf
  • Filesize: 0.71 MB
  • 11 pages

Author Details

Subhankar Ghosh
  • Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
Arun Sharma
  • Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
Jayant Gupta
  • Oracle Inc., Nashua, NH, USA
Shashi Shekhar
  • Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA

Cite As

Subhankar Ghosh, Arun Sharma, Jayant Gupta, and Shashi Shekhar. Towards Statistically Significant Taxonomy Aware Co-Location Pattern Detection (Short Paper). In 16th International Conference on Spatial Information Theory (COSIT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 315, pp. 25:1-25:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.COSIT.2024.25

Abstract

Given a collection of Boolean spatial feature types, their instances, a neighborhood relation (e.g., proximity), and a hierarchical taxonomy of the feature types, the goal is to find the subsets of feature types or their parents whose spatial interaction is statistically significant. This problem is important for taxonomy-reliant applications such as ecology (e.g., finding new symbiotic relationships across the food chain), spatial pathology (e.g., immunotherapy for cancer), and retail. The problem is computationally challenging because the taxonomy generates an exponential number of candidate co-location patterns. Most approaches for co-location pattern detection overlook the hierarchical relationships among spatial features, and the statistical significance of the detected patterns is not always considered, leading to potential false discoveries. This paper introduces two methods for incorporating taxonomies and assessing the statistical significance of co-location patterns. The baseline approach iteratively checks the significance of co-locations between leaf nodes or their ancestors in the taxonomy. The advanced approach uses the Benjamini-Hochberg procedure to control the false discovery rate, reducing the risk of false discoveries while maintaining the power to detect true co-location patterns. Experimental evaluation and case study results show the effectiveness of the approach.
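
As an illustration of the false discovery rate control mentioned in the abstract, the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure applied to a list of p-values, one per candidate co-location pattern. It is not the authors' implementation: the function name `benjamini_hochberg`, the `alpha` default, and the example p-values are assumptions made for illustration, and the sketch does not cover how the paper derives p-values or traverses the taxonomy.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return one flag per p-value: True if that hypothesis is rejected.

    Standard Benjamini-Hochberg step-up rule: sort the m p-values, find the
    largest rank k such that p_(k) <= (k / m) * alpha, and reject the k
    hypotheses with the smallest p-values.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose p-value falls under its BH threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= (rank / m) * alpha:
            max_rank = rank

    # Reject every hypothesis ranked at or below max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for ten candidate patterns (made-up numbers).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_flags = benjamini_hochberg(example_p, alpha=0.05)
```

With these made-up values, only the first two candidates pass, even though five p-values are below 0.05 individually. Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger-ranked p-value passes, as in `benjamini_hochberg([0.04, 0.04, 0.04, 0.04])`, where all four are rejected.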

Subject Classification

ACM Subject Classification
  • Information systems → Data mining
  • Computing methodologies → Spatial and physical reasoning
Keywords
  • Co-location patterns
  • spatial data mining
  • taxonomy
  • hierarchy
  • statistical significance
  • false discovery rate
  • family-wise error rate

