# Computing Approximate Statistical Discrepancy

## File

LIPIcs.ISAAC.2018.32.pdf
• Filesize: 0.86 MB
• 13 pages

## Cite As

Michael Matheny and Jeff M. Phillips. Computing Approximate Statistical Discrepancy. In 29th International Symposium on Algorithms and Computation (ISAAC 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 123, pp. 32:1-32:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.ISAAC.2018.32

## Abstract

Consider a geometric range space $(X, \mathcal{A})$ where $X$ is the union of a red set $R$ and a blue set $B$. Let $\Phi(A) = \left| \frac{|R \cap A|}{|R|} - \frac{|B \cap A|}{|B|} \right|$ be the absolute difference between the fraction of red points and the fraction of blue points that fall in the range $A$. The maximum discrepancy range is $A^* = \arg\max_{A \in (X, \mathcal{A})} \Phi(A)$. Our goal is to find some $\hat{A} \in (X, \mathcal{A})$ such that $\Phi(A^*) - \Phi(\hat{A}) \le \varepsilon$. We develop general algorithms for this approximation problem for range spaces with bounded VC-dimension, as well as significant improvements for specific geometric range spaces defined by balls, halfspaces, and axis-aligned rectangles. This problem has direct applications in discrepancy evaluation and classification, and we also show an improved reduction to a class of problems in spatial scan statistics.
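To make the definition of $\Phi$ concrete, here is a minimal sketch for the simplest range space: closed intervals on the real line. It is not the algorithm from the paper, which targets $\varepsilon$-approximations for balls, halfspaces, and rectangles. Instead it computes $A^*$ exactly in one dimension with the classic maximum-subarray scan (Kadane's algorithm, popularized in Bentley's Programming Pearls column), after weighting each red point $+1/|R|$ and each blue point $-1/|B|$. The function names, the `(coordinate, color)` input format with colors `"r"`/`"b"`, and the assumption of distinct coordinates are illustrative choices, not taken from the source.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def _kadane(vals: List[Fraction]) -> Tuple[Fraction, int, int]:
    """Maximum-sum contiguous run of a non-empty list: (sum, start, end)."""
    best, best_lo, best_hi = vals[0], 0, 0
    cur, cur_lo = vals[0], 0
    for i in range(1, len(vals)):
        if cur < 0:  # a negative prefix never helps; restart the run at i
            cur, cur_lo = vals[i], i
        else:
            cur += vals[i]
        if cur > best:
            best, best_lo, best_hi = cur, cur_lo, i
    return best, best_lo, best_hi


def max_interval_discrepancy(
    points: Sequence[Tuple[float, str]],
) -> Tuple[Fraction, Tuple[float, float]]:
    """Exact max of Phi over closed intervals [lo, hi] (hypothetical helper).

    points: (coordinate, color) pairs, color "r" (red) or "b" (blue).
    Assumes distinct coordinates, so every contiguous run of the sorted
    points is exactly the point set of some interval.
    """
    n_red = sum(1 for _, c in points if c == "r")
    n_blue = len(points) - n_red
    if n_red == 0 or n_blue == 0:
        raise ValueError("need at least one red and one blue point")
    pts = sorted(points)
    # Red weighs +1/|R|, blue -1/|B|: a run's sum equals
    # (red fraction inside) - (blue fraction inside), i.e. signed Phi.
    w = [
        Fraction(1, n_red) if c == "r" else Fraction(-1, n_blue)
        for _, c in pts
    ]
    pos = _kadane(w)                # best red-heavy interval
    neg = _kadane([-x for x in w])  # best blue-heavy interval
    phi, i, j = pos if pos[0] >= neg[0] else neg
    return phi, (pts[i][0], pts[j][0])
```

For example, `max_interval_discrepancy([(1.0, "r"), (2.0, "r"), (3.0, "b"), (4.0, "b")])` returns `(Fraction(1, 1), (1.0, 2.0))`: the interval $[1, 2]$ holds every red point and no blue point. The scan runs twice, on the weights and on their negation, because $\Phi$ takes an absolute value. Exact `Fraction` weights keep comparisons between candidate sums free of floating-point rounding.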

## Subject Classification

##### ACM Subject Classification
• Theory of computation → Computational geometry
##### Keywords
• Scan Statistics
• Discrepancy
• Rectangles

## References

1. Deepak Agarwal, Andrew McGregor, Jeff M. Phillips, Suresh Venkatasubramanian, and Zhengyuan Zhu. Spatial Scan Statistics: Approximations and Performance Study. In KDD, 2006.
2. Deepak Agarwal, Jeff M. Phillips, and Suresh Venkatasubramanian. The Hunting of the Bump: On Maximizing Statistical Discrepancy. In SODA, 2006.
3. Arturs Backurs, Nishanth Dikkala, and Christos Tzamos. Tight Hardness Results for Maximum Weight Rectangles. In ICALP, 2016. URL: http://arxiv.org/abs/1602.05837.
4. Jérémy Barbay, Timothy M. Chan, Gonzalo Navarro, and Pablo Pérez-Lantero. Maximum-weight planar boxes in $O(n^2)$ time (and better). Information Processing Letters, 114(8):437-445, 2014.
5. Jon Bentley. Programming Pearls - Perspective on Performance. Communications of ACM, 27:1087-1092, 1984.
6. Bernard Chazelle. The Discrepancy Method. Cambridge University Press, 2000.
7. David Dobkin and David Eppstein. Computing the Discrepancy. In Proceedings 9th Annual Symposium on Computational Geometry, 1993.
8. David P. Dobkin, David Eppstein, and Don P. Mitchell. Computing the Discrepancy with Applications to Supersampling Patterns. ACM Trans. Graph., 15(4):354-376, October 1996.
9. Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, and Takeshi Tokuyama. Data Mining Using Two-dimensional Optimized Association Rules: Scheme, Algorithms, and Visualization. SIGMOD Rec., 25(2):13-23, June 1996.
10. David Haussler. Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension. J. Combinatorial Theory, A, 69:217-232, 1995.
11. Lan Huang, Martin Kulldorff, and David Gregorio. A Spatial Scan Statistic for Survival Data. Biometrics, 63:109-118, 2007.
12. Martin Kulldorff. A Spatial Scan Statistic. Communications in Statistics: Theory and Methods, 26:1481-1496, 1997.
13. Martin Kulldorff. SaTScan User Guide, 7.0 edition, 2006. URL: http://www.satscan.org/.
14. Martin Kulldorff, Lan Huang, Linda Pickle, and Luiz Duczmal. An elliptic spatial scan statistic. Statistics in Medicine, 25(22):3929-3943, 2006.
15. Yi Li, Philip M. Long, and Aravind Srinivasan. Improved Bounds on the Sample Complexity of Learning. J. Comp. and Sys. Sci., 62:516-527, 2001.
16. Ming C. Lin and Dinesh Manocha. Applied Computational Geometry. Towards Geometric Engineering: Selected Papers, volume 114. Springer Science & Business Media, 1996.
17. Michael Matheny and Jeff M. Phillips. Computing Approximate Statistical Discrepancy. Technical report, arXiv, 2018. URL: http://arxiv.org/abs/1804.11287.
18. Michael Matheny and Jeff M. Phillips. Practical Low-Dimensional Halfspace Range Space Sampling. In European Symposium on Algorithms, 2018. URL: http://arxiv.org/abs/1804.11307.
19. Michael Matheny, Raghvendra Singh, Liang Zhang, Kaiqiang Wang, and Jeff M. Phillips. Scalable Spatial Scan Statistics Through Sampling. In SIGSPATIAL, 2016.
20. Jiri Matoušek. Geometric Discrepancy. Springer, 1999.
21. Jiri Matoušek. Lectures on Discrete Geometry. Springer, 2002.
22. Daniel B. Neill and Andrew W. Moore. Rapid Detection of Significant Spatial Clusters. In KDD, 2004.
23. Norbert Sauer. On the Density of Families of Sets. Journal of Combinatorial Theory, Series A, 13:145-147, 1972.
24. Tadao Takaoka. Efficient Algorithms for the Maximum Subarray Problem by Distance Matrix Multiplication. In CATS, 2002.
25. Toshiro Tango and Kunihiko Takahashi. A flexibly shaped spatial scan statistic for detecting clusters. International Journal of Health Geographics, 4(1):11, May 2005.
26. Vladimir Vapnik and Alexey Chervonenkis. On the Uniform Convergence of Relative Frequencies of Events to their Probabilities. Theory of Probability and Its Applications, 16:264-280, 1971.
27. Guenther Walther. Optimal and fast detection of spatial clusters with scan statistics. Ann. Statist., 38(2):1010-1033, April 2010.
28. Mingxi Wu, Xiuyao Song, Chris Jermaine, Sanjay Ranka, and John Gums. A LRT Framework for Fast Spatial Anomaly Detection. In KDD, 2009.