Computing Approximate Statistical Discrepancy

Authors: Michael Matheny, Jeff M. Phillips

PDF
  • Filesize: 0.86 MB
  • 13 pages

Author Details

Michael Matheny
  • University of Utah, Salt Lake City, USA
Jeff M. Phillips
  • University of Utah, Salt Lake City, USA

Cite As

Michael Matheny and Jeff M. Phillips. Computing Approximate Statistical Discrepancy. In 29th International Symposium on Algorithms and Computation (ISAAC 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 123, pp. 32:1-32:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Abstract

Consider a geometric range space $(X, \mathcal{A})$ where $X = R \cup B$ is the union of a red set $R$ and a blue set $B$. For a range $A \in \mathcal{A}$, let $\Phi(A) = \left| \frac{|R \cap A|}{|R|} - \frac{|B \cap A|}{|B|} \right|$ denote the absolute difference between the fraction of red points and the fraction of blue points that fall in $A$. The maximum discrepancy range is $A^* = \arg\max_{A \in \mathcal{A}} \Phi(A)$. Our goal is to find some $\hat{A} \in \mathcal{A}$ such that $\Phi(A^*) - \Phi(\hat{A}) \le \varepsilon$. We develop general algorithms for this approximation problem for range spaces with bounded VC-dimension, as well as significant improvements for specific geometric range spaces defined by balls, halfspaces, and axis-aligned rectangles. This problem has direct applications in discrepancy evaluation and classification, and we also show an improved reduction to a class of problems in spatial scan statistics.
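To make the definitions of $\Phi$, $A^*$ and $\hat{A}$ concrete, here is a minimal sketch for the simplest range space, closed intervals on the real line (VC-dimension 2). It is not the paper's algorithm for balls, halfspaces or rectangles. The sketch rests on two observations. First, if each red point gets weight $+1/|R|$ and each blue point gets weight $-1/|B|$, then $\Phi$ of an interval is the absolute weight it contains, so $A^*$ can be found exactly with two maximum-subarray (Kadane) scans. Second, it shows the generic "solve on a random sample, evaluate on the full data" idea. The function names `phi_interval`, `best_interval`, `approx_best_interval` and the per-color sample size `m` are hypothetical choices for illustration.

```python
import random
from collections import defaultdict


def phi_interval(lo, hi, red, blue):
    """Phi(A) for the closed interval A = [lo, hi]: the absolute difference
    between the fraction of red points and the fraction of blue points in A."""
    r = sum(lo <= x <= hi for x in red) / len(red)
    b = sum(lo <= x <= hi for x in blue) / len(blue)
    return abs(r - b)


def best_interval(red, blue):
    """Exact maximum of Phi over closed intervals, in O(n log n) time.

    Each red point carries weight +1/|R| and each blue point -1/|B|, so the
    total weight inside an interval equals (red fraction - blue fraction).
    Points sharing a coordinate are merged, because no interval separates them.
    Two maximum-subarray (Kadane) scans find the largest red excess and the
    largest blue excess; Phi of the best interval is the larger of the two.
    Returns (Phi, (lo, hi)), or (0.0, None) if no interval has positive Phi.
    """
    w = defaultdict(float)
    for x in red:
        w[x] += 1.0 / len(red)
    for x in blue:
        w[x] -= 1.0 / len(blue)
    xs = sorted(w)
    best = (0.0, None)
    for sign in (1.0, -1.0):  # +1: red excess, -1: blue excess
        cur, start = 0.0, 0
        for i, x in enumerate(xs):
            v = sign * w[x]
            if cur <= 0.0:  # a non-positive running sum never helps: restart at x
                cur, start = v, i
            else:
                cur += v
            if cur > best[0]:
                best = (cur, (xs[start], xs[i]))
    return best


def approx_best_interval(red, blue, m, rng):
    """Sampling sketch: solve exactly on m points drawn (with replacement)
    from each color, then report the chosen interval's Phi on the full data.
    The returned value is always the true Phi of some interval, so it never
    exceeds the exact optimum; how close it gets depends on m."""
    red_sample = rng.choices(red, k=m)
    blue_sample = rng.choices(blue, k=m)
    _, interval = best_interval(red_sample, blue_sample)
    if interval is None:
        return 0.0, None
    return phi_interval(interval[0], interval[1], red, blue), interval


red, blue = [1, 2, 3, 10], [4, 5, 6, 10]
print(best_interval(red, blue))  # (0.75, (1, 3))
print(approx_best_interval(red, blue, 50, random.Random(0)))
```

On this toy input, the red-heavy interval [1, 3] and the blue-heavy interval [4, 6] both reach $\Phi = 0.75$. The scan reports the first one it finds. Points at the same coordinate (here 10) cancel.

For a range space of VC-dimension $\nu$, a standard $\varepsilon$-sample bound suggests taking $m$ on the order of $(\nu + \log(1/\delta))/\varepsilon^2$ per color, with constants omitted. With probability at least $1 - \delta$, every range's sampled $\Phi$ is then within $\varepsilon/2$ of its true value, which gives $\Phi(A^*) - \Phi(\hat{A}) \le \varepsilon$ for the sampled optimum.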

Subject Classification

ACM Subject Classification
  • Theory of computation → Computational geometry
Keywords
  • Scan Statistics
  • Discrepancy
  • Rectangles



