# Approximate Maximum Halfspace Discrepancy

## File

LIPIcs.ISAAC.2021.4.pdf
• Filesize: 1.2 MB
• 15 pages

## Cite As

Michael Matheny and Jeff M. Phillips. Approximate Maximum Halfspace Discrepancy. In 32nd International Symposium on Algorithms and Computation (ISAAC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 212, pp. 4:1-4:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ISAAC.2021.4

## Abstract

Consider the geometric range space (X, H_d), where X ⊂ ℝ^d and H_d is the set of ranges defined by d-dimensional halfspaces. In this setting, X is the disjoint union of a red set and a blue set. For each halfspace h ∈ H_d, define a function Φ(h) that measures the "difference" between the fraction of red and the fraction of blue points which fall in the range h. In this context the maximum discrepancy problem is to find h^* = arg max_{h ∈ H_d} Φ(h). We aim instead to find an ĥ such that Φ(h^*) - Φ(ĥ) ≤ ε. This is the central problem in linear classification for machine learning and in spatial scan statistics for spatial anomaly detection, and it arises in many other areas. We provide a solution for this problem in O(|X| + (1/ε^d) log⁴(1/ε)) time, for constant d, which improves polynomially over the previous best solutions. For d = 2 we show that this is nearly tight through conditional lower bounds. For different classes of Φ we can either provide an Ω(|X|^{3/2 - o(1)}) time lower bound for the exact solution, via a reduction from APSP, or an Ω(|X| + 1/ε^{2-o(1)}) lower bound for the approximate solution, via a reduction from 3SUM. A key technical result is an ε-approximate halfspace range counting data structure of size O(1/ε^d) with O(log(1/ε)) query time, which we can build in O(|X| + (1/ε^d) log⁴(1/ε)) time.
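To make the objective concrete, the following is a minimal sketch of the problem in the plane (d = 2), with Φ taken to be the absolute difference of the red and blue fractions. The function names, the absolute perturbation `eps`, and the brute-force search are illustrative assumptions, not the paper's algorithm: this is the naive O(n³) baseline that enumerates halfspaces whose boundary passes through two input points, which the paper's near-linear approximation algorithm is designed to beat.

```python
import itertools

def discrepancy(red, blue, a, b, c):
    """Phi(h) for the halfspace h = {(x, y) : a*x + b*y <= c}:
    |fraction of red points in h - fraction of blue points in h|."""
    inside = lambda p: a * p[0] + b * p[1] <= c
    r = sum(inside(p) for p in red) / len(red)
    s = sum(inside(p) for p in blue) / len(blue)
    return abs(r - s)

def max_discrepancy_brute_force(red, blue):
    """Exact maximum over all halfspaces in the plane.  It suffices to
    check halfspaces whose boundary line passes through two input points;
    the +/- eps shifts cover including or excluding those boundary points.
    Runs in O(n^3) time -- the naive baseline."""
    pts = red + blue
    eps = 1e-9  # assumes coordinates of roughly unit scale
    best = 0.0
    for p, q in itertools.combinations(pts, 2):
        a, b = q[1] - p[1], p[0] - q[0]  # normal of the line through p and q
        c = a * p[0] + b * p[1]
        for aa, bb, cc in ((a, b, c), (-a, -b, -c)):  # both sides of the line
            for shift in (-eps, eps):
                best = max(best, discrepancy(red, blue, aa, bb, cc + shift))
    return best
```

For example, with red points {(0,0), (0,1)} and blue points {(1,0), (1,1)}, the halfspace x ≤ 1/2 contains every red point and no blue point, so the maximum discrepancy is 1.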

## Subject Classification

##### ACM Subject Classification
• Theory of computation → Computational geometry
##### Keywords
• range spaces
• halfspaces
• scan statistics
• fine-grained complexity
