Approximate Maximum Halfspace Discrepancy

Authors Michael Matheny, Jeff M. Phillips



File

LIPIcs.ISAAC.2021.4.pdf
  • Filesize: 1.2 MB
  • 15 pages

Author Details

Michael Matheny
  • Amazon, Seattle, WA, USA
Jeff M. Phillips
  • University of Utah, Salt Lake City, UT, USA

Cite As

Michael Matheny and Jeff M. Phillips. Approximate Maximum Halfspace Discrepancy. In 32nd International Symposium on Algorithms and Computation (ISAAC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 212, pp. 4:1-4:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ISAAC.2021.4

Abstract

Consider the geometric range space (X, H_d) where X ⊂ ℝ^d and H_d is the set of ranges defined by d-dimensional halfspaces. In this setting we take X to be the disjoint union of a red set and a blue set. For each halfspace h ∈ H_d define a function Φ(h) that measures the "difference" between the fraction of red points and the fraction of blue points that fall in the range h. In this context the maximum discrepancy problem is to find h^* = arg max_{h ∈ (X, H_d)} Φ(h). We instead aim to find an ĥ such that Φ(h^*) - Φ(ĥ) ≤ ε. This is the central problem in linear classification for machine learning and in spatial scan statistics for spatial anomaly detection, and it arises in many other areas. We provide a solution for this problem in O(|X| + (1/ε^d) log⁴ (1/ε)) time, for constant d, which improves polynomially over the previous best solutions. For d = 2 we show that this is nearly tight through conditional lower bounds. For different classes of Φ we can provide either an Ω(|X|^{3/2 - o(1)}) time lower bound for the exact solution with a reduction to APSP, or an Ω(|X| + 1/ε^{2-o(1)}) lower bound for the approximate solution with a reduction to 3SUM. A key technical result is an ε-approximate halfspace range counting data structure of size O(1/ε^d) with O(log (1/ε)) query time, which we can build in O(|X| + (1/ε^d) log⁴ (1/ε)) time.
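To make the objective concrete, the following is a minimal sketch for d = 2. It is not the algorithm from the paper and carries no ε guarantee: it assumes the particular choice Φ(h) = |red fraction in h - blue fraction in h| (the abstract allows other classes of Φ), and it only searches halfplanes whose normals come from a fixed set of evenly spaced directions. The names best_halfplane and num_directions are hypothetical, introduced only for illustration.

```python
import math

def best_halfplane(red, blue, num_directions=64):
    """Heuristic search for a halfplane {p : u.p <= t} with large discrepancy.

    Assumes Phi(h) = |fraction of red in h - fraction of blue in h|.
    Only normals u at num_directions evenly spaced angles in [0, pi) are tried;
    the complement of a halfplane has the same Phi, so [0, pi) suffices.
    Returns (phi_value, (ux, uy, t)), or (0.0, None) if nothing beats 0.
    """
    n_red, n_blue = len(red), len(blue)
    labeled = [(p, True) for p in red] + [(p, False) for p in blue]
    best_val, best_h = 0.0, None

    for k in range(num_directions):
        theta = math.pi * k / num_directions
        ux, uy = math.cos(theta), math.sin(theta)
        # Project every point onto the direction u and sort by projection.
        proj = sorted((ux * x + uy * y, is_red) for (x, y), is_red in labeled)

        r = b = 0  # red / blue points with projection <= current threshold
        i = 0
        while i < len(proj):
            t = proj[i][0]
            # Absorb all points that share this projection value, so the
            # closed halfplane u.p <= t is counted consistently.
            while i < len(proj) and proj[i][0] == t:
                if proj[i][1]:
                    r += 1
                else:
                    b += 1
                i += 1
            val = abs(r / n_red - b / n_blue)
            if val > best_val:
                best_val, best_h = val, (ux, uy, t)

    return best_val, best_h
```

For example, with red = [(-2, 0), (-1, 1), (-1, -1)] and blue = [(1, 0), (2, 1), (2, -1)], the direction (1, 0) with threshold t = -1 contains every red point and no blue point, so the sketch reports Φ = 1. Each direction costs one sort, so the total time is O(num_directions · |X| log |X|); the result in the abstract avoids this dependence by working with an ε-approximate range counting structure instead of the full point set.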

Subject Classification

ACM Subject Classification
  • Theory of computation → Computational geometry
Keywords
  • range spaces
  • halfspaces
  • scan statistics
  • fine-grained complexity
