Finding Skewed Subcubes Under a Distribution

Authors Parikshit Gopalan, Roie Levin, Udi Wieder



PDF
Thumbnail PDF

File

LIPIcs.ITCS.2020.84.pdf
  • Filesize: 0.64 MB
  • 30 pages

Document Identifiers

Author Details

Parikshit Gopalan
  • VMware Research, Palo Alto, CA, USA
Roie Levin
  • Carnegie Mellon University, Pittsburgh, PA, USA
Udi Wieder
  • VMware Research, Palo Alto, CA, USA

Cite AsGet BibTex

Parikshit Gopalan, Roie Levin, and Udi Wieder. Finding Skewed Subcubes Under a Distribution. In 11th Innovations in Theoretical Computer Science Conference (ITCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 151, pp. 84:1-84:30, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.ITCS.2020.84

Abstract

Say that we are given samples from a distribution ψ over an n-dimensional space. We expect or desire ψ to behave like a product distribution (or a k-wise independent distribution over its marginals for small k). We propose the problem of enumerating/list-decoding all large subcubes where the distribution ψ deviates markedly from what we expect; we refer to such subcubes as skewed subcubes. Skewed subcubes are certificates of dependencies between small subsets of variables in ψ. We motivate this problem by showing that it arises naturally in the context of algorithmic fairness and anomaly detection. In this work we focus on the special but important case where the space is the Boolean hypercube, and the expected marginals are uniform. We show that the obvious definition of skewed subcubes can lead to intractable list sizes, and propose a better definition of a minimal skewed subcube, which are subcubes whose skew cannot be attributed to a larger subcube that contains it. Our main technical contribution is a list-size bound for this definition and an algorithm to efficiently find all such subcubes. Both the bound and the algorithm rely on Fourier-analytic techniques, especially the powerful hypercontractive inequality. On the lower bounds side, we show that finding skewed subcubes is as hard as the sparse noisy parity problem, and hence our algorithms cannot be improved on substantially without a breakthrough on this problem which is believed to be intractable. Motivated by this, we study alternate models allowing query access to ψ where finding skewed subcubes might be easier.

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Probabilistic algorithms
Keywords
  • Fourier Analysis
  • Anomaly Detection
  • Algorithmic Fairness
  • Probability
  • Unsupervised Learning

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Noga Alon, Alexandr Andoni, Tali Kaufman, Kevin Matulef, Ronitt Rubinfeld, and Ning Xie. Testing k-wise and almost k-wise independence. In STOC, 2007. Google Scholar
  2. Noga Alon and Joel H. Spencer. The Probabilistic Method. Wiley Publishing, 4th edition, 2016. Google Scholar
  3. Joy Buolamwini and Timnit Gebru. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. In Conference on Fairness, Accountability and Transparency, FAT 2018, 23-24 February 2018, New York, NY, USA, pages 77-91, 2018. URL: http://proceedings.mlr.press/v81/buolamwini18a.html.
  4. Ángel Cabrera, Will Epperson, Fred Hohman, Minsuk Kahng, Jamie Morgenstern, and Duen Horng Chau. FairVis: Visual Analytics for Discovering Intersectional Bias in Machine Learning. IEEE Conference on Visual Analytics Science and Technology (VAST), 2019. URL: https://poloclub.github.io/FairVis/.
  5. Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM Comput. Surv., 41(3):15:1-15:58, 2009. URL: https://doi.org/10.1145/1541880.1541882.
  6. Moses Charikar, Jacob Steinhardt, and Gregory Valiant. Learning from untrusted data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 47-60, 2017. Google Scholar
  7. Andrew F Emmott, Shubhomoy Das, Thomas Dietterich, Alan Fern, and Weng-Keen Wong. Systematic construction of anomaly detection benchmarks from real data. In Proceedings of the ACM SIGKDD workshop on outlier detection and description, pages 16-21. ACM, 2013. Google Scholar
  8. Vitaly Feldman, Parikshit Gopalan, Subhash Khot, and Ashok Kumar Ponnuswami. On Agnostic Learning of Parities, Monomials, and Halfspaces. SIAM J. Comput., 39(2):606-645, 2009. URL: https://doi.org/10.1137/070684914.
  9. Oded Goldreich and Leonid A. Levin. A Hard-Core Predicate for all One-Way Functions. In Proceedings of the 21st Annual ACM Symposium on Theory of Computing, May 14-17, 1989, Seattle, Washigton, USA, pages 25-32, 1989. Google Scholar
  10. Parikshit Gopalan, Vatsal Sharan, and Udi Wieder. PIDForest: Anomaly detection via partial identification. In Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 2019. Google Scholar
  11. Sudipto Guha, Nina Mishra, Gourav Roy, and Okke Schrijvers. Robust Random Cut Forest Based Anomaly Detection on Streams. In Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, pages 2712-2721, 2016. Google Scholar
  12. Sushrut Karmalkar, Adam R. Klivans, and Pravesh K. Kothari. List-Decodable Linear Regression. CoRR, abs/1905.05679, 2019. URL: http://arxiv.org/abs/1905.05679.
  13. Matti Karppa, Petteri Kaski, and Jukka Kohonen. A Faster Subquadratic Algorithm for Finding Outlier Correlations. ACM Trans. Algorithms, 14(3):31:1-31:26, June 2018. URL: https://doi.org/10.1145/3174804.
  14. Eyal Kushilevitz and Yishay Mansour. Learning Decision Trees Using the Fourier Spectrum. SIAM J. Comput., 22(6):1331-1348, 1993. URL: https://doi.org/10.1137/0222080.
  15. Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation Forest. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), December 15-19, 2008, Pisa, Italy, pages 413-422, 2008. Google Scholar
  16. Ryan O'Donnell. Analysis of boolean functions. Cambridge University Press, 2014. Google Scholar
  17. Ryan O'Donnell and Yu Zhao. On Closeness to k-Wise Uniformity. In APPROX-RANDOM, 2018. Google Scholar
  18. Gregory Valiant. Finding Correlations in Subquadratic Time, with Applications to Learning Parities and the Closest Pair Problem. J. ACM, 62(2):13:1-13:45, May 2015. URL: https://doi.org/10.1145/2728167.
  19. Sergey Yekhanin. Personal Communication, 2019. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail