Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers

Authors: Jacob Steinhardt, Moses Charikar, Gregory Valiant



Cite As

Jacob Steinhardt, Moses Charikar, and Gregory Valiant. Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 94, pp. 45:1-45:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.ITCS.2018.45

Abstract

We introduce a criterion, resilience, which allows properties of a dataset (such as its mean or best low-rank approximation) to be robustly computed, even in the presence of a large fraction of arbitrary additional data. Resilience is a weaker condition than most other properties considered so far in the literature, and yet enables robust estimation in a broader variety of settings. We provide new information-theoretic results on robust distribution learning, robust estimation of stochastic block models, and robust mean estimation under bounded k-th moments. We also provide new algorithmic results on robust distribution learning, as well as robust mean estimation in p-norms. Among our proof techniques is a method for pruning a high-dimensional distribution with bounded first moments to a stable "core" with bounded second moments, which may be of independent interest.
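To make the criterion concrete, here is the resilience definition in the form it is usually stated (a paraphrase; parameter conventions may differ slightly from the paper's precise statement): a set is resilient if the mean of every sufficiently large subset stays close to the mean of the whole set.

```latex
% Resilience, paraphrased; see the paper for the precise definition.
A set $S$ in a normed space $(\mathbb{R}^d, \|\cdot\|)$ is
\emph{$(\sigma,\epsilon)$-resilient} around a point $\mu$ if
\[
  \Bigl\| \frac{1}{|T|} \sum_{x \in T} x - \mu \Bigr\| \le \sigma
  \qquad \text{for every } T \subseteq S \text{ with } |T| \ge (1-\epsilon)\,|S| .
\]
```

The intuition: if the clean data form a resilient set, then even after an adversary adds an epsilon-fraction of arbitrary points, every large "plausible" subset of the combined data still has mean near mu, so the mean can be recovered up to error on the order of sigma. The one-dimensional sketch below illustrates this with a trimmed mean; this is a standard robust estimator used purely for illustration, not the algorithm from the paper, and the sample sizes and constants are illustrative assumptions.

```python
import numpy as np

def trimmed_mean(x, eps):
    """Drop the eps-fraction of points on each tail and average the rest.
    A standard 1-D robust estimator, shown only to illustrate resilience."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(np.floor(eps * len(x)))
    return x[k:len(x) - k].mean() if k > 0 else x.mean()

rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=900)  # a Gaussian sample is resilient:
                                          # every 90%-subset has mean near 0
outliers = np.full(100, 50.0)             # a 10% fraction of arbitrary outliers
data = np.concatenate([inliers, outliers])

print(np.mean(data))             # naive mean is pulled to roughly 5
print(trimmed_mean(data, 0.1))   # stays close to the true mean 0
```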
Keywords
  • robust learning
  • outliers
  • stochastic block models
  • p-norm estimation
