Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers

Steinhardt, Jacob; Charikar, Moses; Valiant, Gregory

doi:10.4230/LIPIcs.ITCS.2018.45

Keywords

robust learning
outliers
stochastic block models
p-norm estimation

Abstract

We introduce a criterion, resilience, which allows properties of a dataset (such as its mean or best low rank approximation) to be robustly computed, even in the presence of a large fraction of arbitrary additional data. Resilience is a weaker condition than most other properties considered so far in the literature, and yet enables robust estimation in a broader variety of settings. We provide new information-theoretic results on robust distribution learning, robust estimation of stochastic block models, and robust mean estimation under bounded kth moments. We also provide new algorithmic results on robust distribution learning, as well as robust mean estimation in p-norms. Among our proof techniques is a method for pruning a high-dimensional distribution with bounded 1st moments to a stable "core" with bounded 2nd moments, which may be of independent interest.

Cite As

Jacob Steinhardt, Moses Charikar, and Gregory Valiant. Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 94, pp. 45:1-45:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/LIPIcs.ITCS.2018.45
