Low Rank Approximation in the Presence of Outliers

Bhaskara, Aditya; Kumar, Srivatsan

doi:10.4230/LIPIcs.APPROX-RANDOM.2018.4

Abstract

We consider the problem of principal component analysis (PCA) in the presence of outliers. Given a matrix A (d x n) and parameters k, m, the goal is to remove a set of at most m columns of A (outliers), so as to minimize the rank-k approximation error of the remaining matrix (inliers). While much of the work on this problem has focused on recovery of the rank-k subspace under assumptions on the inliers and outliers, we focus on the approximation problem. Our main result shows that sampling-based methods developed in the outlier-free case give non-trivial guarantees even in the presence of outliers. Using this insight, we develop a simple algorithm that has bi-criteria guarantees. Further, unlike similar formulations for clustering, we show that bi-criteria guarantees are unavoidable for the problem, under appropriate complexity assumptions.

Maria-Florina Balcan, Avrim Blum, and Santosh Vempala. A discriminative framework for clustering via similarity functions. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC '08, pages 671-680, New York, NY, USA, 2008. ACM. URL: http://dx.doi.org/10.1145/1374376.1374474.
Aditya Bhaskara, Moses Charikar, Eden Chlamtac, Uriel Feige, and Aravindan Vijayaraghavan. Detecting high log-densities: an n^1/4 approximation for densest k-subgraph. In Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, Massachusetts, USA, pages 201-210, 2010. URL: http://dx.doi.org/10.1145/1806689.1806718.
Aditya Bhaskara, Moses Charikar, Venkatesan Guruswami, Aravindan Vijayaraghavan, and Yuan Zhou. Polynomial integrality gaps for strong sdp relaxations of densest k-subgraph. In ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, 2012. URL: http://dx.doi.org/10.1145/1806689.1806718.
S. Charles Brubaker. Robust PCA and clustering in noisy mixtures. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009, New York, NY, USA, January 4-6, 2009, pages 1078-1087, 2009.
Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? J. ACM, 58(3):11:1-11:37, jun 2011. URL: http://dx.doi.org/10.1145/1970392.1970395.
Moses Charikar, Samir Khuller, David M. Mount, and Giri Narasimhan. Algorithms for facility location problems with outliers. In Proceedings of the Twelfth Annual Symposium on Discrete Algorithms, January 7-9, 2001, Washington, DC, USA., pages 642-651, 2001. URL: http://dl.acm.org/citation.cfm?id=365411.365555.
Moses Charikar, Jacob Steinhardt, and Gregory Valiant. Learning from untrusted data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pages 47-60, New York, NY, USA, 2017. ACM. URL: http://dx.doi.org/10.1145/3055399.3055491.
Ke Chen. A constant factor approximation algorithm for k-median clustering with outliers. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '08, pages 826-835, Philadelphia, PA, USA, 2008. Society for Industrial and Applied Mathematics. URL: http://dl.acm.org/citation.cfm?id=1347082.1347173.
Flavio Chierichetti, Sreenivas Gollapudi, Ravi Kumar, Silvio Lattanzi, Rina Panigrahy, and David P. Woodruff. Algorithms for lp low-rank approximation. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pages 806-814, 2017. URL: http://proceedings.mlr.press/v70/chierichetti17a.html.
Eden Chlamtac, Michael Dinitz, and Robert Krauthgamer. Everywhere-sparse spanners via dense subgraphs. In 53rd Annual IEEE Symposium on Foundations of Computer Science, FOCS 2012, New Brunswick, NJ, USA, October 20-23, 2012, pages 758-767, 2012. URL: http://dx.doi.org/10.1109/FOCS.2012.61.
Amit Deshpande and Santosh Vempala. Adaptive sampling and fast low-rank matrix approximation. In Josep Díaz, Klaus Jansen, José D. P. Rolim, and Uri Zwick, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 292-303, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
I. Diakonikolas, G. Kamath, D. M. Kane, J. Li, A. Moitra, and A. Stewart. Robust estimators in high dimensions without the computational intractability. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 655-664, Oct 2016. URL: http://dx.doi.org/10.1109/FOCS.2016.85.
Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. List-decodable robust mean estimation and learning mixtures of spherical gaussians. CoRR, abs/1711.07211, 2017. URL: http://arxiv.org/abs/1711.07211.
Uriel Feige and Joe Kilian. Heuristics for semirandom graph problems. Journal of Computing and System Sciences, 63:639-671, 2001.
Martin A. Fischler and Robert C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24(6):381-395, 1981. URL: http://dx.doi.org/10.1145/358669.358692.
Alan M. Frieze and Ravi Kannan. Quick approximation to matrices and applications. Combinatorica, 19(2):175-220, 1999. URL: http://dx.doi.org/10.1007/s004930050052.
Gene H. Golub and Charles F. Van Loan. Matrix Computations (3rd Ed.). Johns Hopkins University Press, Baltimore, MD, USA, 1996.
Shalmoli Gupta, Ravi Kumar, Kefu Lu, Benjamin Moseley, and Sergei Vassilvitskii. Local search methods for k-means with outliers. Proc. VLDB Endow., 10(7):757-768, 2017. URL: http://dx.doi.org/10.14778/3067421.3067425.
V. Guruswami and A. K. Sinop. Optimal column-based low-rank matrix reconstruction. In SODA, 2012.
Moritz Hardt and Ankur Moitra. Algorithms and hardness for robust subspace recovery. In COLT 2013 - The 26th Annual Conference on Learning Theory, June 12-14, 2013, Princeton University, NJ, USA, pages 354-375, 2013. URL: http://jmlr.org/proceedings/papers/v30/Hardt13.html.
Samuel B. Hopkins and Jerry Li. Mixture models, robustness, and sum of squares proofs. CoRR, abs/1711.07454, 2017. URL: http://arxiv.org/abs/1711.07454.
Peter J. Huber. Robust Statistics. Wiley, 1981.
Peter J. Huber and Elvezio M. Ronchetti. Robust Statistics, 2nd Edition. Wiley, 2009.
Pravesh K. Kothari and Jacob Steinhardt. Better agnostic clustering via relaxed tensor norms. CoRR, abs/1711.07465, 2017. URL: http://arxiv.org/abs/1711.07465.
Ravishankar Krishnaswamy, Shi Li, and Sai Sandeep. Constant approximation for k-median and k-means with outliers via iterative rounding. CoRR, abs/1711.01323, 2017.
K. A. Lai, A. B. Rao, and S. Vempala. Agnostic estimation of mean and covariance. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 665-674, Oct 2016. URL: http://dx.doi.org/10.1109/FOCS.2016.76.
Konstantin Makarychev, Yury Makarychev, and Aravindan Vijayaraghavan. Learning communities in the presence of errors. In Proceedings of the 29th Conference on Learning Theory, COLT 2016, New York, USA, June 23-26, 2016, pages 1258-1291, 2016.
Frank McSherry. Spectral partitioning of random graphs. In 42nd Annual Symposium on Foundations of Computer Science, FOCS 2001, 14-17 October 2001, Las Vegas, Nevada, USA, pages 529-537, 2001. URL: http://dx.doi.org/10.1109/SFCS.2001.959929.
Prasad Raghavendra and David Steurer. Graph expansion and the unique games conjecture. In Leonard J. Schulman, editor, STOC, pages 755-764. ACM, 2010. URL: http://dx.doi.org/10.1145/1806689.1806788.
Jacob Steinhardt, Moses Charikar, and Gregory Valiant. Resilience: A criterion for learning in the presence of arbitrary outliers. In 9th Innovations in Theoretical Computer Science Conference, ITCS 2018, January 11-14, 2018, Cambridge, MA, USA, pages 45:1-45:21, 2018. URL: http://dx.doi.org/10.4230/LIPIcs.ITCS.2018.45.
G. W. Stewart and Ji guang Sun. Matrix Perturbation Theory. Academic Press, 1990.
H. Xu, C. Caramanis, and S. Mannor. Outlier-robust pca: The high-dimensional case. IEEE Transactions on Information Theory, 59(1):546-572, Jan 2013. URL: http://dx.doi.org/10.1109/TIT.2012.2212415.
H. Xu, C. Caramanis, and S. Sanghavi. Robust pca via outlier pursuit. IEEE Transactions on Information Theory, 58(5):3047-3064, May 2012. URL: http://dx.doi.org/10.1109/TIT.2011.2173156.
Xinyang Yi, Dohyung Park, Yudong Chen, and Constantine Caramanis. Fast algorithms for robust PCA via gradient descent. CoRR, abs/1605.07784, 2016. URL: http://arxiv.org/abs/1605.07784.

Low Rank Approximation in the Presence of Outliers

Authors Aditya Bhaskara, Srivatsan Kumar

File

Document Identifiers

Author Details

Cite As Get BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message

Low Rank Approximation in the Presence of Outliers

Authors Aditya Bhaskara, Srivatsan Kumar

File

Document Identifiers

Author Details

Funding

Cite As Get BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message