Low Rank Approximation in the Presence of Outliers

Authors Aditya Bhaskara, Srivatsan Kumar



PDF
Thumbnail PDF

File

LIPIcs.APPROX-RANDOM.2018.4.pdf
  • Filesize: 0.5 MB
  • 16 pages

Document Identifiers

Author Details

Aditya Bhaskara
  • School of Computing, University of Utah, Salt Lake City, UT, USA, http://www.cs.utah.edu/~bhaskara/
Srivatsan Kumar
  • School of Computing, University of Utah, Salt Lake City, UT, USA

Cite AsGet BibTex

Aditya Bhaskara and Srivatsan Kumar. Low Rank Approximation in the Presence of Outliers. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 116, pp. 4:1-4:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2018.4

Abstract

We consider the problem of principal component analysis (PCA) in the presence of outliers. Given a matrix A (d x n) and parameters k, m, the goal is to remove a set of at most m columns of A (outliers), so as to minimize the rank-k approximation error of the remaining matrix (inliers). While much of the work on this problem has focused on recovery of the rank-k subspace under assumptions on the inliers and outliers, we focus on the approximation problem. Our main result shows that sampling-based methods developed in the outlier-free case give non-trivial guarantees even in the presence of outliers. Using this insight, we develop a simple algorithm that has bi-criteria guarantees. Further, unlike similar formulations for clustering, we show that bi-criteria guarantees are unavoidable for the problem, under appropriate complexity assumptions.

Subject Classification

ACM Subject Classification
  • Theory of computation → Approximation algorithms analysis
Keywords
  • Low rank approximation
  • PCA
  • Robustness to outliers

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Maria-Florina Balcan, Avrim Blum, and Santosh Vempala. A discriminative framework for clustering via similarity functions. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC '08, pages 671-680, New York, NY, USA, 2008. ACM. URL: http://dx.doi.org/10.1145/1374376.1374474.
  2. Aditya Bhaskara, Moses Charikar, Eden Chlamtac, Uriel Feige, and Aravindan Vijayaraghavan. Detecting high log-densities: an n^1/4 approximation for densest k-subgraph. In Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, Massachusetts, USA, pages 201-210, 2010. URL: http://dx.doi.org/10.1145/1806689.1806718.
  3. Aditya Bhaskara, Moses Charikar, Venkatesan Guruswami, Aravindan Vijayaraghavan, and Yuan Zhou. Polynomial integrality gaps for strong sdp relaxations of densest k-subgraph. In ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, 2012. URL: http://dx.doi.org/10.1145/1806689.1806718.
  4. S. Charles Brubaker. Robust PCA and clustering in noisy mixtures. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009, New York, NY, USA, January 4-6, 2009, pages 1078-1087, 2009. Google Scholar
  5. Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? J. ACM, 58(3):11:1-11:37, jun 2011. URL: http://dx.doi.org/10.1145/1970392.1970395.
  6. Moses Charikar, Samir Khuller, David M. Mount, and Giri Narasimhan. Algorithms for facility location problems with outliers. In Proceedings of the Twelfth Annual Symposium on Discrete Algorithms, January 7-9, 2001, Washington, DC, USA., pages 642-651, 2001. URL: http://dl.acm.org/citation.cfm?id=365411.365555.
  7. Moses Charikar, Jacob Steinhardt, and Gregory Valiant. Learning from untrusted data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pages 47-60, New York, NY, USA, 2017. ACM. URL: http://dx.doi.org/10.1145/3055399.3055491.
  8. Ke Chen. A constant factor approximation algorithm for k-median clustering with outliers. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '08, pages 826-835, Philadelphia, PA, USA, 2008. Society for Industrial and Applied Mathematics. URL: http://dl.acm.org/citation.cfm?id=1347082.1347173.
  9. Flavio Chierichetti, Sreenivas Gollapudi, Ravi Kumar, Silvio Lattanzi, Rina Panigrahy, and David P. Woodruff. Algorithms for lp low-rank approximation. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pages 806-814, 2017. URL: http://proceedings.mlr.press/v70/chierichetti17a.html.
  10. Eden Chlamtac, Michael Dinitz, and Robert Krauthgamer. Everywhere-sparse spanners via dense subgraphs. In 53rd Annual IEEE Symposium on Foundations of Computer Science, FOCS 2012, New Brunswick, NJ, USA, October 20-23, 2012, pages 758-767, 2012. URL: http://dx.doi.org/10.1109/FOCS.2012.61.
  11. Amit Deshpande and Santosh Vempala. Adaptive sampling and fast low-rank matrix approximation. In Josep Díaz, Klaus Jansen, José D. P. Rolim, and Uri Zwick, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 292-303, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg. Google Scholar
  12. I. Diakonikolas, G. Kamath, D. M. Kane, J. Li, A. Moitra, and A. Stewart. Robust estimators in high dimensions without the computational intractability. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 655-664, Oct 2016. URL: http://dx.doi.org/10.1109/FOCS.2016.85.
  13. Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. List-decodable robust mean estimation and learning mixtures of spherical gaussians. CoRR, abs/1711.07211, 2017. URL: http://arxiv.org/abs/1711.07211.
  14. Uriel Feige and Joe Kilian. Heuristics for semirandom graph problems. Journal of Computing and System Sciences, 63:639-671, 2001. Google Scholar
  15. Martin A. Fischler and Robert C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24(6):381-395, 1981. URL: http://dx.doi.org/10.1145/358669.358692.
  16. Alan M. Frieze and Ravi Kannan. Quick approximation to matrices and applications. Combinatorica, 19(2):175-220, 1999. URL: http://dx.doi.org/10.1007/s004930050052.
  17. Gene H. Golub and Charles F. Van Loan. Matrix Computations (3rd Ed.). Johns Hopkins University Press, Baltimore, MD, USA, 1996. Google Scholar
  18. Shalmoli Gupta, Ravi Kumar, Kefu Lu, Benjamin Moseley, and Sergei Vassilvitskii. Local search methods for k-means with outliers. Proc. VLDB Endow., 10(7):757-768, 2017. URL: http://dx.doi.org/10.14778/3067421.3067425.
  19. V. Guruswami and A. K. Sinop. Optimal column-based low-rank matrix reconstruction. In SODA, 2012. Google Scholar
  20. Moritz Hardt and Ankur Moitra. Algorithms and hardness for robust subspace recovery. In COLT 2013 - The 26th Annual Conference on Learning Theory, June 12-14, 2013, Princeton University, NJ, USA, pages 354-375, 2013. URL: http://jmlr.org/proceedings/papers/v30/Hardt13.html.
  21. Samuel B. Hopkins and Jerry Li. Mixture models, robustness, and sum of squares proofs. CoRR, abs/1711.07454, 2017. URL: http://arxiv.org/abs/1711.07454.
  22. Peter J. Huber. Robust Statistics. Wiley, 1981. Google Scholar
  23. Peter J. Huber and Elvezio M. Ronchetti. Robust Statistics, 2nd Edition. Wiley, 2009. Google Scholar
  24. Pravesh K. Kothari and Jacob Steinhardt. Better agnostic clustering via relaxed tensor norms. CoRR, abs/1711.07465, 2017. URL: http://arxiv.org/abs/1711.07465.
  25. Ravishankar Krishnaswamy, Shi Li, and Sai Sandeep. Constant approximation for k-median and k-means with outliers via iterative rounding. CoRR, abs/1711.01323, 2017. Google Scholar
  26. K. A. Lai, A. B. Rao, and S. Vempala. Agnostic estimation of mean and covariance. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 665-674, Oct 2016. URL: http://dx.doi.org/10.1109/FOCS.2016.76.
  27. Konstantin Makarychev, Yury Makarychev, and Aravindan Vijayaraghavan. Learning communities in the presence of errors. In Proceedings of the 29th Conference on Learning Theory, COLT 2016, New York, USA, June 23-26, 2016, pages 1258-1291, 2016. Google Scholar
  28. Frank McSherry. Spectral partitioning of random graphs. In 42nd Annual Symposium on Foundations of Computer Science, FOCS 2001, 14-17 October 2001, Las Vegas, Nevada, USA, pages 529-537, 2001. URL: http://dx.doi.org/10.1109/SFCS.2001.959929.
  29. Prasad Raghavendra and David Steurer. Graph expansion and the unique games conjecture. In Leonard J. Schulman, editor, STOC, pages 755-764. ACM, 2010. URL: http://dx.doi.org/10.1145/1806689.1806788.
  30. Jacob Steinhardt, Moses Charikar, and Gregory Valiant. Resilience: A criterion for learning in the presence of arbitrary outliers. In 9th Innovations in Theoretical Computer Science Conference, ITCS 2018, January 11-14, 2018, Cambridge, MA, USA, pages 45:1-45:21, 2018. URL: http://dx.doi.org/10.4230/LIPIcs.ITCS.2018.45.
  31. G. W. Stewart and Ji guang Sun. Matrix Perturbation Theory. Academic Press, 1990. Google Scholar
  32. H. Xu, C. Caramanis, and S. Mannor. Outlier-robust pca: The high-dimensional case. IEEE Transactions on Information Theory, 59(1):546-572, Jan 2013. URL: http://dx.doi.org/10.1109/TIT.2012.2212415.
  33. H. Xu, C. Caramanis, and S. Sanghavi. Robust pca via outlier pursuit. IEEE Transactions on Information Theory, 58(5):3047-3064, May 2012. URL: http://dx.doi.org/10.1109/TIT.2011.2173156.
  34. Xinyang Yi, Dohyung Park, Yudong Chen, and Constantine Caramanis. Fast algorithms for robust PCA via gradient descent. CoRR, abs/1605.07784, 2016. URL: http://arxiv.org/abs/1605.07784.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail