A Combinatorial Approach to Robust PCA

Authors: Weihao Kong, Mingda Qiao, and Rajat Sen




File

LIPIcs.ITCS.2024.70.pdf
  • Filesize: 0.78 MB
  • 22 pages

Author Details

Weihao Kong
  • Google Research, Mountain View, CA, USA
Mingda Qiao
  • University of California, Berkeley, CA, USA
Rajat Sen
  • Google Research, Mountain View, CA, USA

Acknowledgements

We thank the anonymous reviewers of ITCS 2024 for their comments, which helped improve this paper.

Cite As

Weihao Kong, Mingda Qiao, and Rajat Sen. A Combinatorial Approach to Robust PCA. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 70:1-70:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ITCS.2024.70

Abstract

We study the problem of recovering Gaussian data under adversarial corruptions when the noises are low-rank and the corruptions are on the coordinate level. Concretely, we assume that the Gaussian noises lie in an unknown k-dimensional subspace U ⊆ ℝ^d, and s randomly chosen coordinates of each data point fall into the control of an adversary. This setting models the scenario of learning from high-dimensional yet structured data that are transmitted through a highly noisy channel, so that the data points are unlikely to be entirely clean. Our main result is an efficient algorithm that, when ks² = O(d), recovers every single data point up to a nearly-optimal ℓ₁ error of Õ(ks/d) in expectation. At the core of our proof is a new analysis of the well-known Basis Pursuit (BP) method for recovering a sparse signal, which is known to succeed under additional assumptions (e.g., incoherence or the restricted isometry property) on the underlying subspace U. In contrast, we present a novel approach by studying a natural combinatorial problem and show that, over the randomness in the support of the sparse signal, a high-probability error bound is possible even if the subspace U is arbitrary.
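The Basis Pursuit method at the core of the abstract recovers a sparse signal x from linear measurements b = Ax by minimizing ‖x‖₁ subject to the measurement constraint, which is solvable as a linear program. The sketch below is a generic illustration of BP, not the paper's algorithm: it uses a random Gaussian measurement matrix rather than a basis of the paper's arbitrary subspace U, and the dimensions and sparsity level are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Solve min ||x||_1 subject to Ax = b via the standard LP reformulation.

    Split x = x_pos - x_neg with x_pos, x_neg >= 0; then
    ||x||_1 = sum(x_pos + x_neg) and the constraint becomes
    [A, -A] @ [x_pos; x_neg] = b.
    """
    m, n = A.shape
    c = np.ones(2 * n)                       # objective: sum of both halves
    A_eq = np.hstack([A, -A])                # encodes A @ (x_pos - x_neg) = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
    return res.x[:n] - res.x[n:]

# Illustrative instance: a 3-sparse signal in dimension 50,
# observed through 25 random Gaussian measurements.
rng = np.random.default_rng(0)
d, m, s = 50, 25, 3
A = rng.standard_normal((m, d)) / np.sqrt(m)
x_true = np.zeros(d)
x_true[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)
b = A @ x_true

x_hat = basis_pursuit(A, b)
print(np.max(np.abs(x_hat - x_true)))        # small recovery error
```

In the paper's setting the measurement structure comes from the unknown subspace U and may satisfy neither incoherence nor the restricted isometry property; the analysis instead exploits the randomness in the support of the sparse corruption, which this generic sketch does not model.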

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • Robust PCA
  • Sparse Recovery
  • Robust Statistics
