A Combinatorial Approach to Robust PCA

Authors: Weihao Kong, Mingda Qiao, and Rajat Sen




File

LIPIcs.ITCS.2024.70.pdf
  • Filesize: 0.78 MB
  • 22 pages

Author Details

Weihao Kong
  • Google Research, Mountain View, CA, USA
Mingda Qiao
  • University of California, Berkeley, CA, USA
Rajat Sen
  • Google Research, Mountain View, CA, USA

Acknowledgements

We thank the anonymous reviewers of ITCS 2024 for their comments, which helped improve this paper.

Cite As

Weihao Kong, Mingda Qiao, and Rajat Sen. A Combinatorial Approach to Robust PCA. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 70:1-70:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ITCS.2024.70

Abstract

We study the problem of recovering Gaussian data under adversarial corruptions when the noises are low-rank and the corruptions are on the coordinate level. Concretely, we assume that the Gaussian noises lie in an unknown k-dimensional subspace U ⊆ ℝ^d, and s randomly chosen coordinates of each data point fall into the control of an adversary. This setting models the scenario of learning from high-dimensional yet structured data that are transmitted through a highly noisy channel, so that the data points are unlikely to be entirely clean. Our main result is an efficient algorithm that, when ks² = O(d), recovers every single data point up to a nearly-optimal ℓ₁ error of Õ(ks/d) in expectation. At the core of our proof is a new analysis of the well-known Basis Pursuit (BP) method for recovering a sparse signal, which is known to succeed under additional assumptions (e.g., incoherence or the restricted isometry property) on the underlying subspace U. In contrast, we present a novel approach by studying a natural combinatorial problem and show that, over the randomness in the support of the sparse signal, a high-probability error bound is possible even if the subspace U is arbitrary.
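The Basis Pursuit method at the core of the abstract recovers a sparse signal x from linear measurements b = Ax by minimizing ‖x‖₁ subject to the measurement constraint, which is solvable as a linear program. The sketch below is a generic illustration of BP, not the paper's algorithm: it uses a random Gaussian measurement matrix rather than a basis of the paper's arbitrary subspace U, and the dimensions and sparsity level are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Solve min ||x||_1 subject to Ax = b via the standard LP reformulation.

    Split x = x_pos - x_neg with x_pos, x_neg >= 0; then
    ||x||_1 = sum(x_pos + x_neg) and the constraint becomes
    [A, -A] @ [x_pos; x_neg] = b.
    """
    m, n = A.shape
    c = np.ones(2 * n)                       # objective: sum of both halves
    A_eq = np.hstack([A, -A])                # encodes A @ (x_pos - x_neg) = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
    return res.x[:n] - res.x[n:]

# Illustrative instance: a 3-sparse signal in dimension 50,
# observed through 25 random Gaussian measurements.
rng = np.random.default_rng(0)
d, m, s = 50, 25, 3
A = rng.standard_normal((m, d)) / np.sqrt(m)
x_true = np.zeros(d)
x_true[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)
b = A @ x_true

x_hat = basis_pursuit(A, b)
print(np.max(np.abs(x_hat - x_true)))        # small recovery error
```

In the paper's setting the measurement structure comes from the unknown subspace U and may satisfy neither incoherence nor the restricted isometry property; the analysis instead exploits the randomness in the support of the sparse corruption, which this generic sketch does not model.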

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • Robust PCA
  • Sparse Recovery
  • Robust Statistics
