,
Rajat Sen
Creative Commons Attribution 4.0 International license
We study the problem of recovering Gaussian data under adversarial corruptions when the noise is low-rank and the corruptions occur at the coordinate level. Concretely, we assume that the Gaussian noise lies in an unknown k-dimensional subspace U ⊆ ℝ^d, and that s randomly chosen coordinates of each data point fall under the control of an adversary. This setting models learning from high-dimensional yet structured data transmitted through a highly noisy channel, so that the data points are unlikely to be entirely clean. Our main result is an efficient algorithm that, when ks² = O(d), recovers every single data point up to a nearly optimal ℓ₁ error of Õ(ks/d) in expectation. At the core of our proof is a new analysis of the well-known Basis Pursuit (BP) method for recovering a sparse signal, which is known to succeed under additional assumptions (e.g., incoherence or the restricted isometry property) on the underlying subspace U. In contrast, we present a novel approach based on studying a natural combinatorial problem and show that, over the randomness in the support of the sparse signal, a high-probability error bound is possible even if the subspace U is arbitrary.
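To make the ℓ₁-minimization idea behind Basis Pursuit concrete, here is a minimal sketch for the simplest case k = 1. It is not the paper's algorithm: the abstract states that U is unknown, whereas this sketch assumes a known spanning vector `b`. The names `l1_fit_1d`, `b`, and `c_true`, and the specific corruption values, are hypothetical and chosen only for illustration. For a one-dimensional subspace, minimizing the ℓ₁ residual reduces to a weighted median, so no LP solver is needed.

```python
import random


def l1_fit_1d(y, b):
    """Return c minimizing sum_i |y[i] - c * b[i]| (subspace spanned by b, k = 1).

    The objective equals sum_i |b[i]| * |y[i] / b[i] - c| over coordinates with
    b[i] != 0, so a weighted median of the ratios y[i] / b[i] is a minimizer.
    """
    pairs = sorted((y[i] / b[i], abs(b[i])) for i in range(len(b)) if b[i] != 0)
    half = sum(w for _, w in pairs) / 2.0
    acc = 0.0
    for ratio, w in pairs:
        acc += w
        if acc >= half:  # first ratio at which cumulative weight reaches half
            return ratio
    raise ValueError("b must have at least one nonzero coordinate")


rng = random.Random(0)
d, s = 50, 5                                  # dimension and corruptions per point
b = [1.0 + (i % 3) for i in range(d)]         # assumed-known spanning vector
c_true = 2.5
clean = [c_true * bi for bi in b]             # clean point lies in span(b)

corrupted = list(clean)
for i in rng.sample(range(d), s):             # s randomly chosen coordinates
    corrupted[i] += rng.uniform(-100.0, 100.0)  # stand-in for adversarial values

c_hat = l1_fit_1d(corrupted, b)
recovered = [c_hat * bi for bi in b]
l1_error = sum(abs(r - c) for r, c in zip(recovered, clean))
print(c_hat, l1_error)
```

Because the corrupted coordinates carry less than half of the total weight here, the weighted median lands on a clean ratio and the point is recovered. For general k the same objective becomes a linear program, and handling an unknown, arbitrary U is the part the paper addresses.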
@InProceedings{kong_et_al:LIPIcs.ITCS.2024.70,
author = {Kong, Weihao and Qiao, Mingda and Sen, Rajat},
title = {{A Combinatorial Approach to Robust PCA}},
booktitle = {15th Innovations in Theoretical Computer Science Conference (ITCS 2024)},
pages = {70:1--70:22},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-309-6},
ISSN = {1868-8969},
year = {2024},
volume = {287},
editor = {Guruswami, Venkatesan},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2024.70},
URN = {urn:nbn:de:0030-drops-195984},
doi = {10.4230/LIPIcs.ITCS.2024.70},
annote = {Keywords: Robust PCA, Sparse Recovery, Robust Statistics}
}