**Published in:** LIPIcs, Volume 287, 15th Innovations in Theoretical Computer Science Conference (ITCS 2024)

We study the problem of recovering Gaussian data under adversarial corruptions when the noises are low-rank and the corruptions are on the coordinate level. Concretely, we assume that the Gaussian noises lie in an unknown k-dimensional subspace U ⊆ ℝ^d, and s randomly chosen coordinates of each data point fall into the control of an adversary. This setting models the scenario of learning from high-dimensional yet structured data that are transmitted through a highly-noisy channel, so that the data points are unlikely to be entirely clean.
Our main result is an efficient algorithm that, when ks² = O(d), recovers every single data point up to a nearly-optimal 𝓁₁ error of Õ(ks/d) in expectation. At the core of our proof is a new analysis of the well-known Basis Pursuit (BP) method for recovering a sparse signal, which is known to succeed under additional assumptions (e.g., incoherence or the restricted isometry property) on the underlying subspace U. In contrast, we present a novel approach via studying a natural combinatorial problem and show that, over the randomness in the support of the sparse signal, a high-probability error bound is possible even if the subspace U is arbitrary.
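As an illustration of the Basis Pursuit step, here is a minimal sketch (using SciPy's LP solver; the function and toy instance are our own illustration, not the paper's implementation): recover a sparse signal x from measurements b = Ax by minimizing ‖x‖₁, written as a linear program via the standard substitution x = u − v with u, v ≥ 0.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Solve min ||x||_1 subject to A x = b via the LP reformulation
    x = u - v with u, v >= 0 (illustrative sketch, not the paper's code)."""
    m, n = A.shape
    c = np.ones(2 * n)                 # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])          # constraint: A(u - v) = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))
    u, v = res.x[:n], res.x[n:]
    return u - v

# Toy instance: a 2-sparse signal observed through a random wide matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 20))
x0 = np.zeros(20)
x0[3], x0[11] = 1.5, -2.0
x = basis_pursuit(A, A @ x0)
```

The LP form makes the ℓ₁ objective linear, so any off-the-shelf LP solver applies; the returned x is feasible (Ax = b) with ℓ₁ norm no larger than that of the true signal.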

Weihao Kong, Mingda Qiao, and Rajat Sen. A Combinatorial Approach to Robust PCA. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 70:1-70:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{kong_et_al:LIPIcs.ITCS.2024.70,
  author =    {Kong, Weihao and Qiao, Mingda and Sen, Rajat},
  title =     {{A Combinatorial Approach to Robust PCA}},
  booktitle = {15th Innovations in Theoretical Computer Science Conference (ITCS 2024)},
  pages =     {70:1--70:22},
  series =    {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =      {978-3-95977-309-6},
  ISSN =      {1868-8969},
  year =      {2024},
  volume =    {287},
  editor =    {Guruswami, Venkatesan},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =   {Dagstuhl, Germany},
  URL =       {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2024.70},
  URN =       {urn:nbn:de:0030-drops-195984},
  doi =       {10.4230/LIPIcs.ITCS.2024.70},
  annote =    {Keywords: Robust PCA, Sparse Recovery, Robust Statistics}
}

**Published in:** LIPIcs, Volume 251, 14th Innovations in Theoretical Computer Science Conference (ITCS 2023)

We study a "pen testing" problem, in which we are given n pens with unknown amounts of ink X₁, X₂, …, X_n, and we want to choose a pen with the maximum amount of remaining ink in it. The challenge is that we cannot access each X_i directly; we only get to write with the i-th pen until either a certain amount of ink is used, or the pen runs out of ink. In both cases, this testing reduces the remaining ink in the pen and thus the utility of selecting it.
Despite this significant lack of information, we show that it is possible to approximately maximize our utility up to an O(log n) factor. Formally, we consider two different setups: the "prophet" setting, in which each X_i is independently drawn from some distribution 𝒟_i, and the "secretary" setting, in which (X_i)_{i=1}^n is a random permutation of arbitrary a₁, a₂, …, a_n. We derive the optimal competitive ratios in both settings up to constant factors. Our algorithms are surprisingly robust: (1) In the prophet setting, we only require one sample from each 𝒟_i, rather than a full description of the distribution; (2) In the secretary setting, the algorithm also succeeds under an arbitrary permutation, if an estimate of the maximum a_i is given.
Our techniques include a non-trivial online sampling scheme from a sequence with an unknown length, as well as the construction of a hard, non-uniform distribution over permutations. Both might be of independent interest. We also highlight some immediate open problems and discuss several directions for future research.
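The access model can be made concrete with a tiny sketch (the `Pen` class and its interface are illustrative assumptions, not code from the paper): a pen exposes only a threshold test that consumes ink whether or not the test succeeds, so every measurement reduces the utility of the pen being measured.

```python
class Pen:
    """A pen with a hidden ink level X_i; only threshold tests are allowed."""

    def __init__(self, ink):
        self._ink = float(ink)

    def test(self, t):
        """Write until t units of ink are used or the pen runs dry.
        Returns True iff the pen survived; ink is consumed either way."""
        if self._ink >= t:
            self._ink -= t
            return True
        self._ink = 0.0
        return False

    @property
    def remaining(self):
        # Hidden from the algorithm; used only to evaluate the final choice.
        return self._ink

pen = Pen(5.0)
ok1 = pen.test(3.0)   # survives: 2.0 units left
ok2 = pen.test(3.0)   # runs dry mid-test
```

Each test reveals one bit (survived or not) at the cost of the ink consumed, which is exactly the information/utility trade-off the competitive analysis must navigate.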

Mingda Qiao and Gregory Valiant. Online Pen Testing. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 251, pp. 91:1-91:26, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{qiao_et_al:LIPIcs.ITCS.2023.91,
  author =    {Qiao, Mingda and Valiant, Gregory},
  title =     {{Online Pen Testing}},
  booktitle = {14th Innovations in Theoretical Computer Science Conference (ITCS 2023)},
  pages =     {91:1--91:26},
  series =    {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =      {978-3-95977-263-1},
  ISSN =      {1868-8969},
  year =      {2023},
  volume =    {251},
  editor =    {Tauman Kalai, Yael},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =   {Dagstuhl, Germany},
  URL =       {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2023.91},
  URN =       {urn:nbn:de:0030-drops-175940},
  doi =       {10.4230/LIPIcs.ITCS.2023.91},
  annote =    {Keywords: Optimal stopping, online algorithm}
}

**Published in:** LIPIcs, Volume 207, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)

Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996).
Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possible avenue towards resolving this disconnect. Within the smoothed setting and for targets f that are k-juntas, they showed that these heuristics successfully learn f with depth-k decision tree hypotheses. They conjectured that the same guarantee holds more generally for targets that are depth-k decision trees.
We provide a counterexample to this conjecture: we construct targets that are depth-k decision trees and show that even in the smoothed setting, these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy. We also show that the guarantees of Brutzkus et al. cannot extend to the agnostic setting: there are targets that are very close to k-juntas, for which these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy.
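For context, the greedy heuristics in question grow a tree top-down by choosing, at each node, the split that most reduces an impurity measure. A minimal textbook sketch (Gini-based splitting on boolean features; this is the generic heuristic, not the paper's constructions) is:

```python
import numpy as np

def gini(y):
    """Gini impurity of binary labels y."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    return 2 * p * (1 - p)

def best_split(X, y):
    """One greedy step: pick the coordinate whose split most reduces
    the weighted Gini impurity (textbook heuristic, illustrative only)."""
    n, d = X.shape
    best_i, best_gain = None, 0.0
    for i in range(d):
        left, right = y[X[:, i] == 0], y[X[:, i] == 1]
        gain = gini(y) - (len(left) * gini(left) + len(right) * gini(right)) / n
        if gain > best_gain:
            best_i, best_gain = i, gain
    return best_i, best_gain

# When the labels depend on a single coordinate (a 1-junta),
# the greedy step finds that coordinate immediately.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(200, 5))
y = X[:, 2]
i, gain = best_split(X, y)
```

The paper's point is that on carefully constructed depth-k targets, repeating this greedy step can require depth 2^{Ω(k)} even after smoothing, despite its success on easy cases like the one above.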

Guy Blanc, Jane Lange, Mingda Qiao, and Li-Yang Tan. Decision Tree Heuristics Can Fail, Even in the Smoothed Setting. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 207, pp. 45:1-45:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{blanc_et_al:LIPIcs.APPROX/RANDOM.2021.45,
  author =    {Blanc, Guy and Lange, Jane and Qiao, Mingda and Tan, Li-Yang},
  title =     {{Decision Tree Heuristics Can Fail, Even in the Smoothed Setting}},
  booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)},
  pages =     {45:1--45:16},
  series =    {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =      {978-3-95977-207-5},
  ISSN =      {1868-8969},
  year =      {2021},
  volume =    {207},
  editor =    {Wootters, Mary and Sanit\`{a}, Laura},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =   {Dagstuhl, Germany},
  URL =       {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2021.45},
  URN =       {urn:nbn:de:0030-drops-147386},
  doi =       {10.4230/LIPIcs.APPROX/RANDOM.2021.45},
  annote =    {Keywords: decision trees, learning theory, smoothed analysis}
}

**Published in:** LIPIcs, Volume 94, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018)

We consider the problem of learning a discrete distribution in the presence of an ε fraction of malicious data sources. Specifically, we consider the setting where there is some underlying distribution, p, and each data source provides a batch of ≥ k samples, with the guarantee that at least a (1 − ε) fraction of the sources draw their samples from a distribution with total variation distance at most η from p. We make no assumptions on the data provided by the remaining ε fraction of sources; this data can even be chosen as an adversarial function of the (1 − ε) fraction of "good" batches. We provide two algorithms: one with runtime exponential in the support size, n, but polynomial in k, 1/ε, and 1/η, that takes O((n + k)/ε²) batches and recovers p to error O(η + ε/√k). This recovery accuracy is information-theoretically optimal, up to constant factors, even given an infinite number of data sources. Our second algorithm applies to the η = 0 setting and also achieves an O(ε/√k) recovery guarantee, though it runs in poly((nk)^k) time. This second algorithm, which approximates a certain tensor via a rank-1 tensor minimizing 𝓁₁ distance, is surprising in light of the hardness of many low-rank tensor approximation problems, and may be of independent interest.
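The batch model can be simulated with a short sketch (the adversary here simply samples from a fixed wrong distribution, a much weaker adversary than the paper allows; all names are illustrative). It also shows why naive pooling of all samples only achieves Θ(ε) error, rather than the O(ε/√k) bound above.

```python
import numpy as np

def make_batches(p, num_batches, k, eps, adversarial_dist, rng):
    """Good batches draw k i.i.d. samples from p; an eps fraction are
    replaced by the adversary's batches (here: a fixed wrong distribution)."""
    n = len(p)
    batches = []
    for j in range(num_batches):
        q = adversarial_dist if j < eps * num_batches else p
        batches.append(rng.choice(n, size=k, p=q))
    return batches

def tv_distance(p, q):
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])
bad = np.array([0.0, 0.0, 1.0])
batches = make_batches(p, num_batches=1000, k=100, eps=0.1,
                       adversarial_dist=bad, rng=rng)

# Naive pooled empirical estimate: the adversary shifts an eps fraction
# of the mass, so the TV error is Theta(eps) regardless of k.
counts = np.bincount(np.concatenate(batches), minlength=3)
p_hat = counts / counts.sum()
err = tv_distance(p, p_hat)
```

Exploiting the batch structure, rather than pooling, is what lets the paper's algorithms drive the ε term down to ε/√k.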

Mingda Qiao and Gregory Valiant. Learning Discrete Distributions from Untrusted Batches. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 94, pp. 47:1-47:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

@InProceedings{qiao_et_al:LIPIcs.ITCS.2018.47,
  author =    {Qiao, Mingda and Valiant, Gregory},
  title =     {{Learning Discrete Distributions from Untrusted Batches}},
  booktitle = {9th Innovations in Theoretical Computer Science Conference (ITCS 2018)},
  pages =     {47:1--47:20},
  series =    {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =      {978-3-95977-060-6},
  ISSN =      {1868-8969},
  year =      {2018},
  volume =    {94},
  editor =    {Karlin, Anna R.},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =   {Dagstuhl, Germany},
  URL =       {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2018.47},
  URN =       {urn:nbn:de:0030-drops-83215},
  doi =       {10.4230/LIPIcs.ITCS.2018.47},
  annote =    {Keywords: robust statistics, information-theoretic optimality}
}
