DROPS

Document

DOI: 10.4230/LIPIcs.ITCS.2026.91

Limitations of Membership Queries in Testable Learning

Authors: Jane Lange and Mingda Qiao

Published in: LIPIcs, Volume 362, 17th Innovations in Theoretical Computer Science Conference (ITCS 2026)

Abstract

Membership queries (MQ) often yield speedups for learning tasks, particularly in the distribution-specific setting. We show that in the testable learning model of Rubinfeld and Vasilyan [Rubinfeld and Vasilyan, 2023], membership queries cannot decrease the time complexity of testable learning algorithms beyond the complexity of sample-only distribution-specific learning. In the testable learning model, the learner must output a hypothesis whenever the data distribution satisfies a desired property, and if it outputs a hypothesis, the hypothesis must be near-optimal. We give a general reduction from sample-based refutation of Boolean concept classes, as presented in [Vadhan, 2017; Kothari and Livni, 2018], to testable learning with queries (TL-Q). This yields lower bounds for TL-Q via the reduction from learning to refutation given in [Kothari and Livni, 2018]. The result is that, relative to a concept class and a distribution family, no m-sample TL-Q algorithm can be super-polynomially more time-efficient than the best m-sample PAC learner. Finally, we define a class of "statistical" MQ algorithms that encompasses many known distribution-specific MQ learners, such as those based on influence estimation or subcube-conditional statistical queries. We show that TL-Q algorithms in this class imply efficient statistical-query refutation and learning algorithms. Thus, combined with known SQ dimension lower bounds, our results imply that these efficient membership query learners cannot be made testable.
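The abstract lists influence estimation as one source of distribution-specific MQ learners. As a hedged illustration of what such a learner asks the oracle for (not the construction analyzed in the paper), the sketch below estimates the influence Inf_i(f) = Pr_x[f(x) ≠ f(x^{⊕i})] of coordinate i under the uniform distribution on {0,1}^n. Each sample costs two membership queries: one at a random point x and one at x with bit i flipped, a labeled point a sample-only learner would generally not get to see. The names `estimate_influence`, `exact_influence`, and `majority3` are hypothetical.

```python
import itertools
import random
from typing import Callable, Optional, Tuple

BoolFn = Callable[[Tuple[int, ...]], int]


def flip(x: Tuple[int, ...], i: int) -> Tuple[int, ...]:
    """Return x with coordinate i flipped (x^{⊕i})."""
    y = list(x)
    y[i] ^= 1
    return tuple(y)


def estimate_influence(f: BoolFn, n: int, i: int, num_samples: int,
                       rng: Optional[random.Random] = None) -> float:
    """Monte Carlo estimate of Inf_i(f) under uniform x in {0,1}^n.

    Each iteration draws a uniform x and makes two membership queries,
    f(x) and f(x with bit i flipped), counting disagreements.
    """
    rng = rng if rng is not None else random.Random(0)
    disagreements = 0
    for _ in range(num_samples):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        if f(x) != f(flip(x, i)):
            disagreements += 1
    return disagreements / num_samples


def exact_influence(f: BoolFn, n: int, i: int) -> float:
    """Exact Inf_i(f) by enumerating all 2^n inputs (small n only)."""
    points = list(itertools.product((0, 1), repeat=n))
    return sum(f(x) != f(flip(x, i)) for x in points) / len(points)


def majority3(x: Tuple[int, ...]) -> int:
    """Majority of the first three bits."""
    return int(x[0] + x[1] + x[2] >= 2)


if __name__ == "__main__":
    print(exact_influence(majority3, 3, 0))           # 0.5: bit 0 is pivotal iff x1 != x2
    print(estimate_influence(majority3, 3, 0, 4000))  # Monte Carlo estimate close to 0.5
```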

Cite as

Jane Lange and Mingda Qiao. Limitations of Membership Queries in Testable Learning. In 17th Innovations in Theoretical Computer Science Conference (ITCS 2026). Leibniz International Proceedings in Informatics (LIPIcs), Volume 362, pp. 91:1-91:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2026)

@InProceedings{lange_et_al:LIPIcs.ITCS.2026.91,
  author =	{Lange, Jane and Qiao, Mingda},
  title =	{{Limitations of Membership Queries in Testable Learning}},
  booktitle =	{17th Innovations in Theoretical Computer Science Conference (ITCS 2026)},
  pages =	{91:1--91:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-410-9},
  ISSN =	{1868-8969},
  year =	{2026},
  volume =	{362},
  editor =	{Saraf, Shubhangi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2026.91},
  URN =		{urn:nbn:de:0030-drops-253785},
  doi =		{10.4230/LIPIcs.ITCS.2026.91},
  annote =	{Keywords: Testable learning, PAC learning}
}

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2025.23

On the Instance Optimality of Detecting Collisions and Subgraphs

Authors: Omri Ben-Eliezer, Tomer Grossman, and Moni Naor

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Abstract

Suppose you are given a function f: [n] → [n] via (black-box) query access to the function. You are looking to find something local, like a collision (a pair x ≠ y such that f(x) = f(y)). The question is whether knowing the "shape" of the function helps you or not (by shape we mean that some permutation of the function is known). Formally, we investigate the unlabeled instance optimality of substructure detection problems in graphs and functions. A problem is g(n)-instance optimal if it admits an algorithm A satisfying that for any possible input, the (randomized) query complexity of A is at most g(n) times larger than the query complexity of any algorithm A' which solves the same problem while holding an unlabeled copy of the input (i.e., any A' that "knows the structure of the input"). Our results point to a trichotomy of unlabeled instance optimality among substructure detection problems in graphs and functions:
- A few very simple properties have an O(1)-instance optimal algorithm.
- Most properties of graphs and functions, with examples such as containing a fixed point or a 3-collision in functions, or a triangle in graphs, are n^c-far from instance optimal for some constant c > 0.
- The problems of collision detection in functions and finding a claw in a graph serve as a middle ground between the two regimes. We show that these two properties are not o(log n)-instance optimal, and conjecture that this bound is tight. We provide evidence towards this conjecture by proving that finding a claw in a graph is O(log n)-instance optimal among all input graphs for which the query complexity of an algorithm holding an unlabeled certificate is O(√(n/log n)).
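To make the query model concrete, here is a minimal sketch of the folklore birthday-style collision finder, which uses no knowledge of the shape of f: it queries distinct points of [n] in random order and stops at the first repeated value. It is an assumed baseline for illustration, not an algorithm from the paper; `find_collision` and its parameters are hypothetical names, and [n] is encoded as {0, 1, ..., n − 1}.

```python
import random
from typing import Callable, Dict, Optional, Tuple


def find_collision(f: Callable[[int], int], n: int, max_queries: int,
                   rng: Optional[random.Random] = None
                   ) -> Tuple[Optional[Tuple[int, int]], int]:
    """Query f at distinct random points of {0, ..., n-1} until two share a value.

    Returns ((x, y), queries_used) with x != y and f(x) == f(y),
    or (None, queries_used) if no collision was found within max_queries.
    """
    rng = rng if rng is not None else random.Random(0)
    order = list(range(n))
    rng.shuffle(order)                # random order, so no point is queried twice
    first_seen: Dict[int, int] = {}   # value -> first point that produced it
    queries = 0
    for x in order[:max_queries]:
        value = f(x)
        queries += 1
        if value in first_seen:
            return (first_seen[value], x), queries
        first_seen[value] = x
    return None, queries


if __name__ == "__main__":
    n = 10_000
    pair, used = find_collision(lambda x: x // 2, n, n)  # a 2-to-1 function
    print(pair, used)
```

For a 2-to-1 function such as x ↦ ⌊x/2⌋, the birthday paradox suggests that on the order of √n queries typically suffice; for an injective f, the loop exhausts its budget without finding anything.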

Cite as

Omri Ben-Eliezer, Tomer Grossman, and Moni Naor. On the Instance Optimality of Detecting Collisions and Subgraphs. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 23:1-23:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{beneliezer_et_al:LIPIcs.ICALP.2025.23,
  author =	{Ben-Eliezer, Omri and Grossman, Tomer and Naor, Moni},
  title =	{{On the Instance Optimality of Detecting Collisions and Subgraphs}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{23:1--23:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.23},
  URN =		{urn:nbn:de:0030-drops-234002},
  doi =		{10.4230/LIPIcs.ICALP.2025.23},
  annote =	{Keywords: instance optimality, instance complexity, unlabeled certificate, subgraph detection, collision detection}
}

DOI: 10.4230/LIPIcs.FORC.2025.16

Smooth Calibration and Decision Making

Authors: Jason Hartline, Yifan Wu, and Yunran Yang

Published in: LIPIcs, Volume 329, 6th Symposium on Foundations of Responsible Computing (FORC 2025)

Abstract

Calibration requires predictor outputs to be consistent with their Bayesian posteriors. For machine learning predictors that do not distinguish between small perturbations, calibration errors are continuous in predictions, e.g., smooth calibration error [Foster and Hart, 2018] and distance to calibration [Błasiok et al., 2023]. In contrast, decision-makers who use predictions make optimal decisions discontinuously in probabilistic space, experiencing loss from miscalibration discontinuously. Calibration errors for decision-making are thus discontinuous, e.g., Expected Calibration Error [Foster and Vohra, 1997] and Calibration Decision Loss [Hu and Wu, 2024]. Consequently, predictors with a low calibration error for machine learning may suffer a high calibration error for decision-making, i.e., they may not be trustworthy for decision-makers who optimize assuming their predictions are correct. It is natural to ask whether a predictor with a low calibration error for machine learning can be post-processed, without loss, into one with a low calibration error for decision-making. In this paper, we show that post-processing an online predictor with ε distance to calibration achieves O(√ε) ECE and CDL, which is asymptotically optimal. The post-processing algorithm adds noise to make predictions differentially private. The optimal bound achievable by post-processing predictors with low distance to calibration is, however, non-optimal compared with existing online calibration algorithms that directly optimize for ECE and CDL.
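The discontinuity the abstract attributes to decision-oriented calibration errors shows up in a small computation. The sketch below uses one common empirical form of the Expected Calibration Error for predictions p_t ∈ [0, 1] and outcomes y_t ∈ {0, 1}, namely (1/T) Σ_v |Σ_{t : p_t = v} (y_t − v)|; normalizations differ across papers, so treat this exact form as an assumption. The helper name `expected_calibration_error` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def expected_calibration_error(preds: Sequence[float],
                               outcomes: Sequence[int]) -> float:
    """(1/T) * sum over distinct predicted values v of |sum_{t: p_t = v} (y_t - v)|."""
    if len(preds) != len(outcomes):
        raise ValueError("preds and outcomes must have the same length")
    if not preds:
        return 0.0
    # Accumulated signed bias (y_t - v) for each distinct predicted value v.
    bias: Dict[float, float] = defaultdict(float)
    for p, y in zip(preds, outcomes):
        bias[p] += y - p
    return sum(abs(b) for b in bias.values()) / len(preds)


if __name__ == "__main__":
    # Predicting 0.5 twice on outcomes (1, 0) is perfectly calibrated ...
    print(expected_calibration_error([0.5, 0.5], [1, 0]))  # 0.0
    # ... but nudging one prediction by 1e-4 puts the two rounds in separate
    # buckets, and the error jumps to about 0.5.
    print(expected_calibration_error([0.5, 0.5001], [1, 0]))
```

A smooth measure such as smooth calibration error would change only slightly under the same perturbation, which is the contrast the abstract draws between the two families of errors.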

Cite as

Jason Hartline, Yifan Wu, and Yunran Yang. Smooth Calibration and Decision Making. In 6th Symposium on Foundations of Responsible Computing (FORC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 329, pp. 16:1-16:26, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{hartline_et_al:LIPIcs.FORC.2025.16,
  author =	{Hartline, Jason and Wu, Yifan and Yang, Yunran},
  title =	{{Smooth Calibration and Decision Making}},
  booktitle =	{6th Symposium on Foundations of Responsible Computing (FORC 2025)},
  pages =	{16:1--16:26},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-367-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{329},
  editor =	{Bun, Mark},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FORC.2025.16},
  URN =		{urn:nbn:de:0030-drops-231438},
  doi =		{10.4230/LIPIcs.FORC.2025.16},
  annote =	{Keywords: Calibration, calibration errors, decision making, differential privacy}
}

DOI: 10.4230/LIPIcs.ITCS.2025.52

Combinatorial Pen Testing (Or Consumer Surplus of Deferred-Acceptance Auctions)

Authors: Aadityan Ganesh and Jason Hartline

Published in: LIPIcs, Volume 325, 16th Innovations in Theoretical Computer Science Conference (ITCS 2025)

Abstract

Pen testing is the problem of selecting high-capacity resources when the only way to measure the capacity of a resource expends its capacity. We have a set of n pens with unknown amounts of ink, and our goal is to select a feasible subset of pens maximizing the total ink in them. We are allowed to learn about the ink levels by writing with them, but this uses up ink that was previously in the pens. We identify optimal and near-optimal pen testing algorithms by drawing analogies to the auction-theoretic frameworks of deferred-acceptance auctions and virtual values. Our framework allows the conversion of any near-optimal deferred-acceptance mechanism into a near-optimal pen testing algorithm. Moreover, these algorithms guarantee an additional overhead of at most (1+o(1)) ln n in the approximation factor relative to the omniscient algorithm that has access to the ink levels in the pens. We use this framework to give pen testing algorithms for various combinatorial constraints such as matroid, knapsack, and general downward-closed constraints, and also for online environments.

Cite as

Aadityan Ganesh and Jason Hartline. Combinatorial Pen Testing (Or Consumer Surplus of Deferred-Acceptance Auctions). In 16th Innovations in Theoretical Computer Science Conference (ITCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 325, pp. 52:1-52:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

@InProceedings{ganesh_et_al:LIPIcs.ITCS.2025.52,
  author =	{Ganesh, Aadityan and Hartline, Jason},
  title =	{{Combinatorial Pen Testing (Or Consumer Surplus of Deferred-Acceptance Auctions)}},
  booktitle =	{16th Innovations in Theoretical Computer Science Conference (ITCS 2025)},
  pages =	{52:1--52:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-361-4},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{325},
  editor =	{Meka, Raghu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2025.52},
  URN =		{urn:nbn:de:0030-drops-226808},
  doi =		{10.4230/LIPIcs.ITCS.2025.52},
  annote =	{Keywords: Pen testing, consumer surplus, money-burning, deferred-acceptance auctions}
}

DOI: 10.4230/LIPIcs.ITCS.2024.70

A Combinatorial Approach to Robust PCA

Authors: Weihao Kong, Mingda Qiao, and Rajat Sen

Published in: LIPIcs, Volume 287, 15th Innovations in Theoretical Computer Science Conference (ITCS 2024)

Abstract

We study the problem of recovering Gaussian data under adversarial corruptions when the noise is low-rank and the corruptions are on the coordinate level. Concretely, we assume that the Gaussian noise lies in an unknown k-dimensional subspace U ⊆ ℝ^d, and s randomly chosen coordinates of each data point fall under the control of an adversary. This setting models the scenario of learning from high-dimensional yet structured data that are transmitted through a highly noisy channel, so that the data points are unlikely to be entirely clean. Our main result is an efficient algorithm that, when ks² = O(d), recovers every single data point up to a nearly optimal ℓ₁ error of Õ(ks/d) in expectation. At the core of our proof is a new analysis of the well-known Basis Pursuit (BP) method for recovering a sparse signal, which is known to succeed under additional assumptions (e.g., incoherence or the restricted isometry property) on the underlying subspace U. In contrast, we present a novel approach via studying a natural combinatorial problem and show that, over the randomness in the support of the sparse signal, a high-probability error bound is possible even if the subspace U is arbitrary.
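For readers unfamiliar with Basis Pursuit, the following is a hedged sketch of how it applies to a single corrupted point in this setting, under the simplifying assumption that U (or an estimate of it) is given; handling the unknown subspace is part of the paper and is not shown. Write a data point as x = z + e with z ∈ U and e supported on the s corrupted coordinates, and let A ∈ ℝ^{(d−k)×d} and B ∈ ℝ^{d×k} be matrices (symbols introduced here for illustration) whose rows span U^⊥ and whose columns span U, respectively.

```latex
% Since z \in U = \ker A, we have Az = 0, so the corruption e satisfies Ae = Ax.
% Basis Pursuit picks the minimum-\ell_1 vector consistent with this constraint:
\hat{e} \in \arg\min_{e' \in \mathbb{R}^d} \|e'\|_1
  \quad \text{subject to} \quad A e' = A x,
\qquad \hat{z} = x - \hat{e}.
% Equivalently (x - e' \in U \iff e' = x - Bw for some w), this is \ell_1 regression:
\hat{z} = B\hat{w}, \qquad \hat{w} \in \arg\min_{w \in \mathbb{R}^k} \|x - Bw\|_1 .
```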

Cite as

Weihao Kong, Mingda Qiao, and Rajat Sen. A Combinatorial Approach to Robust PCA. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 70:1-70:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

@InProceedings{kong_et_al:LIPIcs.ITCS.2024.70,
  author =	{Kong, Weihao and Qiao, Mingda and Sen, Rajat},
  title =	{{A Combinatorial Approach to Robust PCA}},
  booktitle =	{15th Innovations in Theoretical Computer Science Conference (ITCS 2024)},
  pages =	{70:1--70:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-309-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{287},
  editor =	{Guruswami, Venkatesan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2024.70},
  URN =		{urn:nbn:de:0030-drops-195984},
  doi =		{10.4230/LIPIcs.ITCS.2024.70},
  annote =	{Keywords: Robust PCA, Sparse Recovery, Robust Statistics}
}

DOI: 10.4230/LIPIcs.ITCS.2023.91

Online Pen Testing

Authors: Mingda Qiao and Gregory Valiant

Published in: LIPIcs, Volume 251, 14th Innovations in Theoretical Computer Science Conference (ITCS 2023)

Abstract

We study a "pen testing" problem, in which we are given n pens with unknown amounts of ink X₁, X₂, …, X_n, and we want to choose a pen with the maximum amount of remaining ink in it. The challenge is that we cannot access each X_i directly; we only get to write with the i-th pen until either a certain amount of ink is used, or the pen runs out of ink. In both cases, this testing reduces the remaining ink in the pen and thus the utility of selecting it. Despite this significant lack of information, we show that it is possible to approximately maximize our utility up to an O(log n) factor. Formally, we consider two different setups: the "prophet" setting, in which each X_i is independently drawn from some distribution 𝒟_i, and the "secretary" setting, in which (X_i)_{i=1}^n is a random permutation of arbitrary a₁, a₂, …, a_n. We derive the optimal competitive ratios in both settings up to constant factors. Our algorithms are surprisingly robust: (1) In the prophet setting, we only require one sample from each 𝒟_i, rather than a full description of the distribution; (2) In the secretary setting, the algorithm also succeeds under an arbitrary permutation, if an estimate of the maximum a_i is given. Our techniques include a non-trivial online sampling scheme from a sequence with an unknown length, as well as the construction of a hard, non-uniform distribution over permutations. Both might be of independent interest. We also highlight some immediate open problems and discuss several directions for future research.

Cite as

Mingda Qiao and Gregory Valiant. Online Pen Testing. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 251, pp. 91:1-91:26, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{qiao_et_al:LIPIcs.ITCS.2023.91,
  author =	{Qiao, Mingda and Valiant, Gregory},
  title =	{{Online Pen Testing}},
  booktitle =	{14th Innovations in Theoretical Computer Science Conference (ITCS 2023)},
  pages =	{91:1--91:26},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-263-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{251},
  editor =	{Tauman Kalai, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2023.91},
  URN =		{urn:nbn:de:0030-drops-175940},
  doi =		{10.4230/LIPIcs.ITCS.2023.91},
  annote =	{Keywords: Optimal stopping, online algorithm}
}

RANDOM

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2021.45

Decision Tree Heuristics Can Fail, Even in the Smoothed Setting

Authors: Guy Blanc, Jane Lange, Mingda Qiao, and Li-Yang Tan

Published in: LIPIcs, Volume 207, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)

Abstract

Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996). Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possible avenue towards resolving this disconnect. Within the smoothed setting and for targets f that are k-juntas, they showed that these heuristics successfully learn f with depth-k decision tree hypotheses. They conjectured that the same guarantee holds more generally for targets that are depth-k decision trees. We provide a counterexample to this conjecture: we construct targets that are depth-k decision trees and show that even in the smoothed setting, these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy. We also show that the guarantees of Brutzkus et al. cannot extend to the agnostic setting: there are targets that are very close to k-juntas, for which these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy.
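As a hedged illustration of the kind of heuristic under study, the sketch below implements one greedy splitting step using Gini impurity reduction (the CART criterion; ID3 and C4.5 use information gain instead). It is a generic textbook step, not the specific heuristics or targets analyzed in the paper, and the helper names are hypothetical. On the full cube with target x₀ ⊕ x₁, no single variable reduces impurity, which is the classic intuition for why greedy splitting can be misled; the paper's counterexamples, by contrast, are built for the smoothed setting described above.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Tuple[int, ...], int]  # (input bits, label in {0, 1})


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 2p(1-p) of a set of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def split_gain(examples: Sequence[Example], i: int) -> float:
    """Impurity reduction obtained by splitting on variable i."""
    labels = [y for _, y in examples]
    left = [y for x, y in examples if x[i] == 0]
    right = [y for x, y in examples if x[i] == 1]
    n = len(examples)
    return (gini(labels)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


def best_split(examples: Sequence[Example], n_vars: int) -> Tuple[int, List[float]]:
    """Greedy step: return the variable with the largest gain, plus all gains."""
    gains = [split_gain(examples, i) for i in range(n_vars)]
    best = max(range(n_vars), key=lambda i: gains[i])
    return best, gains


def full_cube(n_vars: int, target: Callable[[Tuple[int, ...]], int]) -> List[Example]:
    """Label every point of {0,1}^n_vars with the given target function."""
    return [(x, target(x)) for x in product((0, 1), repeat=n_vars)]


if __name__ == "__main__":
    print(best_split(full_cube(3, lambda x: x[0]), 3))         # (0, [0.5, 0.0, 0.0])
    print(best_split(full_cube(3, lambda x: x[0] ^ x[1]), 3))  # (0, [0.0, 0.0, 0.0])
```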

Cite as

Guy Blanc, Jane Lange, Mingda Qiao, and Li-Yang Tan. Decision Tree Heuristics Can Fail, Even in the Smoothed Setting. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 207, pp. 45:1-45:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{blanc_et_al:LIPIcs.APPROX/RANDOM.2021.45,
  author =	{Blanc, Guy and Lange, Jane and Qiao, Mingda and Tan, Li-Yang},
  title =	{{Decision Tree Heuristics Can Fail, Even in the Smoothed Setting}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)},
  pages =	{45:1--45:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-207-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{207},
  editor =	{Wootters, Mary and Sanit\`{a}, Laura},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2021.45},
  URN =		{urn:nbn:de:0030-drops-147386},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2021.45},
  annote =	{Keywords: decision trees, learning theory, smoothed analysis}
}

DOI: 10.4230/LIPIcs.ITCS.2018.47

Learning Discrete Distributions from Untrusted Batches

Authors: Mingda Qiao and Gregory Valiant

Published in: LIPIcs, Volume 94, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018)

Abstract

We consider the problem of learning a discrete distribution in the presence of an ε fraction of malicious data sources. Specifically, we consider the setting where there is some underlying distribution, p, and each data source provides a batch of ≥ k samples, with the guarantee that at least a (1 − ε) fraction of the sources draw their samples from a distribution with total variation distance at most η from p. We make no assumptions on the data provided by the remaining ε fraction of sources; this data can even be chosen as an adversarial function of the (1 − ε) fraction of "good" batches. We provide two algorithms: one with runtime exponential in the support size, n, but polynomial in k, 1/ε and 1/η, that takes O((n + k)/ε²) batches and recovers p to error O(η + ε/√k). This recovery accuracy is information-theoretically optimal, to constant factors, even given an infinite number of data sources. Our second algorithm applies to the η = 0 setting and also achieves an O(ε/√k) recovery guarantee, though it runs in poly((nk)^k) time. This second algorithm, which approximates a certain tensor via a rank-1 tensor minimizing ℓ₁ distance, is surprising in light of the hardness of many low-rank tensor approximation problems, and may be of independent interest.
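The quantities in this setting can be made concrete with two small helpers: the empirical distribution of one batch over the support {0, 1, ..., n − 1}, and the total variation distance in which both the η assumption and the recovery error are stated. The names are hypothetical, and this is not either of the paper's algorithms; it only illustrates the definitions.

```python
from collections import Counter
from typing import List, Sequence


def empirical_distribution(batch: Sequence[int], n: int) -> List[float]:
    """Empirical distribution of a batch of samples over {0, ..., n-1}."""
    if not batch:
        raise ValueError("batch must be non-empty")
    counts = Counter(batch)  # missing symbols count as 0
    return [counts[j] / len(batch) for j in range(n)]


def tv_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance: half the l1 distance between p and q."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same support size")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    p = [0.5, 0.25, 0.25]
    good_batch = [0, 0, 1, 2]  # happens to match p exactly
    bad_batch = [2, 2, 2, 2]   # e.g., supplied by a malicious source
    print(tv_distance(empirical_distribution(good_batch, 3), p))  # 0.0
    print(tv_distance(empirical_distribution(bad_batch, 3), p))   # 0.75
```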

Cite as

Mingda Qiao and Gregory Valiant. Learning Discrete Distributions from Untrusted Batches. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 94, pp. 47:1-47:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

@InProceedings{qiao_et_al:LIPIcs.ITCS.2018.47,
  author =	{Qiao, Mingda and Valiant, Gregory},
  title =	{{Learning Discrete Distributions from Untrusted Batches}},
  booktitle =	{9th Innovations in Theoretical Computer Science Conference (ITCS 2018)},
  pages =	{47:1--47:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-060-6},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{94},
  editor =	{Karlin, Anna R.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2018.47},
  URN =		{urn:nbn:de:0030-drops-83215},
  doi =		{10.4230/LIPIcs.ITCS.2018.47},
  annote =	{Keywords: robust statistics, information-theoretic optimality}
}

8 Search Results for "Qiao, Mingda"