Document

**Published in:** LIPIcs, Volume 287, 15th Innovations in Theoretical Computer Science Conference (ITCS 2024)

We study the problems of testing and learning high-dimensional discrete convex sets. The simplest high-dimensional discrete domain where convexity is a non-trivial property is the ternary hypercube, {-1,0,1}ⁿ. The goal of this work is to understand structural combinatorial properties of convex sets in this domain and to determine the complexity of the testing and learning problems. We obtain the following results.
Structural: We prove nearly tight bounds on the edge boundary of convex sets in {0,±1}ⁿ, showing that the maximum edge boundary of a convex set is Õ(n^{3/4})⋅3ⁿ, or equivalently that every convex set has influence Õ(n^{3/4}) and a convex set exists with influence Ω(n^{3/4}).
Learning and sample-based testing: We prove upper and lower bounds of 3^{Õ(n^{3/4})} and 3^{Ω(√n)} for the task of learning convex sets under the uniform distribution from random examples. The analysis of the learning algorithm relies on our upper bound on the influence. Both the upper and lower bound also hold for the problem of sample-based testing with two-sided error. For sample-based testing with one-sided error we show that the sample-complexity is 3^{Θ(n)}.
Testing with queries: We prove nearly matching upper and lower bounds of 3^{Θ̃(√n)} for one-sided error testing of convex sets with non-adaptive queries.

Hadley Black, Eric Blais, and Nathaniel Harms. Testing and Learning Convex Sets in the Ternary Hypercube. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 15:1-15:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{black_et_al:LIPIcs.ITCS.2024.15, author = {Black, Hadley and Blais, Eric and Harms, Nathaniel}, title = {{Testing and Learning Convex Sets in the Ternary Hypercube}}, booktitle = {15th Innovations in Theoretical Computer Science Conference (ITCS 2024)}, pages = {15:1--15:21}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-309-6}, ISSN = {1868-8969}, year = {2024}, volume = {287}, editor = {Guruswami, Venkatesan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2024.15}, URN = {urn:nbn:de:0030-drops-195435}, doi = {10.4230/LIPIcs.ITCS.2024.15}, annote = {Keywords: Property testing, learning theory, convex sets, testing convexity, fluctuation} }

Document

**Published in:** LIPIcs, Volume 287, 15th Innovations in Theoretical Computer Science Conference (ITCS 2024)

We are interested in testing properties of distributions with systematically mislabeled samples. Our goal is to make decisions about unknown probability distributions, using a sample that has been collected by a confused collector, such as a machine-learning classifier that has not learned to distinguish all elements of the domain. The confused collector holds an unknown clustering of the domain and an input distribution μ, and provides two oracles: a sample oracle which produces a sample from μ that has been labeled according to the clustering; and a label-query oracle which returns the label of a query point x according to the clustering.
Our first set of results shows that identity, uniformity, and equivalence of distributions can be tested efficiently, under the earth-mover distance, with remarkably weak conditions on the confused collector, even when the unknown clustering is adversarial. This requires defining a variant of the distribution testing task (inspired by the recent testable learning framework of Rubinfeld & Vasilyan), where the algorithm should test a joint property of the distribution and its clustering. As an example, we get efficient testers when the distribution tester is allowed to reject if it detects that the confused collector clustering is "far" from being a decision tree.
The second set of results shows that we can sometimes do significantly better when the clustering is random instead of adversarial. For certain one-dimensional random clusterings, we show that uniformity can be tested under the TV distance using Õ((√n)/(ρ^{3/2} ε²)) samples and zero queries, where ρ ∈ (0,1] controls the "resolution" of the clustering. We improve this to O((√n)/(ρ ε²)) when queries are allowed.

Renato Ferreira Pinto Jr. and Nathaniel Harms. Distribution Testing with a Confused Collector. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 287, pp. 47:1-47:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{ferreirapintojr._et_al:LIPIcs.ITCS.2024.47, author = {Ferreira Pinto Jr., Renato and Harms, Nathaniel}, title = {{Distribution Testing with a Confused Collector}}, booktitle = {15th Innovations in Theoretical Computer Science Conference (ITCS 2024)}, pages = {47:1--47:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-309-6}, ISSN = {1868-8969}, year = {2024}, volume = {287}, editor = {Guruswami, Venkatesan}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2024.47}, URN = {urn:nbn:de:0030-drops-195755}, doi = {10.4230/LIPIcs.ITCS.2024.47}, annote = {Keywords: Distribution testing, property testing, uniformity testing, identity testing, earth-mover distance, sublinear algorithms} }

Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 261, 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)

For any hereditary graph class ℱ, we construct optimal adjacency labeling schemes for the classes of subgraphs and induced subgraphs of Cartesian products of graphs in ℱ. As a consequence, we show that, if ℱ admits efficient adjacency labels (or, equivalently, small induced-universal graphs) meeting the information-theoretic minimum, then the classes of subgraphs and induced subgraphs of Cartesian products of graphs in ℱ do too. Our proof uses ideas from randomized communication complexity and hashing, and improves upon recent results of Chepoi, Labourel, and Ratel [Journal of Graph Theory, 2020].

Louis Esperet, Nathaniel Harms, and Viktor Zamaraev. Optimal Adjacency Labels for Subgraphs of Cartesian Products. In 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 261, pp. 57:1-57:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{esperet_et_al:LIPIcs.ICALP.2023.57, author = {Esperet, Louis and Harms, Nathaniel and Zamaraev, Viktor}, title = {{Optimal Adjacency Labels for Subgraphs of Cartesian Products}}, booktitle = {50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)}, pages = {57:1--57:11}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-278-5}, ISSN = {1868-8969}, year = {2023}, volume = {261}, editor = {Etessami, Kousha and Feige, Uriel and Puppis, Gabriele}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2023.57}, URN = {urn:nbn:de:0030-drops-181093}, doi = {10.4230/LIPIcs.ICALP.2023.57}, annote = {Keywords: Adjacency labeling schemes, Cartesian product, Hypercubes} }

Document

RANDOM

**Published in:** LIPIcs, Volume 245, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)

We study the problems of adjacency sketching, small-distance sketching, and approximate distance threshold (ADT) sketching for monotone classes of graphs. The algorithmic problem is to assign random sketches to the vertices of any graph G in the class, so that adjacency, exact distance thresholds, or approximate distance thresholds of two vertices u,v can be decided (with probability at least 2/3) from the sketches of u and v, by a decoder that does not know the graph. The goal is to determine when sketches of constant size exist.
Our main results are that, for monotone classes of graphs: constant-size adjacency sketches exist if and only if the class has bounded arboricity; constant-size small-distance sketches exist if and only if the class has bounded expansion; constant-size ADT sketches imply that the class has bounded expansion; any class of constant expansion (i.e. any proper minor closed class) has a constant-size ADT sketch; and a class may have arbitrarily small expansion without admitting a constant-size ADT sketch.

Louis Esperet, Nathaniel Harms, and Andrey Kupavskii. Sketching Distances in Monotone Graph Classes. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 245, pp. 18:1-18:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{esperet_et_al:LIPIcs.APPROX/RANDOM.2022.18, author = {Esperet, Louis and Harms, Nathaniel and Kupavskii, Andrey}, title = {{Sketching Distances in Monotone Graph Classes}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022)}, pages = {18:1--18:23}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-249-5}, ISSN = {1868-8969}, year = {2022}, volume = {245}, editor = {Chakrabarti, Amit and Swamy, Chaitanya}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2022.18}, URN = {urn:nbn:de:0030-drops-171406}, doi = {10.4230/LIPIcs.APPROX/RANDOM.2022.18}, annote = {Keywords: adjacency labelling, informative labelling, distance sketching, adjacency sketching, communication complexity} }

Document

Track A: Algorithms, Complexity and Games

**Published in:** LIPIcs, Volume 229, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)

We study distribution-free property testing and learning problems where the unknown probability distribution is a product distribution over ℝ^d. For many important classes of functions, such as intersections of halfspaces, polynomial threshold functions, convex sets, and k-alternating functions, the known algorithms either have complexity that depends on the support size of the distribution, or are proven to work only for specific examples of product distributions. We introduce a general method, which we call downsampling, that resolves these issues. Downsampling uses a notion of "rectilinear isoperimetry" for product distributions, which further strengthens the connection between isoperimetry, testing and learning. Using this technique, we attain new efficient distribution-free algorithms under product distributions on ℝ^d:
1) A simpler proof for non-adaptive, one-sided monotonicity testing of functions [n]^d → {0,1}, and improved sample complexity for testing monotonicity over unknown product distributions, from O(d⁷) [Black, Chakrabarty, & Seshadhri, SODA 2020] to O(d³).
2) Polynomial-time agnostic learning algorithms for functions of a constant number of halfspaces, and constant-degree polynomial threshold functions;
3) An exp{O(dlog(dk))}-time agnostic learning algorithm, and an exp{O(dlog(dk))}-sample tolerant tester, for functions of k convex sets; and a 2^O(d) sample-based one-sided tester for convex sets;
4) An exp{O(k√d)}-time agnostic learning algorithm for k-alternating functions, and a sample-based tolerant tester with the same complexity.

Nathaniel Harms and Yuichi Yoshida. Downsampling for Testing and Learning in Product Distributions. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 71:1-71:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{harms_et_al:LIPIcs.ICALP.2022.71, author = {Harms, Nathaniel and Yoshida, Yuichi}, title = {{Downsampling for Testing and Learning in Product Distributions}}, booktitle = {49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)}, pages = {71:1--71:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-235-8}, ISSN = {1868-8969}, year = {2022}, volume = {229}, editor = {Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2022.71}, URN = {urn:nbn:de:0030-drops-164123}, doi = {10.4230/LIPIcs.ICALP.2022.71}, annote = {Keywords: property testing, learning, monotonicity, halfspaces, intersections of halfspaces, polynomial threshold functions} }

Document

**Published in:** LIPIcs, Volume 151, 11th Innovations in Theoretical Computer Science Conference (ITCS 2020)

We introduce a communication model called universal SMP, in which Alice and Bob receive a function f belonging to a family ℱ, and inputs x and y. Alice and Bob use shared randomness to send a message to a third party who cannot see f, x, y, or the shared randomness, and must decide f(x,y). Our main application of universal SMP is to relate communication complexity to graph labeling, where the goal is to give a short label to each vertex in a graph, so that adjacency or other functions of two vertices x and y can be determined from the labels ℓ(x), ℓ(y). We give a universal SMP protocol using O(k^2) bits of communication for deciding whether two vertices have distance at most k in distributive lattices (generalizing the k-Hamming Distance problem in communication complexity), and explain how this implies a O(k^2 log n) labeling scheme for deciding dist(x,y) ≤ k on distributive lattices with size n; in contrast, we show that a universal SMP protocol for determining dist(x,y) ≤ 2 in modular lattices (a superset of distributive lattices) has super-constant Ω(n^{1/4}) communication cost. On the other hand, we demonstrate that many graph families known to have efficient adjacency labeling schemes, such as trees, low-arboricity graphs, and planar graphs, admit constant-cost communication protocols for adjacency. Trees also have an O(k) protocol for deciding dist(x,y) ≤ k and planar graphs have an O(1) protocol for dist(x,y) ≤ 2, which implies a new O(log n) labeling scheme for the same problem on planar graphs.

Nathaniel Harms. Universal Communication, Universal Graphs, and Graph Labeling. In 11th Innovations in Theoretical Computer Science Conference (ITCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 151, pp. 33:1-33:27, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{harms:LIPIcs.ITCS.2020.33, author = {Harms, Nathaniel}, title = {{Universal Communication, Universal Graphs, and Graph Labeling}}, booktitle = {11th Innovations in Theoretical Computer Science Conference (ITCS 2020)}, pages = {33:1--33:27}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-134-4}, ISSN = {1868-8969}, year = {2020}, volume = {151}, editor = {Vidick, Thomas}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2020.33}, URN = {urn:nbn:de:0030-drops-117182}, doi = {10.4230/LIPIcs.ITCS.2020.33}, annote = {Keywords: Universal graphs, graph labeling, distance labeling, planar graphs, lattices, hamming distance} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail