DROPS

Document

RANDOM

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2024.53

When Do Low-Rate Concatenated Codes Approach The Gilbert-Varshamov Bound?

Authors: Dean Doron, Jonathan Mosheiff, and Mary Wootters

Published in: LIPIcs, Volume 317, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)

Abstract

The Gilbert-Varshamov (GV) bound is a classical existential result in coding theory. It implies that a random linear binary code of rate ε² has relative distance at least 1/2 - O(ε) with high probability. However, it is a major challenge to construct explicit codes with similar parameters. One hope to derandomize the Gilbert-Varshamov construction is with code concatenation: We begin with a (hopefully explicit) outer code 𝒞_out over a large alphabet, and concatenate that with a small binary random linear code 𝒞_in. It is known that when we use independent small codes for each coordinate, then the result lies on the GV bound with high probability, but this still uses a lot of randomness. In this paper, we consider the question of whether code concatenation with a single random linear inner code 𝒞_in can lie on the GV bound; and if so what conditions on 𝒞_out are sufficient for this. We show that first, there do exist linear outer codes 𝒞_out that are "good" for concatenation in this sense (in fact, most linear codes codes are good). We also provide two sufficient conditions for 𝒞_out, so that if 𝒞_out satisfies these, 𝒞_out∘𝒞_in will likely lie on the GV bound. We hope that these conditions may inspire future work towards constructing explicit codes 𝒞_out.

Cite as

Dean Doron, Jonathan Mosheiff, and Mary Wootters. When Do Low-Rate Concatenated Codes Approach The Gilbert-Varshamov Bound?. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 317, pp. 53:1-53:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{doron_et_al:LIPIcs.APPROX/RANDOM.2024.53,
  author =	{Doron, Dean and Mosheiff, Jonathan and Wootters, Mary},
  title =	{{When Do Low-Rate Concatenated Codes Approach The Gilbert-Varshamov Bound?}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2024)},
  pages =	{53:1--53:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-348-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{317},
  editor =	{Kumar, Amit and Ron-Zewi, Noga},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2024.53},
  URN =		{urn:nbn:de:0030-drops-210467},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2024.53},
  annote =	{Keywords: Error-correcting codes, Concatenated codes, Derandomization, Gilbert-Varshamov bound}
}

Document

DOI: 10.4230/LIPIcs.ECOOP.2024.39

Formalizing, Mechanizing, and Verifying Class-Based Refinement Types

Authors: Ke Sun, Di Wang, Sheng Chen, Meng Wang, and Dan Hao

Published in: LIPIcs, Volume 313, 38th European Conference on Object-Oriented Programming (ECOOP 2024)

Abstract

Refinement types have been extensively used in class-based languages to specify and verify fine-grained logical specifications. Despite the advances in practical aspects such as applicability and usability, two fundamental issues persist. First, the soundness of existing class-based refinement type systems is inadequately explored, casting doubts on their reliability. Second, the expressiveness of existing systems is limited, restricting the depiction of semantic properties related to object-oriented constructs. This work tackles these issues through a systematic framework. We formalize a declarative class-based refinement type calculus (named RFJ), that is expressive and concise. We rigorously develop the soundness meta-theory of this calculus, followed by its mechanization in Coq. Finally, to ensure the calculus’s verifiability, we propose an algorithmic verification approach based on a fragment of first-order logic (named LFJ), and implement this approach as a type checker.

Cite as

Ke Sun, Di Wang, Sheng Chen, Meng Wang, and Dan Hao. Formalizing, Mechanizing, and Verifying Class-Based Refinement Types. In 38th European Conference on Object-Oriented Programming (ECOOP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 313, pp. 39:1-39:30, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{sun_et_al:LIPIcs.ECOOP.2024.39,
  author =	{Sun, Ke and Wang, Di and Chen, Sheng and Wang, Meng and Hao, Dan},
  title =	{{Formalizing, Mechanizing, and Verifying Class-Based Refinement Types}},
  booktitle =	{38th European Conference on Object-Oriented Programming (ECOOP 2024)},
  pages =	{39:1--39:30},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-341-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{313},
  editor =	{Aldrich, Jonathan and Salvaneschi, Guido},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ECOOP.2024.39},
  URN =		{urn:nbn:de:0030-drops-208881},
  doi =		{10.4230/LIPIcs.ECOOP.2024.39},
  annote =	{Keywords: Refinement Types, Program Verification, Object-oriented Programming}
}

Document

DOI: 10.4230/LIPIcs.CCC.2024.16

A Strong Direct Sum Theorem for Distributional Query Complexity

Authors: Guy Blanc, Caleb Koch, Carmen Strassle, and Li-Yang Tan

Published in: LIPIcs, Volume 300, 39th Computational Complexity Conference (CCC 2024)

Abstract

Consider the expected query complexity of computing the k-fold direct product f^{⊗ k} of a function f to error ε with respect to a distribution μ^k. One strategy is to sequentially compute each of the k copies to error ε/k with respect to μ and apply the union bound. We prove a strong direct sum theorem showing that this naive strategy is essentially optimal. In particular, computing a direct product necessitates a blowup in both query complexity and error. Strong direct sum theorems contrast with results that only show a blowup in query complexity or error but not both. There has been a long line of such results for distributional query complexity, dating back to (Impagliazzo, Raz, Wigderson 1994) and (Nisan, Rudich, Saks 1994), but a strong direct sum theorem that holds for all functions in the standard query model had been elusive. A key idea in our work is the first use of the Hardcore Theorem (Impagliazzo 1995) in the context of query complexity. We prove a new resilience lemma that accompanies it, showing that the hardcore of f^{⊗k} is likely to remain dense under arbitrary partitions of the input space.

Cite as

Guy Blanc, Caleb Koch, Carmen Strassle, and Li-Yang Tan. A Strong Direct Sum Theorem for Distributional Query Complexity. In 39th Computational Complexity Conference (CCC 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 300, pp. 16:1-16:30, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{blanc_et_al:LIPIcs.CCC.2024.16,
  author =	{Blanc, Guy and Koch, Caleb and Strassle, Carmen and Tan, Li-Yang},
  title =	{{A Strong Direct Sum Theorem for Distributional Query Complexity}},
  booktitle =	{39th Computational Complexity Conference (CCC 2024)},
  pages =	{16:1--16:30},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-331-7},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{300},
  editor =	{Santhanam, Rahul},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2024.16},
  URN =		{urn:nbn:de:0030-drops-204123},
  doi =		{10.4230/LIPIcs.CCC.2024.16},
  annote =	{Keywords: Query complexity, direct product theorem, hardcore theorem}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2023.18

Certification with an NP Oracle

Authors: Guy Blanc, Caleb Koch, Jane Lange, Carmen Strassle, and Li-Yang Tan

Published in: LIPIcs, Volume 251, 14th Innovations in Theoretical Computer Science Conference (ITCS 2023)

Abstract

In the certification problem, the algorithm is given a function f with certificate complexity k and an input x^⋆, and the goal is to find a certificate of size ≤ poly(k) for f’s value at x^⋆. This problem is in NP^NP, and assuming 𝖯 ≠ NP, is not in 𝖯. Prior works, dating back to Valiant in 1984, have therefore sought to design efficient algorithms by imposing assumptions on f such as monotonicity. Our first result is a BPP^NP algorithm for the general problem. The key ingredient is a new notion of the balanced influence of variables, a natural variant of influence that corrects for the bias of the function. Balanced influences can be accurately estimated via uniform generation, and classic BPP^NP algorithms are known for the latter task. We then consider certification with stricter instance-wise guarantees: for each x^⋆, find a certificate whose size scales with that of the smallest certificate for x^⋆. In sharp contrast with our first result, we show that this problem is NP^NP-hard even to approximate. We obtain an optimal inapproximability ratio, adding to a small handful of problems in the higher levels of the polynomial hierarchy for which optimal inapproximability is known. Our proof involves the novel use of bit-fixing dispersers for gap amplification.

Cite as

Guy Blanc, Caleb Koch, Jane Lange, Carmen Strassle, and Li-Yang Tan. Certification with an NP Oracle. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 251, pp. 18:1-18:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{blanc_et_al:LIPIcs.ITCS.2023.18,
  author =	{Blanc, Guy and Koch, Caleb and Lange, Jane and Strassle, Carmen and Tan, Li-Yang},
  title =	{{Certification with an NP Oracle}},
  booktitle =	{14th Innovations in Theoretical Computer Science Conference (ITCS 2023)},
  pages =	{18:1--18:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-263-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{251},
  editor =	{Tauman Kalai, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2023.18},
  URN =		{urn:nbn:de:0030-drops-175217},
  doi =		{10.4230/LIPIcs.ITCS.2023.18},
  annote =	{Keywords: Certificate complexity, Boolean functions, polynomial hierarchy, hardness of approximation}
}

Document

DOI: 10.4230/LIPIcs.CCC.2022.10

New Near-Linear Time Decodable Codes Closer to the GV Bound

Authors: Guy Blanc and Dean Doron

Published in: LIPIcs, Volume 234, 37th Computational Complexity Conference (CCC 2022)

Abstract

We construct a family of binary codes of relative distance 1/2-ε and rate ε² ⋅ 2^(-log^α (1/ε)) for α ≈ 1/2 that are decodable, probabilistically, in near-linear time. This improves upon the rate of the state-of-the-art near-linear time decoding near the GV bound due to Jeronimo, Srivastava, and Tulsiani, who gave a randomized decoding of Ta-Shma codes with α ≈ 5/6 [Ta-Shma, 2017; Jeronimo et al., 2021]. Each code in our family can be constructed in probabilistic polynomial time, or deterministic polynomial time given sufficiently good explicit 3-uniform hypergraphs. Our construction is based on a new graph-based bias amplification method. While previous works start with some base code of relative distance 1/2-ε₀ for ε₀ ≫ ε and amplify the distance to 1/2-ε by walking on an expander, or on a carefully tailored product of expanders, we walk over very sparse, highly mixing, hypergraphs. Study of such hypergraphs further offers an avenue toward achieving rate Ω̃(ε²). For our unique- and list-decoding algorithms, we employ the framework developed in [Jeronimo et al., 2021].

Cite as

Guy Blanc and Dean Doron. New Near-Linear Time Decodable Codes Closer to the GV Bound. In 37th Computational Complexity Conference (CCC 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 234, pp. 10:1-10:40, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{blanc_et_al:LIPIcs.CCC.2022.10,
  author =	{Blanc, Guy and Doron, Dean},
  title =	{{New Near-Linear Time Decodable Codes Closer to the GV Bound}},
  booktitle =	{37th Computational Complexity Conference (CCC 2022)},
  pages =	{10:1--10:40},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-241-9},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{234},
  editor =	{Lovett, Shachar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2022.10},
  URN =		{urn:nbn:de:0030-drops-165726},
  doi =		{10.4230/LIPIcs.CCC.2022.10},
  annote =	{Keywords: Unique decoding, list decoding, the Gilbert-Varshamov bound, small-bias sample spaces, hypergraphs, expander walks}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2022.24

Reconstructing Decision Trees

Authors: Guy Blanc, Jane Lange, and Li-Yang Tan

Published in: LIPIcs, Volume 229, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)

Abstract

We give the first reconstruction algorithm for decision trees: given queries to a function f that is opt-close to a size-s decision tree, our algorithm provides query access to a decision tree T where: - T has size S := s^O((log s)²/ε³); - dist(f,T) ≤ O(opt)+ε; - Every query to T is answered with poly((log s)/ε)⋅ log n queries to f and in poly((log s)/ε)⋅ n log n time. This yields a tolerant tester that distinguishes functions that are close to size-s decision trees from those that are far from size-S decision trees. The polylogarithmic dependence on s in the efficiency of our tester is exponentially smaller than that of existing testers. Since decision tree complexity is well known to be related to numerous other boolean function properties, our results also provide a new algorithm for reconstructing and testing these properties.

Cite as

Guy Blanc, Jane Lange, and Li-Yang Tan. Reconstructing Decision Trees. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 229, pp. 24:1-24:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{blanc_et_al:LIPIcs.ICALP.2022.24,
  author =	{Blanc, Guy and Lange, Jane and Tan, Li-Yang},
  title =	{{Reconstructing Decision Trees}},
  booktitle =	{49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)},
  pages =	{24:1--24:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-235-8},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{229},
  editor =	{Boja\'{n}czyk, Miko{\l}aj and Merelli, Emanuela and Woodruff, David P.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2022.24},
  URN =		{urn:nbn:de:0030-drops-163653},
  doi =		{10.4230/LIPIcs.ICALP.2022.24},
  annote =	{Keywords: Property reconstruction, property testing, tolerant testing, decision trees}
}

Document

DOI: 10.4230/LIPIcs.STACS.2022.17

On Testing Decision Tree

Authors: Nader H. Bshouty and Catherine A. Haddad-Zaknoon

Published in: LIPIcs, Volume 219, 39th International Symposium on Theoretical Aspects of Computer Science (STACS 2022)

Abstract

In this paper, we study testing decision tree of size and depth that are significantly smaller than the number of attributes n. Our main result addresses the problem of poly(n,1/ε) time algorithms with poly(s,1/ε) query complexity (independent of n) that distinguish between functions that are decision trees of size s from functions that are ε-far from any decision tree of size ϕ(s,1/ε), for some function ϕ > s. The best known result is the recent one that follows from Blanc, Lange and Tan, [Guy Blanc et al., 2020], that gives ϕ(s,1/ε) = 2^{O((log³s)/ε³)}. In this paper, we give a new algorithm that achieves ϕ(s,1/ε) = 2^{O(log² (s/ε))}. Moreover, we study the testability of depth-d decision tree and give a distribution free tester that distinguishes between depth-d decision tree and functions that are ε-far from depth-d² decision tree.

Cite as

Nader H. Bshouty and Catherine A. Haddad-Zaknoon. On Testing Decision Tree. In 39th International Symposium on Theoretical Aspects of Computer Science (STACS 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 219, pp. 17:1-17:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{bshouty_et_al:LIPIcs.STACS.2022.17,
  author =	{Bshouty, Nader H. and Haddad-Zaknoon, Catherine A.},
  title =	{{On Testing Decision Tree}},
  booktitle =	{39th International Symposium on Theoretical Aspects of Computer Science (STACS 2022)},
  pages =	{17:1--17:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-222-8},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{219},
  editor =	{Berenbrink, Petra and Monmege, Benjamin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2022.17},
  URN =		{urn:nbn:de:0030-drops-158273},
  doi =		{10.4230/LIPIcs.STACS.2022.17},
  annote =	{Keywords: Testing decision trees}
}

Document

RANDOM

DOI: 10.4230/LIPIcs.APPROX/RANDOM.2021.45

Decision Tree Heuristics Can Fail, Even in the Smoothed Setting

Authors: Guy Blanc, Jane Lange, Mingda Qiao, and Li-Yang Tan

Published in: LIPIcs, Volume 207, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)

Abstract

Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996). Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possible avenue towards resolving this disconnect. Within the smoothed setting and for targets f that are k-juntas, they showed that these heuristics successfully learn f with depth-k decision tree hypotheses. They conjectured that the same guarantee holds more generally for targets that are depth-k decision trees. We provide a counterexample to this conjecture: we construct targets that are depth-k decision trees and show that even in the smoothed setting, these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy. We also show that the guarantees of Brutzkus et al. cannot extend to the agnostic setting: there are targets that are very close to k-juntas, for which these heuristics build trees of depth 2^{Ω(k)} before achieving high accuracy.

Cite as

Guy Blanc, Jane Lange, Mingda Qiao, and Li-Yang Tan. Decision Tree Heuristics Can Fail, Even in the Smoothed Setting. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 207, pp. 45:1-45:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{blanc_et_al:LIPIcs.APPROX/RANDOM.2021.45,
  author =	{Blanc, Guy and Lange, Jane and Qiao, Mingda and Tan, Li-Yang},
  title =	{{Decision Tree Heuristics Can Fail, Even in the Smoothed Setting}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)},
  pages =	{45:1--45:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-207-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{207},
  editor =	{Wootters, Mary and Sanit\`{a}, Laura},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2021.45},
  URN =		{urn:nbn:de:0030-drops-147386},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2021.45},
  annote =	{Keywords: decision trees, learning theory, smoothed analysis}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2021.30

Learning Stochastic Decision Trees

Authors: Guy Blanc, Jane Lange, and Li-Yang Tan

Published in: LIPIcs, Volume 198, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)

Abstract

We give a quasipolynomial-time algorithm for learning stochastic decision trees that is optimally resilient to adversarial noise. Given an η-corrupted set of uniform random samples labeled by a size-s stochastic decision tree, our algorithm runs in time n^{O(log(s/ε)/ε²)} and returns a hypothesis with error within an additive 2η + ε of the Bayes optimal. An additive 2η is the information-theoretic minimum. Previously no non-trivial algorithm with a guarantee of O(η) + ε was known, even for weaker noise models. Our algorithm is furthermore proper, returning a hypothesis that is itself a decision tree; previously no such algorithm was known even in the noiseless setting.

Cite as

Guy Blanc, Jane Lange, and Li-Yang Tan. Learning Stochastic Decision Trees. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 30:1-30:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{blanc_et_al:LIPIcs.ICALP.2021.30,
  author =	{Blanc, Guy and Lange, Jane and Tan, Li-Yang},
  title =	{{Learning Stochastic Decision Trees}},
  booktitle =	{48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)},
  pages =	{30:1--30:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-195-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{198},
  editor =	{Bansal, Nikhil and Merelli, Emanuela and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2021.30},
  URN =		{urn:nbn:de:0030-drops-140994},
  doi =		{10.4230/LIPIcs.ICALP.2021.30},
  annote =	{Keywords: Learning theory, decision trees, proper learning algorithms, adversarial noise}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2020.44

Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations

Authors: Guy Blanc, Jane Lange, and Li-Yang Tan

Published in: LIPIcs, Volume 151, 11th Innovations in Theoretical Computer Science Conference (ITCS 2020)

Abstract

Consider the following heuristic for building a decision tree for a function f : {0,1}^n → {± 1}. Place the most influential variable x_i of f at the root, and recurse on the subfunctions f_{x_i=0} and f_{x_i=1} on the left and right subtrees respectively; terminate once the tree is an ε-approximation of f. We analyze the quality of this heuristic, obtaining near-matching upper and lower bounds: - Upper bound: For every f with decision tree size s and every ε ∈ (0,1/2), this heuristic builds a decision tree of size at most s^O(log(s/ε)log(1/ε)). - Lower bound: For every ε ∈ (0,1/2) and s ≤ 2^Õ(√n), there is an f with decision tree size s such that this heuristic builds a decision tree of size s^Ω~(log s). We also obtain upper and lower bounds for monotone functions: s^O(√{log s}/ε) and s^Ω(∜{log s}) respectively. The lower bound disproves conjectures of Fiat and Pechyony (2004) and Lee (2009). Our upper bounds yield new algorithms for properly learning decision trees under the uniform distribution. We show that these algorithms - which are motivated by widely employed and empirically successful top-down decision tree learning heuristics such as ID3, C4.5, and CART - achieve provable guarantees that compare favorably with those of the current fastest algorithm (Ehrenfeucht and Haussler, 1989), and even have certain qualitative advantages. Our lower bounds shed new light on the limitations of these heuristics. Finally, we revisit the classic work of Ehrenfeucht and Haussler. We extend it to give the first uniform-distribution proper learning algorithm that achieves polynomial sample and memory complexity, while matching its state-of-the-art quasipolynomial runtime.

Cite as

Guy Blanc, Jane Lange, and Li-Yang Tan. Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations. In 11th Innovations in Theoretical Computer Science Conference (ITCS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 151, pp. 44:1-44:44, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{blanc_et_al:LIPIcs.ITCS.2020.44,
  author =	{Blanc, Guy and Lange, Jane and Tan, Li-Yang},
  title =	{{Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations}},
  booktitle =	{11th Innovations in Theoretical Computer Science Conference (ITCS 2020)},
  pages =	{44:1--44:44},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-134-4},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{151},
  editor =	{Vidick, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2020.44},
  URN =		{urn:nbn:de:0030-drops-117295},
  doi =		{10.4230/LIPIcs.ITCS.2020.44},
  annote =	{Keywords: Decision trees, Influence of variables, Analysis of boolean functions, Learning theory, Top-down decision tree heuristics}
}

10 Search Results for "Blanc, Guy"

When Do Low-Rate Concatenated Codes Approach The Gilbert-Varshamov Bound?

Abstract

Cite as

Formalizing, Mechanizing, and Verifying Class-Based Refinement Types

Abstract

Cite as

A Strong Direct Sum Theorem for Distributional Query Complexity

Abstract

Cite as

Certification with an NP Oracle

Abstract

Cite as

New Near-Linear Time Decodable Codes Closer to the GV Bound

Abstract

Cite as

Reconstructing Decision Trees

Abstract

Cite as

On Testing Decision Tree

Abstract

Cite as

Decision Tree Heuristics Can Fail, Even in the Smoothed Setting

Abstract

Cite as

Learning Stochastic Decision Trees

Abstract

Cite as

Top-Down Induction of Decision Trees: Rigorous Guarantees and Inherent Limitations

Abstract

Cite as

Thanks for your feedback!

Could not send message