2 Search Results for "Guy, Ido"

Document

DOI: 10.4230/LIPIcs.ICDT.2022.4

On the Hardness of Category Tree Construction

Authors: Shay Gershtein, Uri Avron, Ido Guy, Tova Milo, and Slava Novgorodov

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)

Abstract

Category trees, or taxonomies, are rooted trees where each node, called a category, corresponds to a set of related items. The construction of taxonomies has been studied in various domains, including e-commerce, document management, and question answering. Multiple algorithms for automating construction have been proposed, employing a variety of clustering approaches and crowdsourcing. However, no formal model to capture such categorization problems has been devised, and their complexity has not been studied. To address this, we propose in this work a combinatorial model that captures many practical settings and show that the aforementioned empirical approach has been warranted, as we prove strong inapproximability bounds for various problem variants and special cases when the goal is to produce a categorization of the maximum utility. In our model, the input is a set of n weighted item sets that the tree would ideally contain as categories. Each category, rather than perfectly match the corresponding input set, is allowed to exceed a given threshold for a given similarity function. The goal is to produce a tree that maximizes the total weight of the sets for which it contains a matching category. A key parameter is an upper bound on the number of categories an item may belong to, which produces the hardness of the problem, as initially each item may be contained in an arbitrary number of input sets. For this model, we prove inapproximability bounds, of order Θ̃(√n) or Θ̃(n), for various problem variants and special cases, loosely justifying the aforementioned heuristic approach. Our work includes reductions based on parameterized randomized constructions that highlight how various problem parameters and properties of the input may affect the hardness. Moreover, for the special case where the category must be identical to the corresponding input set, we devise an algorithm whose approximation guarantee depends solely on a more granular parameter, allowing improved worst-case guarantees. Finally, we also generalize our results to DAG-based and non-hierarchical categorization.

Cite as

Shay Gershtein, Uri Avron, Ido Guy, Tova Milo, and Slava Novgorodov. On the Hardness of Category Tree Construction. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 4:1-4:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{gershtein_et_al:LIPIcs.ICDT.2022.4,
  author =	{Gershtein, Shay and Avron, Uri and Guy, Ido and Milo, Tova and Novgorodov, Slava},
  title =	{{On the Hardness of Category Tree Construction}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{4:1--4:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.4},
  URN =		{urn:nbn:de:0030-drops-158785},
  doi =		{10.4230/LIPIcs.ICDT.2022.4},
  annote =	{Keywords: maximum independent set, approximation algorithms, approximation hardness bounds, taxonomy construction, category tree construction}
}

Document

DOI: 10.4230/LIPIcs.ITCS.2021.41

Interactive Proofs for Verifying Machine Learning

Authors: Shafi Goldwasser, Guy N. Rothblum, Jonathan Shafer, and Amir Yehudayoff

Published in: LIPIcs, Volume 185, 12th Innovations in Theoretical Computer Science Conference (ITCS 2021)

Abstract

We consider the following question: using a source of labeled data and interaction with an untrusted prover, what is the complexity of verifying that a given hypothesis is "approximately correct"? We study interactive proof systems for PAC verification, where a verifier that interacts with a prover is required to accept good hypotheses, and reject bad hypotheses. Both the verifier and the prover are efficient and have access to labeled data samples from an unknown distribution. We are interested in cases where the verifier can use significantly less data than is required for (agnostic) PAC learning, or use a substantially cheaper data source (e.g., using only random samples for verification, even though learning requires membership queries). We believe that today, when data and data-driven algorithms are quickly gaining prominence, the question of verifying purported outcomes of data analyses is very well-motivated. We show three main results. First, we prove that for a specific hypothesis class, verification is significantly cheaper than learning in terms of sample complexity, even if the verifier engages with the prover only in a single-round (NP-like) protocol. Moreover, for this class we prove that single-round verification is also significantly cheaper than testing closeness to the class. Second, for the broad class of Fourier-sparse boolean functions, we show a multi-round (IP-like) verification protocol, where the prover uses membership queries, and the verifier is able to assess the result while only using random samples. Third, we show that verification is not always more efficient. Namely, we show a class of functions where verification requires as many samples as learning does, up to a logarithmic factor.

Cite as

Shafi Goldwasser, Guy N. Rothblum, Jonathan Shafer, and Amir Yehudayoff. Interactive Proofs for Verifying Machine Learning. In 12th Innovations in Theoretical Computer Science Conference (ITCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 185, pp. 41:1-41:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{goldwasser_et_al:LIPIcs.ITCS.2021.41,
  author =	{Goldwasser, Shafi and Rothblum, Guy N. and Shafer, Jonathan and Yehudayoff, Amir},
  title =	{{Interactive Proofs for Verifying Machine Learning}},
  booktitle =	{12th Innovations in Theoretical Computer Science Conference (ITCS 2021)},
  pages =	{41:1--41:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-177-1},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{185},
  editor =	{Lee, James R.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2021.41},
  URN =		{urn:nbn:de:0030-drops-135806},
  doi =		{10.4230/LIPIcs.ITCS.2021.41},
  annote =	{Keywords: PAC learning, Fourier analysis of boolean functions, Complexity gaps, Complexity lower bounds, Goldreich-Levin algorithm, Kushilevitz-Mansour algorithm, Distribution testing}
}

Refine by Author
1 Avron, Uri
1 Gershtein, Shay
1 Goldwasser, Shafi
1 Guy, Ido
1 Milo, Tova
Show More...

Refine by Classification
1 Theory of computation → Approximation algorithms analysis
1 Theory of computation → Data structures and algorithms for data management
1 Theory of computation → Interactive proof systems
1 Theory of computation → Machine learning theory
1 Theory of computation → Problems, reductions and completeness

Refine by Keyword
1 Complexity gaps
1 Complexity lower bounds
1 Distribution testing
1 Fourier analysis of boolean functions
1 Goldreich-Levin algorithm
Show More...

Refine by Type
2 document

Refine by Publication Year
1 2021
1 2022

2 Search Results for "Guy, Ido"

On the Hardness of Category Tree Construction

Abstract

Cite as

Interactive Proofs for Verifying Machine Learning

Abstract

Cite as

Thanks for your feedback!

Could not send message