6 Search Results for "Ngo, Hung Q."


Document
Invited Talk
On an Information Theoretic Approach to Cardinality Estimation (Invited Talk)

Authors: Hung Q. Ngo

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)


Abstract
This article is a companion to an invited talk at ICDT 2022 with the same title. Cardinality estimation is among the most important problems in query optimization. It is well-documented that, when query plans go haywire, in most cases one can trace the root cause to the cardinality estimator being far off. In particular, traditional cardinality estimation based on selectivity estimation may sometimes under-estimate cardinalities by orders of magnitude, because the independence and uniformity assumptions typically do not hold. This talk outlines an approach to cardinality estimation that is "model-free" from a statistical standpoint. Being model-free means the approach tries to avoid making any distributional assumptions. Our approach is information-theoretic and generalizes recent results on worst-case output size bounds of queries, allowing the estimator to take into account histogram information from the input relations. The estimator turns out to be the objective of a maximization problem subject to concave constraints, over an exponential number of variables. We then explain how the estimator can be computed in polynomial time for some fragments of these constraints. Overall, the talk introduces a new direction for addressing the classic problem of cardinality estimation, designed to circumvent some of the pitfalls of selectivity-based estimation. We will also explain connections to other fundamental problems in information theory and database theory regarding information inequalities. The talk is based on (published and unpublished) joint work with Mahmoud Abo Khamis, Sungjin Im, Hossein Keshavarz, Phokion Kolaitis, Ben Moseley, XuanLong Nguyen, Kirk Pruhs, Dan Suciu, and Alireza Samadian Zakaria.
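
Note: as a rough sketch of the formulation alluded to above (the notation here is illustrative and not taken verbatim from the talk), the worst-case output size bound and its histogram-aware refinement can be written as a maximization over entropy-like set functions:

% Hedged sketch: entropic formulation of the output-size bound for a join query
% Q(V) = R_1(V_1) JOIN ... JOIN R_m(V_m).
\[
  \log_2 |Q| \;\le\; \max_{h} \Bigl\{\, h(V) \;:\; h \text{ entropic (relaxed to polymatroid)},\;
  h(V_i) \le \log_2 |R_i| \text{ for } 1 \le i \le m \,\Bigr\},
\]
% Histogram/degree information enters as additional linear constraints such as
\[
  h(Y \mid X) \;\le\; \log_2 \deg(Y \mid X).
\]

The variables of this program are the values h(S) for every subset S of the query variables, which is where the exponential number of variables mentioned in the abstract comes from.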

Cite as

Hung Q. Ngo. On an Information Theoretic Approach to Cardinality Estimation (Invited Talk). In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 1:1-1:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{ngo:LIPIcs.ICDT.2022.1,
  author =	{Ngo, Hung Q.},
  title =	{{On an Information Theoretic Approach to Cardinality Estimation}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{1:1--1:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.1},
  URN =		{urn:nbn:de:0030-drops-158750},
  doi =		{10.4230/LIPIcs.ICDT.2022.1},
  annote =	{Keywords: Cardinality Estimation, Information Theory, Polymatroid Bound, Worst-case Optimal Join}
}
Document
Track B: Automata, Logic, Semantics, and Theory of Programming
Decision Problems in Information Theory

Authors: Mahmoud Abo Khamis, Phokion G. Kolaitis, Hung Q. Ngo, and Dan Suciu

Published in: LIPIcs, Volume 168, 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)


Abstract
Constraints on entropies are considered to be the laws of information theory. Even though the pursuit of their discovery has been a central theme of research in information theory, the algorithmic aspects of constraints on entropies remain largely unexplored. Here, we initiate an investigation of decision problems about constraints on entropies by placing several different such problems into levels of the arithmetical hierarchy. We establish the following results on checking the validity over all almost-entropic functions: first, validity of a Boolean information constraint arising from a monotone Boolean formula is co-recursively enumerable; second, validity of "tight" conditional information constraints is in Π⁰₃. Furthermore, under some restrictions, validity of conditional information constraints "with slack" is in Σ⁰₂, and validity of information inequality constraints involving max is Turing equivalent to validity of information inequality constraints (with no max involved). We also prove that the classical implication problem for conditional independence statements is co-recursively enumerable.
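
Note: to make the objects concrete, here is a small illustrative example (standard notation, not taken from the paper). A conditional independence statement is an information constraint of the form I(A;B | C) = 0, and a conditional information constraint is an implication between such equalities and a further information (in)equality, whose validity is asked over all (almost-)entropic functions. For instance,

% Example of a valid conditional information constraint (an instance of the
% semi-graphoid contraction axiom):
\[
  I(A;B \mid C) = 0 \;\wedge\; I(A;D \mid BC) = 0
  \;\Longrightarrow\;
  I(A;BD \mid C) = 0
\]
% holds for every entropic function by the chain rule
% I(A;BD | C) = I(A;B | C) + I(A;D | BC).

The decision problems studied in the paper ask how hard it is, in general, to decide the validity of such constraints over (almost-)entropic functions.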

Cite as

Mahmoud Abo Khamis, Phokion G. Kolaitis, Hung Q. Ngo, and Dan Suciu. Decision Problems in Information Theory. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 168, pp. 106:1-106:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


BibTeX

@InProceedings{abokhamis_et_al:LIPIcs.ICALP.2020.106,
  author =	{Abo Khamis, Mahmoud and Kolaitis, Phokion G. and Ngo, Hung Q. and Suciu, Dan},
  title =	{{Decision Problems in Information Theory}},
  booktitle =	{47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)},
  pages =	{106:1--106:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-138-2},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{168},
  editor =	{Czumaj, Artur and Dawar, Anuj and Merelli, Emanuela},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2020.106},
  URN =		{urn:nbn:de:0030-drops-125137},
  doi =		{10.4230/LIPIcs.ICALP.2020.106},
  annote =	{Keywords: Information theory, decision problems, arithmetical hierarchy, entropic functions}
}
Document
Invited Talk
Learning Models over Relational Databases (Invited Talk)

Authors: Dan Olteanu

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
In this talk, I will make the case for a first-principles approach to machine learning over relational databases that exploits recent developments in database systems and theory. The input to learning classification and regression models is defined by feature extraction queries over relational databases. The mainstream approach to learning over relational data is to materialize the training dataset, export it out of the database, and then learn over it using statistical software packages. These three steps are expensive and unnecessary. Instead, one can cast the machine learning problem as a database problem by decomposing the learning task into a batch of aggregates over the feature extraction query and by computing this batch over the input database. The performance of this database-centric approach benefits tremendously from structural properties of the relational data and of the feature extraction query; such properties may be algebraic (semi-ring), combinatorial (hypertree width), or statistical (sampling). It also benefits from database systems techniques such as factorized query evaluation and query compilation. For a variety of models, including factorization machines, decision trees, and support vector machines, this approach may come with lower computational complexity than the materialization of the training dataset used by the mainstream approach. Recent results show that this translates to speed-ups of several orders of magnitude over state-of-the-art systems such as TensorFlow, R, Scikit-learn, and mlpack. While these initial results are promising, there is much more waiting to be discovered.
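
Note: a minimal sketch of the aggregate-batch idea described above, with an invented two-table schema (table and column names are hypothetical). The sufficient statistics of least-squares regression are sums of products of features, so the database can compute them as one batch of aggregates over the feature extraction query instead of exporting the joined training set. A real in-database system would additionally push these aggregates past the join (factorized evaluation), which plain SQLite does not do here.

# Hedged sketch: linear regression over a join via one batch of SQL aggregates.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales(store INT, units REAL);
    CREATE TABLE stores(store INT, size REAL);
    INSERT INTO sales  VALUES (1, 10), (1, 12), (2, 30);
    INSERT INTO stores VALUES (1, 100), (2, 250);
""")

# Feature extraction query: predict units from size over sales NATURAL JOIN stores.
# One aggregate batch replaces materializing and exporting the join result.
n, s_x, s_xx, s_y, s_xy = con.execute("""
    SELECT COUNT(*), SUM(size), SUM(size * size), SUM(units), SUM(size * units)
    FROM sales NATURAL JOIN stores
""").fetchone()

# Solve the one-feature normal equations by hand: y = a + b*x.
b = (n * s_xy - s_x * s_y) / (n * s_xx - s_x * s_x)
a = (s_y - b * s_x) / n
print(f"intercept={a:.3f}, slope={b:.3f}")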

Cite as

Dan Olteanu. Learning Models over Relational Databases (Invited Talk). In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, p. 1:1, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{olteanu:LIPIcs.ICDT.2019.1,
  author =	{Olteanu, Dan},
  title =	{{Learning Models over Relational Databases}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{1:1--1:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.1},
  URN =		{urn:nbn:de:0030-drops-103034},
  doi =		{10.4230/LIPIcs.ICDT.2019.1},
  annote =	{Keywords: In-database analytics, Data complexity, Feature extraction queries, Database dependencies, Model reparameterization}
}
Document
Counting Triangles under Updates in Worst-Case Optimal Time

Authors: Ahmet Kara, Hung Q. Ngo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
We consider the problem of incrementally maintaining the triangle count query under single-tuple updates to the input relations. We introduce an approach that exhibits a space-time tradeoff such that the space-time product is quadratic in the size of the input database and the update time can be as low as the square root of this size. This lowest update time is worst-case optimal conditioned on the Online Matrix-Vector Multiplication conjecture. The classical and factorized incremental view maintenance approaches are recovered as special cases of our approach within the space-time tradeoff. In particular, they require linear-time maintenance under updates, which is suboptimal. Our approach can also count all triangles in a static database in the worst-case optimal time needed for enumerating them.
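
Note: for intuition, the following is a minimal sketch (with invented relation names) of the classical delta rule that the abstract refers to as the linear-time baseline: on a single-tuple insert, the change in the triangle count is the number of matching pairs in the other two relations. The paper's contribution is to beat this baseline, reaching roughly square-root update time by partitioning data values into heavy and light parts; that partitioning is not shown here.

# Hedged sketch: classical delta-based maintenance of the triangle count
# Q = |{(a,b,c) : R(a,b), S(b,c), T(c,a)}| under single-tuple inserts.
from collections import defaultdict

R, S, T = set(), set(), set()      # edge relations R(a,b), S(b,c), T(c,a)
S_by_b = defaultdict(set)          # index: b -> {c : S(b,c)}
T_into_a = defaultdict(set)        # index: a -> {c : T(c,a)}
count = 0                          # maintained triangle count

def insert_R(a, b):
    """Insert (a,b) into R; delta = |{c : S(b,c) and T(c,a)}|."""
    global count
    if (a, b) not in R:
        R.add((a, b))
        count += len(S_by_b[b] & T_into_a[a])

def insert_S(b, c):
    """Insert (b,c) into S; delta = |{a : R(a,b) and T(c,a)}| (linear scan of R)."""
    global count
    if (b, c) not in S:
        S.add((b, c))
        S_by_b[b].add(c)
        count += sum(1 for (a, bb) in R if bb == b and (c, a) in T)

def insert_T(c, a):
    """Insert (c,a) into T; delta = |{b : R(a,b) and S(b,c)}| (linear scan of R)."""
    global count
    if (c, a) not in T:
        T.add((c, a))
        T_into_a[a].add(c)
        count += sum(1 for (aa, b) in R if aa == a and c in S_by_b[b])

insert_R(1, 2); insert_S(2, 3); insert_T(3, 1)   # completes one triangle
print(count)                                      # -> 1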

Cite as

Ahmet Kara, Hung Q. Ngo, Milos Nikolic, Dan Olteanu, and Haozhe Zhang. Counting Triangles under Updates in Worst-Case Optimal Time. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 4:1-4:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{kara_et_al:LIPIcs.ICDT.2019.4,
  author =	{Kara, Ahmet and Ngo, Hung Q. and Nikolic, Milos and Olteanu, Dan and Zhang, Haozhe},
  title =	{{Counting Triangles under Updates in Worst-Case Optimal Time}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{4:1--4:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.4},
  URN =		{urn:nbn:de:0030-drops-103068},
  doi =		{10.4230/LIPIcs.ICDT.2019.4},
  annote =	{Keywords: incremental view maintenance, amortized analysis, data skew}
}
Document
Boolean Tensor Decomposition for Conjunctive Queries with Negation

Authors: Mahmoud Abo Khamis, Hung Q. Ngo, Dan Olteanu, and Dan Suciu

Published in: LIPIcs, Volume 127, 22nd International Conference on Database Theory (ICDT 2019)


Abstract
We propose an approach for answering conjunctive queries with negation, where the negated relations have bounded degree. Its data complexity matches that of the InsideOut and PANDA algorithms for the positive subquery of the input query and is expressed in terms of the fractional hypertree width and the submodular width, respectively. Its query complexity depends on the structure of the conjunction of negated relations; in general, it is exponential in the number of join variables occurring in negated relations, yet it becomes polynomial for several classes of queries. This approach relies on several contributions. We show how to rewrite queries with negation on bounded-degree relations into equivalent conjunctive queries with not-all-equal (NAE) predicates, which are multi-dimensional analogs of disequality (≠). We then generalize the known color-coding technique to conjunctions of NAE predicates and explain it via a Boolean tensor decomposition of such conjunctions. This decomposition can be achieved via a probabilistic construction that can be derandomized efficiently.
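
Note: as a small, self-contained illustration of the Boolean tensor decomposition idea (the simplest case only, not the paper's general construction): NAE on two variables is just disequality, and over a domain of size n the predicate x ≠ y decomposes into a disjunction of 2⌈log₂ n⌉ conjunctions of unary predicates, one per bit position and bit value.

# Hedged sketch: Boolean "rank" decomposition of disequality over a domain of size n.
from itertools import product
from math import ceil, log2

def diseq_decomposed(x: int, y: int, n: int) -> bool:
    """Evaluate x != y using only unary bit tests on x and on y separately."""
    bits = max(1, ceil(log2(n)))
    return any(
        ((x >> i) & 1) == b and ((y >> i) & 1) == 1 - b
        for i in range(bits)
        for b in (0, 1)
    )

# Sanity check over a small domain: the decomposition agrees with x != y.
n = 16
assert all(diseq_decomposed(x, y, n) == (x != y) for x, y in product(range(n), repeat=2))
print("Boolean rank of disequality over a domain of size", n, "is at most", 2 * ceil(log2(n)))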

Cite as

Mahmoud Abo Khamis, Hung Q. Ngo, Dan Olteanu, and Dan Suciu. Boolean Tensor Decomposition for Conjunctive Queries with Negation. In 22nd International Conference on Database Theory (ICDT 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 127, pp. 21:1-21:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{abokhamis_et_al:LIPIcs.ICDT.2019.21,
  author =	{Abo Khamis, Mahmoud and Ngo, Hung Q. and Olteanu, Dan and Suciu, Dan},
  title =	{{Boolean Tensor Decomposition for Conjunctive Queries with Negation}},
  booktitle =	{22nd International Conference on Database Theory (ICDT 2019)},
  pages =	{21:1--21:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-101-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{127},
  editor =	{Barcelo, Pablo and Calautti, Marco},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2019.21},
  URN =		{urn:nbn:de:0030-drops-103236},
  doi =		{10.4230/LIPIcs.ICDT.2019.21},
  annote =	{Keywords: color-coding, combined complexity, negation, query evaluation}
}
Document
Efficiently Decodable Compressed Sensing by List-Recoverable Codes and Recursion

Authors: Hung Q. Ngo, Ely Porat, and Atri Rudra

Published in: LIPIcs, Volume 14, 29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012)


Abstract
We present two recursive techniques to construct compressed sensing schemes that can be "decoded" in sub-linear time. The first technique is based on the well-studied code composition method called code concatenation, where the "outer" code has strong list-recoverability properties. This technique uses only one level of recursion and critically uses the power of list recovery. The second recursive technique is conceptually similar and has multiple recursion levels. The following compressed sensing results are obtained using these techniques:
  • Strongly explicit, efficiently decodable ℓ_1/ℓ_1 compressed sensing matrices: we present a strongly explicit ("for all") compressed sensing measurement matrix with O(d^2 log^2 n) measurements that can output near-optimal d-sparse approximations in time poly(d log n).
  • Near-optimal, efficiently decodable ℓ_1/ℓ_1 compressed sensing matrices for non-negative signals: we present two randomized constructions of ("for all") compressed sensing matrices with a near-optimal number of measurements, O(d log n log log_d n) and O_{m,s}(d^{1+1/s} log n (log^{(m)} n)^s) respectively, for any integer parameters s, m >= 1. Both constructions can output near-optimal d-sparse approximations for non-negative signals in time poly(d log n).
To the best of our knowledge, none of these results is dominated by existing results in the literature.
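
Note: for background, the code-concatenation step mentioned above is the standard construction below (textbook material, not specific to this paper's parameters).

% Standard code concatenation: an outer code over a large alphabet composed with
% an inner binary code, one inner encoding per outer symbol.
\[
  C_{\mathrm{out}} : \Sigma^k \to \Sigma^K,\quad \Sigma = \{0,1\}^m,\qquad
  C_{\mathrm{in}} : \{0,1\}^m \to \{0,1\}^n,
\]
\[
  (C_{\mathrm{out}} \circ C_{\mathrm{in}})(x)
  \;=\;
  \bigl( C_{\mathrm{in}}\bigl(C_{\mathrm{out}}(x)_1\bigr), \ldots, C_{\mathrm{in}}\bigl(C_{\mathrm{out}}(x)_K\bigr) \bigr)
  \;\in\; \{0,1\}^{nK}.
\]

List recoverability of the outer code means, roughly, that if each of the K positions is given a small list of candidate symbols, only a few outer codewords are consistent with all (or most) of those lists; this is the property that drives the sub-linear time recursive decoding.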

Cite as

Hung Q. Ngo, Ely Porat, and Atri Rudra. Efficiently Decodable Compressed Sensing by List-Recoverable Codes and Recursion. In 29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012). Leibniz International Proceedings in Informatics (LIPIcs), Volume 14, pp. 230-241, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2012)


BibTeX

@InProceedings{ngo_et_al:LIPIcs.STACS.2012.230,
  author =	{Ngo, Hung Q. and Porat, Ely and Rudra, Atri},
  title =	{{Efficiently Decodable Compressed Sensing by List-Recoverable Codes and Recursion}},
  booktitle =	{29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012)},
  pages =	{230--241},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-35-4},
  ISSN =	{1868-8969},
  year =	{2012},
  volume =	{14},
  editor =	{D\"{u}rr, Christoph and Wilke, Thomas},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2012.230},
  URN =		{urn:nbn:de:0030-drops-34011},
  doi =		{10.4230/LIPIcs.STACS.2012.230},
  annote =	{Keywords: Compressed Sensing, Sub-Linear Time Decoding, List-Recoverable Codes}
}
  • Refine by Author
  • 5 Ngo, Hung Q.
  • 3 Olteanu, Dan
  • 2 Abo Khamis, Mahmoud
  • 2 Suciu, Dan
  • 1 Kara, Ahmet
  • Show More...

  • Refine by Classification
  • 3 Theory of computation → Database query processing and optimization (theory)
  • 2 Information systems → Database query processing
  • 1 Computing methodologies → Machine learning approaches
  • 1 Computing methodologies → Supervised learning
  • 1 Information systems → Data streams
  • Show More...

  • Refine by Keyword
  • 1 Cardinality Estimation
  • 1 Compressed Sensing
  • 1 Data complexity
  • 1 Database dependencies
  • 1 Feature extraction queries
  • Show More...

  • Refine by Type
  • 6 document

  • Refine by Publication Year
  • 3 2019
  • 1 2012
  • 1 2020
  • 1 2022
