3 Search Results for "Samadian, Alireza"

Document

Invited Talk

DOI: 10.4230/LIPIcs.ICDT.2022.1

On an Information Theoretic Approach to Cardinality Estimation (Invited Talk)

Authors: Hung Q. Ngo

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)

Abstract

This article is a companion to an invited talk at ICDT'2022 with the same title. Cardinality estimation is among the most important problems in query optimization. It is well-documented that, when query plans go haywire, in most cases one can trace the root cause to the cardinality estimator being far off. In particular, traditional cardinality estimation based on selectivity estimation may sometimes under-estimate cardinalities by orders of magnitudes, because the independence or the uniformity assumptions do not typically hold. This talk outlines an approach to cardinality estimation that is "model-free" from a statistical stand-point. Being model-free means the approach tries to avoid making any distributional assumptions. Our approach is information-theoretic, and generalizes recent results on worst-case output size bounds of queries, allowing the estimator to take into account histogram information from the input relations. The estimator turns out to be the objective of a maximization problem subject to concave constraints, over an exponential number of variables. We then explain how the estimator can be computed in polynomial time for some fragment of these constraints. Overall, the talk introduces a new direction to address the classic problem of cardinality estimation that is designed to circumvent some of the pitfalls of selectivity-based estimation. We will also explain connections to other fundamental problems in information theory and database theory regarding information inequalities. The talk is based on (published and unpublished) joint works with Mahmoud Abo Khamis, Sungjin Im, Hossein Keshavarz, Phokion Kolaitis, Ben Moseley, XuanLong Nguyen, Kirk Pruhs, Dan Suciu, and Alireza Samadian Zakaria

Cite as

Hung Q. Ngo. On an Information Theoretic Approach to Cardinality Estimation (Invited Talk). In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 1:1-1:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{ngo:LIPIcs.ICDT.2022.1,
  author =	{Ngo, Hung Q.},
  title =	{{On an Information Theoretic Approach to Cardinality Estimation}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{1:1--1:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.1},
  URN =		{urn:nbn:de:0030-drops-158750},
  doi =		{10.4230/LIPIcs.ICDT.2022.1},
  annote =	{Keywords: Cardinality Estimation, Information Theory, Polymatroid Bound, Worst-case Optimal Join}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2021.6

An Approximation Algorithm for the Matrix Tree Multiplication Problem

Authors: Mahmoud Abo-Khamis, Ryan Curtin, Sungjin Im, Benjamin Moseley, Hung Ngo, Kirk Pruhs, and Alireza Samadian

Published in: LIPIcs, Volume 202, 46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021)

Abstract

We consider the Matrix Tree Multiplication problem. This problem is a generalization of the classic Matrix Chain Multiplication problem covered in the dynamic programming chapter of many introductory algorithms textbooks. An instance of the Matrix Tree Multiplication problem consists of a rooted tree with a matrix associated with each edge. The output is, for each leaf in the tree, the product of the matrices on the chain/path from the root to that leaf. Matrix multiplications that are shared between various chains need only be computed once, potentially being shared between different root to leaf chains. Algorithms are evaluated by the number of scalar multiplications performed. Our main result is a linear time algorithm for which the number of scalar multiplications performed is at most 15 times the optimal number of scalar multiplications.

Cite as

Mahmoud Abo-Khamis, Ryan Curtin, Sungjin Im, Benjamin Moseley, Hung Ngo, Kirk Pruhs, and Alireza Samadian. An Approximation Algorithm for the Matrix Tree Multiplication Problem. In 46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 202, pp. 6:1-6:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{abokhamis_et_al:LIPIcs.MFCS.2021.6,
  author =	{Abo-Khamis, Mahmoud and Curtin, Ryan and Im, Sungjin and Moseley, Benjamin and Ngo, Hung and Pruhs, Kirk and Samadian, Alireza},
  title =	{{An Approximation Algorithm for the Matrix Tree Multiplication Problem}},
  booktitle =	{46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021)},
  pages =	{6:1--6:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-201-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{202},
  editor =	{Bonchi, Filippo and Puglisi, Simon J.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2021.6},
  URN =		{urn:nbn:de:0030-drops-144464},
  doi =		{10.4230/LIPIcs.MFCS.2021.6},
  annote =	{Keywords: Matrix Multiplication, Approximation Algorithm}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2021.97

Relational Algorithms for k-Means Clustering

Authors: Benjamin Moseley, Kirk Pruhs, Alireza Samadian, and Yuyan Wang

Published in: LIPIcs, Volume 198, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)

Abstract

This paper gives a k-means approximation algorithm that is efficient in the relational algorithms model. This is an algorithm that operates directly on a relational database without performing a join to convert it to a matrix whose rows represent the data points. The running time is potentially exponentially smaller than N, the number of data points to be clustered that the relational database represents. Few relational algorithms are known and this paper offers techniques for designing relational algorithms as well as characterizing their limitations. We show that given two data points as cluster centers, if we cluster points according to their closest centers, it is NP-Hard to approximate the number of points in the clusters on a general relational input. This is trivial for conventional data inputs and this result exemplifies that standard algorithmic techniques may not be directly applied when designing an efficient relational algorithm. This paper then introduces a new method that leverages rejection sampling and the k-means++ algorithm to construct a O(1)-approximate k-means solution.

Cite as

Benjamin Moseley, Kirk Pruhs, Alireza Samadian, and Yuyan Wang. Relational Algorithms for k-Means Clustering. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 97:1-97:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{moseley_et_al:LIPIcs.ICALP.2021.97,
  author =	{Moseley, Benjamin and Pruhs, Kirk and Samadian, Alireza and Wang, Yuyan},
  title =	{{Relational Algorithms for k-Means Clustering}},
  booktitle =	{48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)},
  pages =	{97:1--97:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-195-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{198},
  editor =	{Bansal, Nikhil and Merelli, Emanuela and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2021.97},
  URN =		{urn:nbn:de:0030-drops-141668},
  doi =		{10.4230/LIPIcs.ICALP.2021.97},
  annote =	{Keywords: k-means, clustering, approximation, big-data, databases}
}

Refine by Author
2 Moseley, Benjamin
2 Pruhs, Kirk
2 Samadian, Alireza
1 Abo-Khamis, Mahmoud
1 Curtin, Ryan
Show More...

Refine by Classification
1 Information systems → Join algorithms
1 Information systems → Query optimization
1 Information systems → Query planning
1 Theory of computation → Approximation algorithms analysis
1 Theory of computation → Design and analysis of algorithms

Refine by Keyword
1 Approximation Algorithm
1 Cardinality Estimation
1 Information Theory
1 Matrix Multiplication
1 Polymatroid Bound
Show More...

Refine by Type
3 document

Refine by Publication Year
2 2021
1 2022

3 Search Results for "Samadian, Alireza"

On an Information Theoretic Approach to Cardinality Estimation (Invited Talk)

Abstract

Cite as

An Approximation Algorithm for the Matrix Tree Multiplication Problem

Abstract

Cite as

Relational Algorithms for k-Means Clustering

Abstract

Cite as

Thanks for your feedback!

Could not send message