3 Search Results for "Yang, Jun"

Document

Invited Talk

DOI: 10.4230/LIPIcs.ICDT.2024.2

How Database Theory Helps Teach Relational Queries in Database Education (Invited Talk)

Authors: Sudeepa Roy, Amir Gilad, Yihao Hu, Hanze Meng, Zhengjie Miao, Kristin Stephens-Martinez, and Jun Yang

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

Data analytics skills have become an indispensable part of any education that seeks to prepare its students for the modern workforce. Essential in this skill set is the ability to work with structured relational data. Relational queries are based on logic and may be declarative in nature, posing new challenges to novices and students. Manual teaching resources being limited and enrollment growing rapidly, automated tools that help students debug queries and explain errors are potential game-changers in database education. We present a suite of tools built on the foundations of database theory that has been used by over 1600 students in database classes at Duke University, showcasing a high-impact application of database theory in database education.

Cite as

Sudeepa Roy, Amir Gilad, Yihao Hu, Hanze Meng, Zhengjie Miao, Kristin Stephens-Martinez, and Jun Yang. How Database Theory Helps Teach Relational Queries in Database Education (Invited Talk). In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 2:1-2:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{roy_et_al:LIPIcs.ICDT.2024.2,
  author =	{Roy, Sudeepa and Gilad, Amir and Hu, Yihao and Meng, Hanze and Miao, Zhengjie and Stephens-Martinez, Kristin and Yang, Jun},
  title =	{{How Database Theory Helps Teach Relational Queries in Database Education}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{2:1--2:9},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.2},
  URN =		{urn:nbn:de:0030-drops-197841},
  doi =		{10.4230/LIPIcs.ICDT.2024.2},
  annote =	{Keywords: Query Debugging, SQL, Relational Algebra, Relational Calculus, Database Education, Boolean Provenance}
}

Document

DOI: 10.4230/LIPIcs.ICDT.2024.18

Computing Data Distribution from Query Selectivities

Authors: Pankaj K. Agarwal, Rahul Raychaudhury, Stavros Sintos, and Jun Yang

Published in: LIPIcs, Volume 290, 27th International Conference on Database Theory (ICDT 2024)

Abstract

We are given a set 𝒵 = {(R_1,s_1), …, (R_n,s_n)}, where each R_i is a range in ℝ^d, such as a rectangle or a ball, and s_i ∈ [0,1] denotes its selectivity. The goal is to compute a small-size discrete data distribution 𝒟 = {(q_1,w_1), …, (q_m,w_m)}, where q_j ∈ ℝ^d and w_j ∈ [0,1] for each 1 ≤ j ≤ m, and ∑_{j=1}^m w_j = 1, such that 𝒟 is the most consistent with 𝒵, i.e., err_p(𝒟,𝒵) = (1/n) ∑_{i=1}^n |s_i - ∑_{j=1}^m w_j⋅1(q_j ∈ R_i)|^p is minimized. In a database setting, 𝒵 corresponds to a workload of range queries over some table, together with their observed selectivities (i.e., the fraction of tuples returned), and 𝒟 can be used as a compact model for approximating the data distribution within the table without accessing the underlying contents. In this paper, we obtain both upper and lower bounds for this problem. In particular, we show that the problem of finding the best data distribution from selectivity queries is NP-complete. On the positive side, we describe a Monte Carlo algorithm that constructs, in time O((n+δ^{-d}) δ^{-2} polylog n), a discrete distribution 𝒟̃ of size O(δ^{-2}), such that err_p(𝒟̃,𝒵) ≤ min_𝒟 err_p(𝒟,𝒵)+δ (for p = 1,2,∞), where the minimum is taken over all discrete distributions. We also establish conditional lower bounds, which strongly indicate the infeasibility of relative approximations as well as of removing the exponential dependency on the dimension for additive approximations. This suggests that significant improvements to our algorithm are unlikely.
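
The error measure in the abstract can be illustrated with a minimal Python sketch. It only evaluates err_p for a given distribution and workload; it is not the paper's Monte Carlo algorithm. The names `box` and `err_p`, the representation of each range as a membership predicate, and the choice to treat p = ∞ as the maximum residual are assumptions made for illustration, since the abstract does not spell out the p = ∞ case.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]
Range = Callable[[Point], bool]  # hypothetical representation: membership test "q in R_i"


def box(lo: Point, hi: Point) -> Range:
    """Axis-aligned rectangle [lo, hi], returned as a membership test."""
    return lambda q: all(l <= x <= h for l, x, h in zip(lo, q, hi))


def err_p(
    dist: Sequence[Tuple[Point, float]],      # D = {(q_j, w_j)}, weights sum to 1
    workload: Sequence[Tuple[Range, float]],  # Z = {(R_i, s_i)}
    p: float = 1.0,
) -> float:
    """Evaluate err_p(D, Z) as written in the abstract."""
    # Residual for query i: |s_i - sum_j w_j * 1(q_j in R_i)|
    residuals = [
        abs(s - sum(w for q, w in dist if in_range(q)))
        for in_range, s in workload
    ]
    if p == float("inf"):
        # Assumed convention for p = infinity: worst-case residual.
        return max(residuals)
    return sum(r ** p for r in residuals) / len(residuals)


# Two weighted points in the unit square and two range queries.
D = [((0.25, 0.25), 0.5), ((0.75, 0.75), 0.5)]
Z = [
    (box((0.0, 0.0), (0.5, 0.5)), 0.5),  # matched exactly by D
    (box((0.0, 0.0), (1.0, 1.0)), 0.9),  # D predicts selectivity 1.0
]
print(err_p(D, Z, p=1), err_p(D, Z, p=2), err_p(D, Z, p=float("inf")))
```

Evaluating the objective costs O(nm) membership tests in this form; the hardness results in the abstract concern finding a minimizing 𝒟, not evaluating a given one.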

Cite as

Pankaj K. Agarwal, Rahul Raychaudhury, Stavros Sintos, and Jun Yang. Computing Data Distribution from Query Selectivities. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 18:1-18:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


@InProceedings{agarwal_et_al:LIPIcs.ICDT.2024.18,
  author =	{Agarwal, Pankaj K. and Raychaudhury, Rahul and Sintos, Stavros and Yang, Jun},
  title =	{{Computing Data Distribution from Query Selectivities}},
  booktitle =	{27th International Conference on Database Theory (ICDT 2024)},
  pages =	{18:1--18:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-312-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{290},
  editor =	{Cormode, Graham and Shekelyan, Michael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2024.18},
  URN =		{urn:nbn:de:0030-drops-198007},
  doi =		{10.4230/LIPIcs.ICDT.2024.18},
  annote =	{Keywords: selectivity queries, discrete distributions, Multiplicative Weights Update, eps-approximation, learnable functions, depth problem, arrangement}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2021.11

Dynamic Enumeration of Similarity Joins

Authors: Pankaj K. Agarwal, Xiao Hu, Stavros Sintos, and Jun Yang

Published in: LIPIcs, Volume 198, 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)

Abstract

This paper considers enumerating answers to similarity-join queries under dynamic updates: Given two sets of n points A,B in ℝ^d, a metric ϕ(⋅), and a distance threshold r > 0, report all pairs of points (a, b) ∈ A × B with ϕ(a,b) ≤ r. Our goal is to store A and B in a dynamic data structure that, whenever asked, can enumerate all result pairs with a worst-case delay guarantee, i.e., the time between enumerating two consecutive pairs is bounded. Furthermore, the data structure can be efficiently updated when a point is inserted into or deleted from A or B. We propose several efficient data structures for answering similarity-join queries in low dimensions. For exact enumeration of similarity join, we present near-linear-size data structures for the 𝓁₁ and 𝓁_∞ metrics with log^{O(1)} n update time and delay. We show that such a data structure is not feasible for the 𝓁₂ metric for d ≥ 4. For approximate enumeration of similarity join, where the distance threshold is a soft constraint, we obtain a unified linear-size data structure for the 𝓁_p metric, with log^{O(1)} n delay and update time. In high dimensions, we present an efficient data structure with a worst-case delay guarantee using locality sensitive hashing (LSH).
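
To make the query semantics concrete, here is a naive Python baseline for the similarity join defined in the abstract. It is a sketch under stated assumptions, not the paper's data structure: it scans all of A × B, supports no dynamic updates, and offers no delay guarantee (the gap between two reported pairs can be as large as the whole scan). The function names `l_p` and `similarity_join` are hypothetical.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple

Point = Tuple[float, ...]


def l_p(a: Point, b: Point, p: float) -> float:
    """l_p distance between two points; p = inf gives the max-coordinate metric."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def similarity_join(
    A: Sequence[Point], B: Sequence[Point], r: float, p: float = 2.0
) -> Iterator[Tuple[Point, Point]]:
    """Lazily report every pair (a, b) in A x B with l_p(a, b) <= r."""
    # Brute force: every candidate pair is tested, so the time between two
    # consecutive outputs is unbounded in terms of the output size.
    for a, b in product(A, B):
        if l_p(a, b, p) <= r:
            yield (a, b)


A = [(0.0, 0.0), (5.0, 5.0)]
B = [(0.0, 1.0), (5.0, 7.0)]
for pair in similarity_join(A, B, r=2.0, p=float("inf")):
    print(pair)
```

The generator form mirrors the enumeration setting, in which pairs are produced one at a time; the abstract's contribution is bounding the delay between them while A and B change.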

Cite as

Pankaj K. Agarwal, Xiao Hu, Stavros Sintos, and Jun Yang. Dynamic Enumeration of Similarity Joins. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 11:1-11:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


@InProceedings{agarwal_et_al:LIPIcs.ICALP.2021.11,
  author =	{Agarwal, Pankaj K. and Hu, Xiao and Sintos, Stavros and Yang, Jun},
  title =	{{Dynamic Enumeration of Similarity Joins}},
  booktitle =	{48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)},
  pages =	{11:1--11:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-195-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{198},
  editor =	{Bansal, Nikhil and Merelli, Emanuela and Worrell, James},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2021.11},
  URN =		{urn:nbn:de:0030-drops-140803},
  doi =		{10.4230/LIPIcs.ICALP.2021.11},
  annote =	{Keywords: dynamic enumeration, similarity joins, worst-case delay guarantee}
}
