Search Results

Documents authored by Yi, Ke


Document
Approximate Range Counting Under Differential Privacy

Authors: Ziyue Huang and Ke Yi

Published in: LIPIcs, Volume 189, 37th International Symposium on Computational Geometry (SoCG 2021)


Abstract
Range counting under differential privacy has been studied extensively. Unfortunately, lower bounds based on discrepancy theory suggest that large errors have to be introduced in order to preserve privacy: Essentially for any range space (except axis-parallel rectangles), the error has to be polynomial. In this paper, we show that by allowing a standard notion of geometric approximation where points near the boundary of the range may or may not be counted, the error can be reduced to logarithmic. Furthermore, our approximate range counting data structure can be used to solve the approximate nearest neighbor (ANN) problem and k-NN classification, leading to the first differentially private algorithms for these two problems with provable guarantees on the utility.

Cite as

Ziyue Huang and Ke Yi. Approximate Range Counting Under Differential Privacy. In 37th International Symposium on Computational Geometry (SoCG 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 189, pp. 45:1-45:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{huang_et_al:LIPIcs.SoCG.2021.45,
  author =	{Huang, Ziyue and Yi, Ke},
  title =	{{Approximate Range Counting Under Differential Privacy}},
  booktitle =	{37th International Symposium on Computational Geometry (SoCG 2021)},
  pages =	{45:1--45:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-184-9},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{189},
  editor =	{Buchin, Kevin and Colin de Verdi\`{e}re, \'{E}ric},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2021.45},
  URN =		{urn:nbn:de:0030-drops-138441},
  doi =		{10.4230/LIPIcs.SoCG.2021.45},
  annote =	{Keywords: Differential Privacy, Approximate Range Counting}
}
Document
Complete Volume
LIPIcs, Volume 186, ICDT 2021, Complete Volume

Authors: Ke Yi and Zhewei Wei

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
LIPIcs, Volume 186, ICDT 2021, Complete Volume

Cite as

Ke Yi and Zhewei Wei. LIPIcs, Volume 186, ICDT 2021, Complete Volume. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 1-438, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@Proceedings{yi_et_al:LIPIcs.ICDT.2021,
  title =	{{LIPIcs, Volume 186, ICDT 2021, Complete Volume}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{1--438},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021},
  URN =		{urn:nbn:de:0030-drops-137076},
  doi =		{10.4230/LIPIcs.ICDT.2021},
  annote =	{Keywords: LIPIcs, Volume 186, ICDT 2021, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Ke Yi and Zhewei Wei

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

Ke Yi and Zhewei Wei. Front Matter, Table of Contents, Preface, Conference Organization. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 0:i-0:xvi, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{yi_et_al:LIPIcs.ICDT.2021.0,
  author =	{Yi, Ke and Wei, Zhewei},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.0},
  URN =		{urn:nbn:de:0030-drops-137086},
  doi =		{10.4230/LIPIcs.ICDT.2021.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Invited Talk
Worst-Case Optimal Join Algorithms (Invited Talk)

Authors: Ke Yi

Published in: LIPIcs, Volume 181, 31st International Symposium on Algorithms and Computation (ISAAC 2020)


Abstract
Join is the most important operator in relational databases, and remains the most expensive one despite years of research and engineering efforts. Following the ground-breaking work of Atserias, Grohe, and Marx in 2008, worst-case optimal join algorithms have been discovered, which has led to not only a series of beautiful theoretical results, but also new database systems based on drastically different query evaluation techniques. In this talk, I will present an overview of this topic, including algorithms in various computation models (sequential and parallel), variants of the problem (full, Boolean, and counting), and approximate solutions.

Cite as

Ke Yi. Worst-Case Optimal Join Algorithms (Invited Talk). In 31st International Symposium on Algorithms and Computation (ISAAC 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 181, p. 2:1, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{yi:LIPIcs.ISAAC.2020.2,
  author =	{Yi, Ke},
  title =	{{Worst-Case Optimal Join Algorithms}},
  booktitle =	{31st International Symposium on Algorithms and Computation (ISAAC 2020)},
  pages =	{2:1--2:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-173-3},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{181},
  editor =	{Cao, Yixin and Cheng, Siu-Wing and Li, Minming},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2020.2},
  URN =		{urn:nbn:de:0030-drops-133462},
  doi =		{10.4230/LIPIcs.ISAAC.2020.2},
  annote =	{Keywords: query evaluation}
}
Document
Random Sampling and Size Estimation Over Cyclic Joins

Authors: Yu Chen and Ke Yi

Published in: LIPIcs, Volume 155, 23rd International Conference on Database Theory (ICDT 2020)


Abstract
Computing joins is expensive, and often unnecessary when the output size is large. In 1999, Chaudhuri et al. [Surajit Chaudhuri et al., 1999] posed the problem of random sampling over joins as a potentially effective approach to avoiding computing the join in full, while obtaining important statistical information about the join results. Unfortunately, no significant progress has been made in the last 20 years, except for the case of acyclic joins. In this paper, we present the first non-trivial result on sampling over cyclic joins. We show that after a linear-time preprocessing step, a join result can be drawn uniformly at random in expected time O(IN^ρ/OUT), where IN^ρ is known as the AGM bound of the join and OUT is its output size. This result holds for all joins on binary relations, as well as certain joins on relations of higher arity. We further show how this algorithm immediately leads to a join size estimation algorithm with the same running time.

Cite as

Yu Chen and Ke Yi. Random Sampling and Size Estimation Over Cyclic Joins. In 23rd International Conference on Database Theory (ICDT 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 155, pp. 7:1-7:18, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.ICDT.2020.7,
  author =	{Chen, Yu and Yi, Ke},
  title =	{{Random Sampling and Size Estimation Over Cyclic Joins}},
  booktitle =	{23rd International Conference on Database Theory (ICDT 2020)},
  pages =	{7:1--7:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-139-9},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{155},
  editor =	{Lutz, Carsten and Jung, Jean Christoph},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2020.7},
  URN =		{urn:nbn:de:0030-drops-119315},
  doi =		{10.4230/LIPIcs.ICDT.2020.7},
  annote =	{Keywords: Random sampling, joins, join size estimation}
}
Document
Research Directions for Principles of Data Management (Dagstuhl Perspectives Workshop 16151)

Authors: Serge Abiteboul, Marcelo Arenas, Pablo Barceló, Meghyn Bienvenu, Diego Calvanese, Claire David, Richard Hull, Eyke Hüllermeier, Benny Kimelfeld, Leonid Libkin, Wim Martens, Tova Milo, Filip Murlak, Frank Neven, Magdalena Ortiz, Thomas Schwentick, Julia Stoyanovich, Jianwen Su, Dan Suciu, Victor Vianu, and Ke Yi

Published in: Dagstuhl Manifestos, Volume 7, Issue 1 (2018)


Abstract
The area of Principles of Data Management (PDM) has made crucial contributions to the development of formal frameworks for understanding and managing data and knowledge. This work has involved a rich cross-fertilization between PDM and other disciplines in mathematics and computer science, including logic, complexity theory, and knowledge representation. We anticipate on-going expansion of PDM research as the technology and applications involving data management continue to grow and evolve. In particular, the lifecycle of Big Data Analytics raises a wealth of challenge areas that PDM can help with. In this report we identify some of the most important research directions where the PDM community has the potential to make significant contributions. This is done from three perspectives: potential practical relevance, results already obtained, and research questions that appear surmountable in the short and medium term.

Cite as

Serge Abiteboul, Marcelo Arenas, Pablo Barceló, Meghyn Bienvenu, Diego Calvanese, Claire David, Richard Hull, Eyke Hüllermeier, Benny Kimelfeld, Leonid Libkin, Wim Martens, Tova Milo, Filip Murlak, Frank Neven, Magdalena Ortiz, Thomas Schwentick, Julia Stoyanovich, Jianwen Su, Dan Suciu, Victor Vianu, and Ke Yi. Research Directions for Principles of Data Management (Dagstuhl Perspectives Workshop 16151). In Dagstuhl Manifestos, Volume 7, Issue 1, pp. 1-29, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@Article{abiteboul_et_al:DagMan.7.1.1,
  author =	{Abiteboul, Serge and Arenas, Marcelo and Barcel\'{o}, Pablo and Bienvenu, Meghyn and Calvanese, Diego and David, Claire and Hull, Richard and H\"{u}llermeier, Eyke and Kimelfeld, Benny and Libkin, Leonid and Martens, Wim and Milo, Tova and Murlak, Filip and Neven, Frank and Ortiz, Magdalena and Schwentick, Thomas and Stoyanovich, Julia and Su, Jianwen and Suciu, Dan and Vianu, Victor and Yi, Ke},
  title =	{{Research Directions for Principles of Data Management (Dagstuhl Perspectives Workshop 16151)}},
  pages =	{1--29},
  journal =	{Dagstuhl Manifestos},
  ISSN =	{2193-2433},
  year =	{2018},
  volume =	{7},
  number =	{1},
  editor =	{Abiteboul, Serge and Arenas, Marcelo and Barcel\'{o}, Pablo and Bienvenu, Meghyn and Calvanese, Diego and David, Claire and Hull, Richard and H\"{u}llermeier, Eyke and Kimelfeld, Benny and Libkin, Leonid and Martens, Wim and Milo, Tova and Murlak, Filip and Neven, Frank and Ortiz, Magdalena and Schwentick, Thomas and Stoyanovich, Julia and Su, Jianwen and Suciu, Dan and Vianu, Victor and Yi, Ke},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagMan.7.1.1},
  URN =		{urn:nbn:de:0030-drops-86772},
  doi =		{10.4230/DagMan.7.1.1},
  annote =	{Keywords: database theory, principles of data management, query languages, efficient query processing, query optimization, heterogeneous data, uncertainty, knowledge-enriched data management, machine learning, workflows, human-related data, ethics}
}
Document
Join Algorithms: From External Memory to the BSP

Authors: Ke Yi

Published in: LIPIcs, Volume 98, 21st International Conference on Database Theory (ICDT 2018)


Abstract
Database systems have been traditionally disk-based, which had motivated the extensive study on external memory (EM) algorithms. However, as RAMs continue to get larger and cheaper, modern distributed data systems are increasingly adopting a main memory based, shared-nothing architecture, exemplified by systems like Spark and Flink. These systems can be abstracted by the BSP model (with variants like the MPC model and the MapReduce model), and there has been a strong revived interest in designing BSP algorithms for handling large amounts of data. With hard disks starting to fade away from the picture, EM algorithms may now seem less relevant. However, we observe that many of the recently developed join algorithms under the BSP model have a high degree of resemblance with their counterparts in the EM model. In this talk, I will present some recent results on join algorithms in the EM and BSP model, examine their relationships, and discuss a general theoretical framework for converting EM algorithms to the BSP.

Cite as

Ke Yi. Join Algorithms: From External Memory to the BSP. In 21st International Conference on Database Theory (ICDT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 98, p. 2:1, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{yi:LIPIcs.ICDT.2018.2,
  author =	{Yi, Ke},
  title =	{{Join Algorithms: From External Memory to the BSP}},
  booktitle =	{21st International Conference on Database Theory (ICDT 2018)},
  pages =	{2:1--2:1},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-063-7},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{98},
  editor =	{Kimelfeld, Benny and Amsterdamer, Yael},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2018.2},
  URN =		{urn:nbn:de:0030-drops-86126},
  doi =		{10.4230/LIPIcs.ICDT.2018.2},
  annote =	{Keywords: External memory model, BSP, join algorithms}
}
Document
The Priority R-Tree: A Practically Efficient and Worst-Case-Optimal R-Tree

Authors: Lars Arge, Mark de Berg, Herman J. Haverkort, and Ke Yi

Published in: Dagstuhl Seminar Proceedings, Volume 4301, Cache-Oblivious and Cache-Aware Algorithms (2005)


Abstract
The query efficiency of a data structure that stores a set of objects, can normally be assessed by analysing the number of objects, pointers etc. looked at when answering a query. However, if the data structure is too big to fit in main memory, data may need to be fetched from disk. In that case, the query efficiency is easily dominated by moving the disk head to the correct locations, rather than by reading the data itself. To reduce the number of disk accesses, once can group the data into blocks, and strive to bound the number of different blocks accessed rather than the number of individual data objects read. An R-tree is a general-purpose data structur that stores a hierarchical grouping of geometric objects into blocks. Many heuristics have been designed to determine which objects should be grouped together, but none of these heuristics could give a guarantee on the resulting worst-case query time. We present the Priority R-tree, or PR-tree, which is the first R-tree variant that always answers a window query by accessing $O((N/B)^{1-1/d} + T/B)$ blocks, where $N$ is the number of $d$-dimensional objects stored, $B$ is the number of objects per block, and $T$ is the number of objects whose bounding boxes intersect the query window. This is provably asymptotically optimal. Experiments show that the PR-tree performs similar to the best known heuristics on real-life and relatively nicely distributed data, but outperforms them significantly on more extreme data.

Cite as

Lars Arge, Mark de Berg, Herman J. Haverkort, and Ke Yi. The Priority R-Tree: A Practically Efficient and Worst-Case-Optimal R-Tree. In Cache-Oblivious and Cache-Aware Algorithms. Dagstuhl Seminar Proceedings, Volume 4301, pp. 1-26, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2005)


Copy BibTex To Clipboard

@InProceedings{arge_et_al:DagSemProc.04301.3,
  author =	{Arge, Lars and de Berg, Mark and Haverkort, Herman J. and Yi, Ke},
  title =	{{The Priority R-Tree: A Practically Efficient and Worst-Case-Optimal R-Tree}},
  booktitle =	{Cache-Oblivious and Cache-Aware Algorithms},
  pages =	{1--26},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2005},
  volume =	{4301},
  editor =	{Lars Arge and Michael A. Bender and Erik Demaine and Charles Leiserson and Kurt Mehlhorn},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.04301.3},
  URN =		{urn:nbn:de:0030-drops-1554},
  doi =		{10.4230/DagSemProc.04301.3},
  annote =	{Keywords: R-Trees}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail