10 Search Results for "Hu, Wei"


Enumeration Algorithms for Conjunctive Queries with Projection

Authors: Shaleen Deep, Xiao Hu, and Paraschos Koutris

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
We investigate the enumeration of query results for an important subset of CQs with projections, namely star and path queries. The task is to design data structures and algorithms that allow for efficient enumeration with delay guarantees after a preprocessing phase. Our main contribution is a series of results based on the idea of interleaving precomputed output with further join processing to maintain delay guarantees, which may be of independent interest. In particular, we design combinatorial algorithms that provide instance-specific delay guarantees in linear preprocessing time. These algorithms improve upon the currently best known results. Further, we show how existing results can be improved upon by using fast matrix multiplication. We also present new results involving the tradeoff between preprocessing time and delay guarantees for enumeration of path queries that contain projections. CQs with projection where the join attribute is projected away are equivalent to Boolean matrix multiplication. Our results can therefore also be interpreted as sparse, output-sensitive matrix multiplication with delay guarantees.
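The equivalence between projected joins and Boolean matrix multiplication can be checked on a toy instance. The following minimal Python sketch assumes relations are finite sets of integer pairs; the names `project_join`, `boolean_matmul`, and `to_matrix` are hypothetical. It evaluates Q(x, z) :- R(x, y), S(y, z) naively and compares the result with the Boolean product of the relations' 0/1 matrices. It is not the paper's enumeration algorithm and offers no delay guarantees.

```python
from collections import defaultdict


def project_join(R, S):
    """Evaluate Q(x, z) :- R(x, y), S(y, z) with y projected away (set semantics)."""
    # Index S by its join attribute y.
    s_by_y = defaultdict(set)
    for y, z in S:
        s_by_y[y].add(z)
    # Building a set emits each (x, z) once, however many y values witness it.
    return {(x, z) for x, y in R for z in s_by_y[y]}


def boolean_matmul(A, B):
    """Boolean product: C[i][k] = OR over j of (A[i][j] AND B[j][k])."""
    n, m, p = len(A), len(B), len(B[0])
    return [[any(A[i][j] and B[j][k] for j in range(m)) for k in range(p)]
            for i in range(n)]


def to_matrix(pairs, rows, cols):
    """Encode a binary relation as a 0/1 adjacency matrix."""
    M = [[0] * cols for _ in range(rows)]
    for a, b in pairs:
        M[a][b] = 1
    return M


R = {(0, 0), (0, 1), (1, 1), (2, 2)}
S = {(0, 1), (1, 1), (1, 2), (2, 0)}
Q = project_join(R, S)
C = boolean_matmul(to_matrix(R, 3, 3), to_matrix(S, 3, 3))
assert Q == {(i, k) for i in range(3) for k in range(3) if C[i][k]}
print(sorted(Q))  # [(0, 1), (0, 2), (1, 1), (1, 2), (2, 0)]
```

Here (0, 1) is witnessed by both y = 0 and y = 1 but appears once in the output; this duplicate elimination is one reason projections make enumeration with bounded delay harder than for full joins.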

Cite as

Shaleen Deep, Xiao Hu, and Paraschos Koutris. Enumeration Algorithms for Conjunctive Queries with Projection. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 14:1-14:17, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)


@InProceedings{deep_et_al:LIPIcs.ICDT.2021.14,
  author =	{Deep, Shaleen and Hu, Xiao and Koutris, Paraschos},
  title =	{{Enumeration Algorithms for Conjunctive Queries with Projection}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{14:1--14:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.14},
  URN =		{urn:nbn:de:0030-drops-137229},
  doi =		{10.4230/LIPIcs.ICDT.2021.14},
  annote =	{Keywords: Query result enumeration, joins, ranking}
}
Optimality of Linear Sketching Under Modular Updates

Authors: Kaave Hosseini, Shachar Lovett, and Grigory Yaroslavtsev

Published in: LIPIcs, Volume 137, 34th Computational Complexity Conference (CCC 2019)


Abstract
We study the relation between streaming algorithms and linear sketching algorithms, in the context of binary updates. We show that for inputs in n dimensions, the existence of efficient streaming algorithms which can process Ω(n^2) updates implies efficient linear sketching algorithms with comparable cost. This improves upon the previous work of Li, Nguyen and Woodruff [Yi Li et al., 2014] and Ai, Hu, Li and Woodruff [Yuqing Ai et al., 2016] which required a triple-exponential number of updates to achieve a similar result for updates over integers. We extend our results to updates modulo p for integers p ≥ 2, and to approximation instead of exact computation.

Cite as

Kaave Hosseini, Shachar Lovett, and Grigory Yaroslavtsev. Optimality of Linear Sketching Under Modular Updates. In 34th Computational Complexity Conference (CCC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 137, pp. 13:1-13:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


@InProceedings{hosseini_et_al:LIPIcs.CCC.2019.13,
  author =	{Hosseini, Kaave and Lovett, Shachar and Yaroslavtsev, Grigory},
  title =	{{Optimality of Linear Sketching Under Modular Updates}},
  booktitle =	{34th Computational Complexity Conference (CCC 2019)},
  pages =	{13:1--13:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-116-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{137},
  editor =	{Shpilka, Amir},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2019.13},
  URN =		{urn:nbn:de:0030-drops-108355},
  doi =		{10.4230/LIPIcs.CCC.2019.13},
  annote =	{Keywords: communication complexity, linear sketching, streaming algorithm}
}
A Golden Age of Hardware Description Languages: Applying Programming Language Techniques to Improve Design Productivity

Authors: Lenny Truong and Pat Hanrahan

Published in: LIPIcs, Volume 136, 3rd Summit on Advances in Programming Languages (SNAPL 2019)


Abstract
Leading experts have declared that there is an impending golden age of computer architecture. During this age, the rate at which architects will be able to innovate will be directly tied to the design and implementation of the hardware description languages they use. Thus, the programming languages community stands on the critical path to this new golden age. This implies that we are also on the cusp of a golden age of hardware description languages. In this paper, we discuss the intellectual challenges facing researchers interested in hardware description language design, compilers, and formal methods. The major theme will be identifying opportunities to apply programming language techniques to address issues in hardware design productivity. Then, we present a vision for a multi-language system that provides a framework for developing solutions to these intellectual problems. This vision is based on a meta-programmed host language combined with a core embedded hardware description language that is used as the basis for the research and development of a sea of domain-specific languages. Central to the design of this system is the core language which is based on an abstraction that provides a general mechanism for the composition of hardware components described in any language.

Cite as

Lenny Truong and Pat Hanrahan. A Golden Age of Hardware Description Languages: Applying Programming Language Techniques to Improve Design Productivity. In 3rd Summit on Advances in Programming Languages (SNAPL 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 136, pp. 7:1-7:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


@InProceedings{truong_et_al:LIPIcs.SNAPL.2019.7,
  author =	{Truong, Lenny and Hanrahan, Pat},
  title =	{{A Golden Age of Hardware Description Languages: Applying Programming Language Techniques to Improve Design Productivity}},
  booktitle =	{3rd Summit on Advances in Programming Languages (SNAPL 2019)},
  pages =	{7:1--7:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-113-9},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{136},
  editor =	{Lerner, Benjamin S. and Bod{\'\i}k, Rastislav and Krishnamurthi, Shriram},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SNAPL.2019.7},
  URN =		{urn:nbn:de:0030-drops-105508},
  doi =		{10.4230/LIPIcs.SNAPL.2019.7},
  annote =	{Keywords: Hardware Description Languages}
}
Overparameterization: A Connection Between Software 1.0 and Software 2.0

Authors: Michael Carbin

Published in: LIPIcs, Volume 136, 3rd Summit on Advances in Programming Languages (SNAPL 2019)


Abstract
A new ecosystem of machine-learning-driven applications, titled Software 2.0, has arisen that integrates neural networks into a variety of computational tasks. Such applications include image recognition, natural language processing, and other traditional machine learning tasks. However, these techniques have also grown to include other structured domains, such as program analysis and program optimization, for which novel, domain-specific insights mate with model design. In this paper, we connect the world of Software 2.0 with that of traditional software (Software 1.0) through overparameterization: a program may provide more computational capacity and precision than is necessary for the task at hand. In Software 2.0, overparameterization (when a machine learning model has more parameters than datapoints in the dataset) arises as a contemporary understanding of the ability of modern, gradient-based learning methods to learn models over complex datasets with high accuracy. Specifically, the more parameters a model has, the better it learns. In Software 1.0, the results of the approximate computing community show that traditional software is also overparameterized in that software often simply computes results that are more precise than is required by the user. Approximate computing exploits this overparameterization to improve performance by eliminating unnecessary, excess computation. For example, one of many techniques is to reduce the precision of arithmetic in the application. In this paper, we argue that the gap between available precision and the precision required for either Software 1.0 or Software 2.0 is a fundamental aspect of software design that illustrates the balance between software designed for general purposes and domain-adapted solutions. A general-purpose solution is easier to develop and maintain than a domain-adapted solution. However, that ease comes at the expense of performance. 
We show that the approximate computing community and the machine learning community have developed overlapping techniques to improve performance by reducing overparameterization. We also show that because of these shared techniques, questions, concerns, and answers on how to construct software can translate from one software variant to the other.

Cite as

Michael Carbin. Overparameterization: A Connection Between Software 1.0 and Software 2.0. In 3rd Summit on Advances in Programming Languages (SNAPL 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 136, pp. 1:1-1:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


@InProceedings{carbin:LIPIcs.SNAPL.2019.1,
  author =	{Carbin, Michael},
  title =	{{Overparameterization: A Connection Between Software 1.0 and Software 2.0}},
  booktitle =	{3rd Summit on Advances in Programming Languages (SNAPL 2019)},
  pages =	{1:1--1:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-113-9},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{136},
  editor =	{Lerner, Benjamin S. and Bod{\'\i}k, Rastislav and Krishnamurthi, Shriram},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SNAPL.2019.1},
  URN =		{urn:nbn:de:0030-drops-105440},
  doi =		{10.4230/LIPIcs.SNAPL.2019.1},
  annote =	{Keywords: Approximate Computing, Machine Learning, Software 2.0}
}
Independent Range Sampling, Revisited Again

Authors: Peyman Afshani and Jeff M. Phillips

Published in: LIPIcs, Volume 129, 35th International Symposium on Computational Geometry (SoCG 2019)


Abstract
We revisit the range sampling problem: the input is a set of points where each point is associated with a real-valued weight. The goal is to store them in a structure such that given a query range and an integer k, we can extract k independent random samples from the points inside the query range, where the probability of sampling a point is proportional to its weight. This line of work was initiated in 2014 by Hu, Qiao, and Tao and it was later followed up by Afshani and Wei. The first line of work mostly studied the unweighted but dynamic version of the problem in one dimension, whereas the second result considered the static weighted problem in one dimension as well as the unweighted problem in 3D for halfspace queries. We offer three main results and some interesting insights that were missed by the previous work: We show that it is possible to build efficient data structures for range sampling queries if we allow the query time to hold in expectation (the first result), or obtain efficient worst-case query bounds by allowing the sampling probability to be approximately proportional to the weight (the second result). The third result is a conditional lower bound that shows that essentially one of the previous two concessions is needed. For instance, for 3D range sampling queries, the first two results give efficient data structures with near-linear space and polylogarithmic query time, whereas the lower bound shows that with near-linear space the worst-case query time must be close to n^{2/3}, ignoring polylogarithmic factors. To our knowledge, this is the first such major gap between the expected and worst-case query time of a range searching problem.
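For the one-dimensional weighted version of the problem, a simple baseline sorts the points, stores prefix sums of their weights, and answers each sample with a binary search. The Python sketch below does exactly that (the class and method names are hypothetical): linear space after sorting, and O(log n) time per sample, with each sample drawn from fresh randomness so samples are independent within and across queries. It is a textbook baseline, not one of the paper's data structures, which address higher dimensions and the gap between expected and worst-case query time.

```python
import bisect
import itertools
import random


class WeightedRangeSampler:
    """Static 1D weighted range sampling via prefix sums and binary search."""

    def __init__(self, points):
        # points: iterable of (coordinate, weight) pairs; weights assumed > 0.
        pts = sorted(points)
        self.xs = [x for x, _ in pts]
        # prefix[i] = total weight of the first i points in sorted order.
        self.prefix = [0.0] + list(itertools.accumulate(w for _, w in pts))

    def sample(self, lo, hi, k, rng=random):
        """Return k independent samples among points with lo <= x <= hi,
        each drawn with probability proportional to its weight."""
        i = bisect.bisect_left(self.xs, lo)   # first index with x >= lo
        j = bisect.bisect_right(self.xs, hi)  # one past the last index with x <= hi
        if i >= j:
            return []
        base = self.prefix[i]
        total = self.prefix[j] - base
        out = []
        for _ in range(k):
            # Draw u in [base, base + total) and locate the weight interval holding it.
            u = base + rng.random() * total
            idx = bisect.bisect_right(self.prefix, u, i, j + 1) - 1
            out.append(self.xs[min(idx, j - 1)])  # clamp guards float rounding at the end
        return out
```

For example, with points (1, 1.0), (5, 2.0), (9, 3.0), a query over [4, 10] returns only 5s and 9s, in a 2:3 ratio in expectation.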

Cite as

Peyman Afshani and Jeff M. Phillips. Independent Range Sampling, Revisited Again. In 35th International Symposium on Computational Geometry (SoCG 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 129, pp. 4:1-4:13, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)


@InProceedings{afshani_et_al:LIPIcs.SoCG.2019.4,
  author =	{Afshani, Peyman and Phillips, Jeff M.},
  title =	{{Independent Range Sampling, Revisited Again}},
  booktitle =	{35th International Symposium on Computational Geometry (SoCG 2019)},
  pages =	{4:1--4:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-104-7},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{129},
  editor =	{Barequet, Gill and Wang, Yusu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2019.4},
  URN =		{urn:nbn:de:0030-drops-104088},
  doi =		{10.4230/LIPIcs.SoCG.2019.4},
  annote =	{Keywords: Range Searching, Data Structures, Sampling}
}
Independent Range Sampling, Revisited

Authors: Peyman Afshani and Zhewei Wei

Published in: LIPIcs, Volume 87, 25th Annual European Symposium on Algorithms (ESA 2017)


Abstract
In the independent range sampling (IRS) problem, given an input set P of n points in ℝ^d, the task is to build a data structure such that, given a range R and an integer t ≥ 1, it returns t points that are uniformly and independently drawn from P ∩ R. The samples must satisfy inter-query independence, that is, the samples returned by every query must be independent of the samples returned by all the previous queries. This problem was first tackled by Hu, Qiao and Tao in 2014, who proposed optimal structures for the one-dimensional dynamic IRS problem in internal memory and the one-dimensional static IRS problem in external memory. In this paper, we study two natural extensions of the independent range sampling problem. In the first extension, we consider the static IRS problem in two and three dimensions in internal memory. We obtain data structures with optimal space-query tradeoffs for 3D halfspace, 3D dominance, and 2D three-sided queries. The second extension considers the weighted IRS problem. Each point is associated with a real-valued weight, and given a query range R, a sample is drawn independently such that each point in P ∩ R is selected with probability proportional to its weight. Walker's alias method is a classic solution to this problem when no query range is specified. We obtain an optimal data structure for the one-dimensional weighted range sampling problem, thereby extending the alias method to allow range queries.
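Walker's alias method, the classic solution when no query range is given, preprocesses n weights in O(n) time so that each weighted sample costs O(1). The following Python sketch uses Vose's well-known variant of the table construction; the function names are hypothetical, and the sketch does not support range queries, which is the extension the paper contributes.

```python
import random


def build_alias(weights):
    """Build alias tables for Walker's alias method (Vose's construction), O(n)."""
    n = len(weights)
    total = float(sum(weights))
    # Rescale so the average weight is 1; each of the n columns holds mass 1.
    scaled = [w * n / total for w in weights]
    prob, alias = [1.0] * n, list(range(n))
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        sm, lg = small.pop(), large.pop()
        # Column sm keeps index sm with probability scaled[sm], otherwise yields lg.
        prob[sm], alias[sm] = scaled[sm], lg
        # lg donated 1 - scaled[sm] of its mass to fill column sm.
        scaled[lg] -= 1.0 - scaled[sm]
        (small if scaled[lg] < 1.0 else large).append(lg)
    # Remaining columns are full (exactly 1 up to rounding), so prob stays 1.0.
    return prob, alias


def alias_sample(prob, alias, rng=random):
    """Draw one index in O(1): pick a column uniformly, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

Each column mixes at most two indices, which is what makes a single sample cost two random draws regardless of n.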

Cite as

Peyman Afshani and Zhewei Wei. Independent Range Sampling, Revisited. In 25th Annual European Symposium on Algorithms (ESA 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 87, pp. 3:1-3:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2017)


@InProceedings{afshani_et_al:LIPIcs.ESA.2017.3,
  author =	{Afshani, Peyman and Wei, Zhewei},
  title =	{{Independent Range Sampling, Revisited}},
  booktitle =	{25th Annual European Symposium on Algorithms (ESA 2017)},
  pages =	{3:1--3:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-049-1},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{87},
  editor =	{Pruhs, Kirk and Sohler, Christian},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2017.3},
  URN =		{urn:nbn:de:0030-drops-78592},
  doi =		{10.4230/LIPIcs.ESA.2017.3},
  annote =	{Keywords: data structures, range searching, range sampling, random sampling}
}
Modeling Power Consumption and Temperature in TLM Models

Authors: Matthieu Moy, Claude Helmstetter, Tayeb Bouhadiba, and Florence Maraninchi

Published in: LITES, Volume 3, Issue 1 (2016). Leibniz Transactions on Embedded Systems, Volume 3, Issue 1


Abstract
Many techniques and tools exist to estimate the power consumption and the temperature map of a chip. These tools help hardware designers develop power-efficient chips in the presence of temperature constraints. For this task, the application can be ignored or at least abstracted by some high-level scenarios; at this stage, the actual embedded software is generally not available yet. However, after the hardware is defined, the embedded software can still have a significant influence on the power consumption; i.e., two implementations of the same application can consume different amounts of power. Moreover, the actual software power manager ensuring the temperature constraints, usually by acting dynamically on the voltage and frequency, must itself be validated. Validating such a power management policy requires a model of both actuators and sensors, hence a closed-loop simulation of the functional model with a non-functional one. In this paper, we present and compare several tools to simulate the power and thermal behavior of a chip together with its functionality. We explore several levels of abstraction and study the impact on the precision of the analysis.

Cite as

Matthieu Moy, Claude Helmstetter, Tayeb Bouhadiba, and Florence Maraninchi. Modeling Power Consumption and Temperature in TLM Models. In LITES, Volume 3, Issue 1 (2016). Leibniz Transactions on Embedded Systems, Volume 3, Issue 1, pp. 03:1-03:29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


@Article{moy_et_al:LITES-v003-i001-a003,
  author =	{Moy, Matthieu and Helmstetter, Claude and Bouhadiba, Tayeb and Maraninchi, Florence},
  title =	{{Modeling Power Consumption and Temperature in TLM Models}},
  journal =	{Leibniz Transactions on Embedded Systems},
  pages =	{03:1--03:29},
  ISSN =	{2199-2002},
  year =	{2016},
  volume =	{3},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LITES-v003-i001-a003},
  doi =		{10.4230/LITES-v003-i001-a003},
  annote =	{Keywords: Power consumption, Temperature control, Virtual prototype, SystemC, Transactional modeling}
}
New Characterizations in Turnstile Streams with Applications

Authors: Yuqing Ai, Wei Hu, Yi Li, and David P. Woodruff

Published in: LIPIcs, Volume 50, 31st Conference on Computational Complexity (CCC 2016)


Abstract
Recently, [Li, Nguyen, Woodruff, STOC 2014] showed that any 1-pass constant-probability streaming algorithm for computing a relation f on a vector x ∈ {-m, -(m-1), ..., m}^n presented in the turnstile data stream model can be implemented by maintaining a linear sketch Ax mod q, where A is an r × n integer matrix and q = (q_1, ..., q_r) is a vector of positive integers. The space complexity of maintaining Ax mod q, not including the random bits used for sampling A and q, matches the space of the optimal algorithm. We give multiple strengthenings of this reduction, together with new applications. In particular, we show how to remove the following shortcomings of their reduction:

1. The Box Constraint. Their reduction applies only to algorithms that must be correct even if ‖x‖_∞ = max_{i ∈ [n]} |x_i| is allowed to be much larger than m at intermediate points in the stream, provided that x ∈ {-m, -(m-1), ..., m}^n at the end of the stream. We give a condition under which the optimal algorithm is a linear sketch even if it works only when promised that x ∈ {-m, -(m-1), ..., m}^n at all points in the stream. Using this, we show the first super-constant Ω(log m)-bit lower bound for the problem of maintaining a counter up to an additive εm error in a turnstile stream, where ε is any constant in (0, 1/2). Previous lower bounds are based on communication complexity and are only for relative error approximation; interestingly, we do not know how to prove our result using communication complexity. More generally, we show the first super-constant Ω(log m) lower bound for additive approximation of ℓ_p-norms; this bound is tight for p ∈ [1, 2].

2. Negative Coordinates. Their reduction allows x_i to be negative while processing the stream. We show an equivalence between 1-pass algorithms and linear sketches Ax mod q in dynamic graph streams, or more generally, the strict turnstile model, in which x_i is nonnegative for all i ∈ [n] at all points in the stream. Combined with [Assadi, Khanna, Li, Yaroslavtsev, SODA 2016], this resolves the 1-pass space complexity of approximating the maximum matching in a dynamic graph stream, answering a question in that work.

3. 1-Pass Restriction. Their reduction only applies to 1-pass data stream algorithms in the turnstile model, while there exist algorithms for heavy hitters and for low rank approximation which provably do better with multiple passes. We extend the reduction to algorithms which make any number of passes, showing the optimal algorithm is to choose a new linear sketch at the beginning of each pass, based on the output of previous passes.
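The reduction relies on the fact that a sketch of the form Ax mod q can be maintained under turnstile updates without storing x: an update x_i ← x_i + Δ changes Ax by Δ times column i of A. The Python sketch below illustrates only this linearity property. The random choice of A and q, and the helper names, are hypothetical and unrelated to how the reduction actually samples A and q.

```python
import random


def random_sketch(r, n, q_max, seed=0):
    """Hypothetical parameters: an r x n integer matrix A and moduli q_1..q_r >= 2."""
    rng = random.Random(seed)
    A = [[rng.randrange(-5, 6) for _ in range(n)] for _ in range(r)]
    q = [rng.randrange(2, q_max + 1) for _ in range(r)]
    return A, q


def apply_update(sketch, A, q, i, delta):
    """Apply the turnstile update x_i += delta to the sketch Ax mod q in place."""
    for row in range(len(A)):
        # Python's % returns a value in [0, q[row]) even when the sum is negative.
        sketch[row] = (sketch[row] + A[row][i] * delta) % q[row]


def sketch_of(A, q, x):
    """Compute Ax mod q directly from x; used here only to check the streaming result."""
    return [sum(a * xi for a, xi in zip(A[row], x)) % q[row] for row in range(len(A))]


A, q = random_sketch(r=3, n=5, q_max=17)
stream = [(0, 3), (2, -4), (0, -1), (4, 7), (2, 4)]  # (index, delta) updates
sketch, x = [0] * len(A), [0] * 5
for i, delta in stream:
    apply_update(sketch, A, q, i, delta)
    x[i] += delta  # x is tracked only for the comparison below
assert sketch == sketch_of(A, q, x)
```

Because only the r residues are stored, the space depends on r and the moduli q, not on n or on how large the intermediate coordinates of x become.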

Cite as

Yuqing Ai, Wei Hu, Yi Li, and David P. Woodruff. New Characterizations in Turnstile Streams with Applications. In 31st Conference on Computational Complexity (CCC 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 50, pp. 20:1-20:22, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2016)


@InProceedings{ai_et_al:LIPIcs.CCC.2016.20,
  author =	{Ai, Yuqing and Hu, Wei and Li, Yi and Woodruff, David P.},
  title =	{{New Characterizations in Turnstile Streams with Applications}},
  booktitle =	{31st Conference on Computational Complexity (CCC 2016)},
  pages =	{20:1--20:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-008-8},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{50},
  editor =	{Raz, Ran},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CCC.2016.20},
  URN =		{urn:nbn:de:0030-drops-58337},
  doi =		{10.4230/LIPIcs.CCC.2016.20},
  annote =	{Keywords: communication complexity, data streams, dynamic graph streams, norm estimation}
}
On The I/O Complexity of Dynamic Distinct Counting

Authors: Xiaocheng Hu, Yufei Tao, Yi Yang, Shengyu Zhang, and Shuigeng Zhou

Published in: LIPIcs, Volume 31, 18th International Conference on Database Theory (ICDT 2015)


Abstract
In dynamic distinct counting, we want to maintain a multi-set S of integers under insertions to answer efficiently the query: how many distinct elements are there in S? In external memory, the problem admits two standard solutions. The first one maintains S in a hash structure, so that the distinct count can be incrementally updated after each insertion using O(1) expected I/Os. A query is answered for free. The second one stores S in a linked list, and thus supports an insertion in O(1/B) amortized I/Os. A query can be answered in O((N/B) log_{M/B}(N/B)) I/Os by sorting, where N = |S|, B is the block size, and M is the memory size. In this paper, we show that the above two naive solutions are already optimal within a polylog factor. Specifically, for any Las Vegas structure using N^{O(1)} blocks, if its expected amortized insertion cost is o(1/log B), then it must incur Ω(N/(B log B)) expected I/Os answering a query in the worst case, under the (realistic) condition that N is a polynomial of B. This means that the problem is repugnant to update buffering: the query cost jumps dramatically from 0 to almost linear as soon as the insertion cost drops slightly below Ω(1).
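The two standard solutions described above can be mirrored in memory as follows. This Python sketch is only an analogy: it does not model blocks or I/Os, and the class names are hypothetical. The comments indicate which external-memory cost each operation corresponds to in the abstract's analysis.

```python
class HashDistinctCounter:
    """Solution 1: hash structure; each insertion updates the count, queries are free."""

    def __init__(self):
        self._seen = set()
        self.count = 0

    def insert(self, v):
        # External memory analogue: O(1) expected I/Os per insertion.
        if v not in self._seen:
            self._seen.add(v)
            self.count += 1

    def query(self):
        return self.count  # answered without touching the data


class LogDistinctCounter:
    """Solution 2: append-only list; cheap insertions, queries sort and scan."""

    def __init__(self):
        self._log = []

    def insert(self, v):
        # External memory analogue: B appends fill one block, so O(1/B) amortized I/Os.
        self._log.append(v)

    def query(self):
        # External memory analogue: sorting costs O((N/B) log_{M/B}(N/B)) I/Os.
        s = sorted(self._log)
        return sum(1 for k, v in enumerate(s) if k == 0 or v != s[k - 1])
```

The lower bound in the paper says, roughly, that no structure can sit meaningfully between these two extremes: making insertions noticeably cheaper than the hash approach forces queries to cost nearly as much as the sorting approach.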

Cite as

Xiaocheng Hu, Yufei Tao, Yi Yang, Shengyu Zhang, and Shuigeng Zhou. On The I/O Complexity of Dynamic Distinct Counting. In 18th International Conference on Database Theory (ICDT 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 31, pp. 265-276, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)


@InProceedings{hu_et_al:LIPIcs.ICDT.2015.265,
  author =	{Hu, Xiaocheng and Tao, Yufei and Yang, Yi and Zhang, Shengyu and Zhou, Shuigeng},
  title =	{{On The I/O Complexity of Dynamic Distinct Counting}},
  booktitle =	{18th International Conference on Database Theory (ICDT 2015)},
  pages =	{265--276},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-79-8},
  ISSN =	{1868-8969},
  year =	{2015},
  volume =	{31},
  editor =	{Arenas, Marcelo and Ugarte, Mart{\'\i}n},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2015.265},
  URN =		{urn:nbn:de:0030-drops-49895},
  doi =		{10.4230/LIPIcs.ICDT.2015.265},
  annote =	{Keywords: distinct counting, lower bound, external memory}
}
Invited Talk
Large-Scale Similarity Joins With Guarantees (Invited Talk)

Authors: Rasmus Pagh

Published in: LIPIcs, Volume 31, 18th International Conference on Database Theory (ICDT 2015)


Abstract
The ability to handle noisy or imprecise data is becoming increasingly important in computing. In the database community the notion of similarity join has been studied extensively, yet existing solutions have offered weak performance guarantees. Either they are based on deterministic filtering techniques that often, but not always, succeed in reducing computational costs, or they are based on randomized techniques that have improved guarantees on computational cost but come with a probability of not returning the correct result. The aim of this paper is to give an overview of randomized techniques for high-dimensional similarity search, and discuss recent advances towards making these techniques more widely applicable by eliminating the probability of error and improving the locality of data access.
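As a concrete instance of the randomized techniques mentioned above, the following Python sketch performs a Jaccard similarity join with MinHash-based locality-sensitive hashing: sets are bucketed by bands of their MinHash signatures, and only colliding pairs are verified exactly. The parameters (`bands`, `rows`, the hash family) and function names are illustrative assumptions. Exact verification removes false positives, but a similar pair that never collides in any band is silently missed; this sketch shows that failure mode rather than the error-free techniques the talk surveys.

```python
import random
from collections import defaultdict
from itertools import combinations

PRIME = 2_147_483_647  # the Mersenne prime 2^31 - 1, used for universal hashing


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def minhash_signature(s, hash_params):
    """One MinHash value per (a, b): the minimum of (a*x + b) mod PRIME over x in s."""
    return [min((a * x + b) % PRIME for x in s) for a, b in hash_params]


def lsh_similarity_join(sets, threshold, bands=8, rows=4, seed=0):
    """Return index pairs (i, j), i < j, with Jaccard >= threshold found via LSH.

    Candidates come from MinHash band collisions and are then verified exactly,
    so every reported pair is correct, but some qualifying pairs may be missed."""
    rng = random.Random(seed)
    hash_params = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
                   for _ in range(bands * rows)]
    sigs = [minhash_signature(s, hash_params) for s in sets]
    candidates = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for idx, sig in enumerate(sigs):
            buckets[tuple(sig[band * rows:(band + 1) * rows])].append(idx)
        for members in buckets.values():
            candidates.update(combinations(members, 2))  # members are in index order
    return sorted((i, j) for i, j in candidates
                  if jaccard(sets[i], sets[j]) >= threshold)
```

Raising `rows` makes each band more selective (fewer candidates to verify) while raising `bands` gives a similar pair more chances to collide; the two together trade verification cost against the chance of a missed pair.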

Cite as

Rasmus Pagh. Large-Scale Similarity Joins With Guarantees (Invited Talk). In 18th International Conference on Database Theory (ICDT 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 31, pp. 15-24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)


@InProceedings{pagh:LIPIcs.ICDT.2015.15,
  author =	{Pagh, Rasmus},
  title =	{{Large-Scale Similarity Joins With Guarantees}},
  booktitle =	{18th International Conference on Database Theory (ICDT 2015)},
  pages =	{15--24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-79-8},
  ISSN =	{1868-8969},
  year =	{2015},
  volume =	{31},
  editor =	{Arenas, Marcelo and Ugarte, Mart{\'\i}n},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2015.15},
  URN =		{urn:nbn:de:0030-drops-49995},
  doi =		{10.4230/LIPIcs.ICDT.2015.15},
  annote =	{Keywords: Similarity join, filtering, locality-sensitive hashing, recall}
}