2 Search Results for "Christiani, Tobias"


Document
Approximate Similarity Search Under Edit Distance Using Locality-Sensitive Hashing

Authors: Samuel McCauley

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
Edit distance similarity search, also called approximate pattern matching, is a fundamental problem with widespread database applications. The goal of the problem is to preprocess n strings of length d, to quickly answer queries q of the form: if there is a database string within edit distance r of q, return a database string within edit distance cr of q. Previous approaches to this problem either rely on very large (superconstant) approximation ratios c, or very small search radii r. Outside of a narrow parameter range, these solutions are not competitive with trivially searching through all n strings. In this work we give a simple and easy-to-implement hash function that can quickly answer queries for a wide range of parameters. Specifically, our strategy can answer queries in time Õ(d · 3^r · n^{1/c}). The best known practical results require c ≫ r to achieve any correctness guarantee; meanwhile, the best known theoretical results are very involved and difficult to implement, and require query time that can be loosely bounded below by 24^r. Our results significantly broaden the range of parameters for which there exist nontrivial theoretical bounds, while retaining the practicality of a locality-sensitive hash function.
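
The trivial baseline the abstract compares against, searching through all n strings, can be sketched as follows. This is only an illustration of the query semantics and is not the hash function from the paper; the names edit_distance and linear_scan_query are hypothetical, and returning the first string within distance c*r is an assumed tie-breaking choice.

```python
from typing import Optional, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def linear_scan_query(database: Sequence[str], q: str,
                      r: int, c: float) -> Optional[str]:
    """Return some database string within edit distance c*r of q, or None.

    If a string within distance r exists, it also lies within c*r (for
    c >= 1), so the scan is guaranteed to return an answer in that case.
    """
    for s in database:
        if edit_distance(s, q) <= c * r:
            return s
    return None
```

With n strings of length d, each query costs on the order of n · d^2 character comparisons with this dynamic program, which is the cost a nontrivial index has to beat.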

Cite as

Samuel McCauley. Approximate Similarity Search Under Edit Distance Using Locality-Sensitive Hashing. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 21:1-21:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)



@InProceedings{mccauley:LIPIcs.ICDT.2021.21,
  author =	{McCauley, Samuel},
  title =	{{Approximate Similarity Search Under Edit Distance Using Locality-Sensitive Hashing}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{21:1--21:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.21},
  URN =		{urn:nbn:de:0030-drops-137299},
  doi =		{10.4230/LIPIcs.ICDT.2021.21},
  annote =	{Keywords: edit distance, approximate pattern matching, approximate nearest neighbor, similarity search, locality-sensitive hashing}
}
Document
PUFFINN: Parameterless and Universally Fast FInding of Nearest Neighbors

Authors: Martin Aumüller, Tobias Christiani, Rasmus Pagh, and Michael Vesterli

Published in: LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)


Abstract
We present PUFFINN, a parameterless LSH-based index for solving the k-nearest neighbor problem with probabilistic guarantees. By parameterless we mean that the user is only required to specify the amount of memory the index is supposed to use and the result quality that should be achieved. The index combines several heuristic ideas known in the literature. By small adaptations to the query algorithm, we make these heuristics rigorous. We perform experiments on real-world and synthetic inputs to evaluate implementation choices and show that the implementation satisfies the quality guarantees while being competitive with other state-of-the-art approaches to nearest neighbor search. We describe a novel synthetic data set that is difficult to solve for almost all existing nearest neighbor search approaches, and for which PUFFINN significantly outperforms previous methods.

Cite as

Martin Aumüller, Tobias Christiani, Rasmus Pagh, and Michael Vesterli. PUFFINN: Parameterless and Universally Fast FInding of Nearest Neighbors. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 10:1-10:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)



@InProceedings{aumuller_et_al:LIPIcs.ESA.2019.10,
  author =	{Aum\"{u}ller, Martin and Christiani, Tobias and Pagh, Rasmus and Vesterli, Michael},
  title =	{{PUFFINN: Parameterless and Universally Fast FInding of Nearest Neighbors}},
  booktitle =	{27th Annual European Symposium on Algorithms (ESA 2019)},
  pages =	{10:1--10:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-124-5},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{144},
  editor =	{Bender, Michael A. and Svensson, Ola and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.10},
  URN =		{urn:nbn:de:0030-drops-111317},
  doi =		{10.4230/LIPIcs.ESA.2019.10},
  annote =	{Keywords: Nearest Neighbor Search, Locality-Sensitive Hashing, Adaptive Similarity Search}
}