Search Results

Documents authored by McCauley, Samuel


Document
SPIDER: Improved Succinct Rank and Select Performance

Authors: Matthew D. Laws, Jocelyn Bliven, Kit Conklin, Elyes Laalai, Samuel McCauley, and Zach S. Sturdevant

Published in: LIPIcs, Volume 301, 22nd International Symposium on Experimental Algorithms (SEA 2024)


Abstract
Rank and select data structures seek to preprocess a bit vector to quickly answer two kinds of queries: Rank(i) gives the number of 1 bits in slots 0 through i, and Select(j) gives the first slot s with Rank(s) = j. A succinct data structure can answer these queries while using space much smaller than the size of the original bit vector. State of the art succinct rank and select data structures use as little as 4% extra space (over the underlying bit vector) while answering rank and select queries very quickly. Rank queries can be answered using only a handful of array accesses. Select queries can be answered by starting with similar array accesses, followed by a linear scan through the bit vector. Nonetheless, a tradeoff remains: data structures that use under 4% space are significantly slower at answering rank and select queries than less-space-efficient data structures (using, say, over 20% extra space). In this paper we make significantly progress towards closing this gap. We give a new data structure, SPIDER, which uses 3.82% extra space. SPIDER gives the best known rank query time for data sets of 8 billion or more bits, even compared to much less space-efficient data structures. For select queries, SPIDER outperforms all data structures that use less than 4% space, and significantly closes the gap in select performance between data structures with less than 4% space, and those that use more (over 20% for both rank and select) space. SPIDER makes two main technical contributions. For rank queries, it improves performance by interleaving the metadata with the bit vector to improve cache efficiency. For select queries, it uses predictions to almost eliminate the cost of the linear scan. These predictions are inspired by recent results on data structures with machine-learned predictions, adapted to the succinct data structure setting. Our results hold on both real and synthetic data, showing that these predictions are effective in practice.

Cite as

Matthew D. Laws, Jocelyn Bliven, Kit Conklin, Elyes Laalai, Samuel McCauley, and Zach S. Sturdevant. SPIDER: Improved Succinct Rank and Select Performance. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 21:1-21:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{laws_et_al:LIPIcs.SEA.2024.21,
  author =	{Laws, Matthew D. and Bliven, Jocelyn and Conklin, Kit and Laalai, Elyes and McCauley, Samuel and Sturdevant, Zach S.},
  title =	{{SPIDER: Improved Succinct Rank and Select Performance}},
  booktitle =	{22nd International Symposium on Experimental Algorithms (SEA 2024)},
  pages =	{21:1--21:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-325-6},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{301},
  editor =	{Liberti, Leo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2024.21},
  URN =		{urn:nbn:de:0030-drops-203865},
  doi =		{10.4230/LIPIcs.SEA.2024.21},
  annote =	{Keywords: Rank and Select, Succinct Data Structures, Data Structres, Cache Performance, Predictions}
}
Document
Telescoping Filter: A Practical Adaptive Filter

Authors: David J. Lee, Samuel McCauley, Shikha Singh, and Max Stein

Published in: LIPIcs, Volume 204, 29th Annual European Symposium on Algorithms (ESA 2021)


Abstract
Filters are small, fast, and approximate set membership data structures. They are often used to filter out expensive accesses to a remote set S for negative queries (that is, filtering out queries x ∉ S). Filters have one-sided errors: on a negative query, a filter may say "present" with a tunable false-positive probability of ε. Correctness is traded for space: filters only use log (1/ε) + O(1) bits per element. The false-positive guarantees of most filters, however, hold only for a single query. In particular, if x is a false positive, a subsequent query to x is a false positive with probability 1, not ε. With this in mind, recent work has introduced the notion of an adaptive filter. A filter is adaptive if each query is a false positive with probability ε, regardless of answers to previous queries. This requires "fixing" false positives as they occur. Adaptive filters not only provide strong false positive guarantees in adversarial environments but also improve query performance on practical workloads by eliminating repeated false positives. Existing work on adaptive filters falls into two categories. On the one hand, there are practical filters, based on the cuckoo filter, that attempt to fix false positives heuristically without meeting the adaptivity guarantee. On the other hand, the broom filter is a very complex adaptive filter that meets the optimal theoretical bounds. In this paper, we bridge this gap by designing the telescoping adaptive filter (TAF), a practical, provably adaptive filter. We provide theoretical false-positive and space guarantees for our filter, along with empirical results where we compare its performance against state-of-the-art filters. We also implement the broom filter and compare it to the TAF. Our experiments show that theoretical adaptivity can lead to improved false-positive performance on practical inputs, and can be achieved while maintaining throughput that is similar to non-adaptive filters.

Cite as

David J. Lee, Samuel McCauley, Shikha Singh, and Max Stein. Telescoping Filter: A Practical Adaptive Filter. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 60:1-60:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{lee_et_al:LIPIcs.ESA.2021.60,
  author =	{Lee, David J. and McCauley, Samuel and Singh, Shikha and Stein, Max},
  title =	{{Telescoping Filter: A Practical Adaptive Filter}},
  booktitle =	{29th Annual European Symposium on Algorithms (ESA 2021)},
  pages =	{60:1--60:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-204-4},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{204},
  editor =	{Mutzel, Petra and Pagh, Rasmus and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2021.60},
  URN =		{urn:nbn:de:0030-drops-146410},
  doi =		{10.4230/LIPIcs.ESA.2021.60},
  annote =	{Keywords: Filters, approximate-membership query data structures (AMQs), Bloom filters, quotient filters, cuckoo filters, adaptivity, succinct data structures}
}
Document
Approximate Similarity Search Under Edit Distance Using Locality-Sensitive Hashing

Authors: Samuel McCauley

Published in: LIPIcs, Volume 186, 24th International Conference on Database Theory (ICDT 2021)


Abstract
Edit distance similarity search, also called approximate pattern matching, is a fundamental problem with widespread database applications. The goal of the problem is to preprocess n strings of length d, to quickly answer queries q of the form: if there is a database string within edit distance r of q, return a database string within edit distance cr of q. Previous approaches to this problem either rely on very large (superconstant) approximation ratios c, or very small search radii r. Outside of a narrow parameter range, these solutions are not competitive with trivially searching through all n strings. In this work we give a simple and easy-to-implement hash function that can quickly answer queries for a wide range of parameters. Specifically, our strategy can answer queries in time Õ(d3^rn^{1/c}). The best known practical results require c ≫ r to achieve any correctness guarantee; meanwhile, the best known theoretical results are very involved and difficult to implement, and require query time that can be loosely bounded below by 24^r. Our results significantly broaden the range of parameters for which there exist nontrivial theoretical bounds, while retaining the practicality of a locality-sensitive hash function.

Cite as

Samuel McCauley. Approximate Similarity Search Under Edit Distance Using Locality-Sensitive Hashing. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 21:1-21:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


Copy BibTex To Clipboard

@InProceedings{mccauley:LIPIcs.ICDT.2021.21,
  author =	{McCauley, Samuel},
  title =	{{Approximate Similarity Search Under Edit Distance Using Locality-Sensitive Hashing}},
  booktitle =	{24th International Conference on Database Theory (ICDT 2021)},
  pages =	{21:1--21:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-179-5},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{186},
  editor =	{Yi, Ke and Wei, Zhewei},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2021.21},
  URN =		{urn:nbn:de:0030-drops-137299},
  doi =		{10.4230/LIPIcs.ICDT.2021.21},
  annote =	{Keywords: edit distance, approximate pattern matching, approximate nearest neighbor, similarity search, locality-sensitive hashing}
}
Document
Non-Cooperative Rational Interactive Proofs

Authors: Jing Chen, Samuel McCauley, and Shikha Singh

Published in: LIPIcs, Volume 144, 27th Annual European Symposium on Algorithms (ESA 2019)


Abstract
Interactive-proof games model the scenario where an honest party interacts with powerful but strategic provers, to elicit from them the correct answer to a computational question. Interactive proofs are increasingly used as a framework to design protocols for computation outsourcing. Existing interactive-proof games largely fall into two categories: either as games of cooperation such as multi-prover interactive proofs and cooperative rational proofs, where the provers work together as a team; or as games of conflict such as refereed games, where the provers directly compete with each other in a zero-sum game. Neither of these extremes truly capture the strategic nature of service providers in outsourcing applications. How to design and analyze non-cooperative interactive proofs is an important open problem. In this paper, we introduce a mechanism-design approach to define a multi-prover interactive-proof model in which the provers are rational and non-cooperative - they act to maximize their expected utility given others' strategies. We define a strong notion of backwards induction as our solution concept to analyze the resulting extensive-form game with imperfect information. We fully characterize the complexity of our proof system under different utility gap guarantees. (At a high level, a utility gap of u means that the protocol is robust against provers that may not care about a utility loss of 1/u.) We show, for example, that the power of non-cooperative rational interactive proofs with a polynomial utility gap is exactly equal to the complexity class P^{NEXP}.

Cite as

Jing Chen, Samuel McCauley, and Shikha Singh. Non-Cooperative Rational Interactive Proofs. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 29:1-29:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{chen_et_al:LIPIcs.ESA.2019.29,
  author =	{Chen, Jing and McCauley, Samuel and Singh, Shikha},
  title =	{{Non-Cooperative Rational Interactive Proofs}},
  booktitle =	{27th Annual European Symposium on Algorithms (ESA 2019)},
  pages =	{29:1--29:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-124-5},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{144},
  editor =	{Bender, Michael A. and Svensson, Ola and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2019.29},
  URN =		{urn:nbn:de:0030-drops-111508},
  doi =		{10.4230/LIPIcs.ESA.2019.29},
  annote =	{Keywords: non-cooperative game theory, extensive-form games with imperfect information, refined sequential equilibrium, rational proofs, interactive proofs}
}
Document
Resource Optimization for Program Committee Members: A Subreview Article

Authors: Michael A. Bender, Samuel McCauley, Bertrand Simon, Shikha Singh, and Frédéric Vivien

Published in: LIPIcs, Volume 49, 8th International Conference on Fun with Algorithms (FUN 2016)


Abstract
This paper formalizes a resource-allocation problem that is all too familiar to the seasoned program-committee member. For each submission j that the PC member has the honor of reviewing, there is a choice. The PC member can spend the time to review submission j in detail on his/her own at a cost of C_i. Alternatively, the PC member can spend the time to identify and contact peers, hoping to recruit them as subreviewers, at a cost of 1 per subreviewer. These potential subreviewers have a certain probability of rejecting each review request, and this probability increases as time goes on. Once the PC member runs out of time or unasked experts, he/she is forced to review the paper without outside assistance. This paper gives optimal solutions to several variations of the scheduling-reviewers problem. Most of the solutions from this paper are based on an iterated log function of C_i. In particular, with k rounds, the optimal solution sends the k-iterated log of C_i requests in the first round, the (k-1)-iterated log in the second round, and so forth. One of the contributions of this paper is solving this problem exactly, even when rejection probabilities may increase. Naturally, PC members must make an integral number of subreview requests. This paper gives, as an intermediate result, a linear-time algorithm to transform the artificial problem in which one can send fractional requests into the less-artificial problem in which one sends an integral number of requests. Finally, this paper considers the case where the PC member knows nothing about the probability that a potential subreviewer agrees to review the paper. This paper gives an approximation algorithm for this case, whose bounds improve as the number of rounds increases.

Cite as

Michael A. Bender, Samuel McCauley, Bertrand Simon, Shikha Singh, and Frédéric Vivien. Resource Optimization for Program Committee Members: A Subreview Article. In 8th International Conference on Fun with Algorithms (FUN 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 49, pp. 7:1-7:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{bender_et_al:LIPIcs.FUN.2016.7,
  author =	{Bender, Michael A. and McCauley, Samuel and Simon, Bertrand and Singh, Shikha and Vivien, Fr\'{e}d\'{e}ric},
  title =	{{Resource Optimization for Program Committee Members: A Subreview Article}},
  booktitle =	{8th International Conference on Fun with Algorithms (FUN 2016)},
  pages =	{7:1--7:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-005-7},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{49},
  editor =	{Demaine, Erik D. and Grandoni, Fabrizio},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.FUN.2016.7},
  URN =		{urn:nbn:de:0030-drops-58872},
  doi =		{10.4230/LIPIcs.FUN.2016.7},
  annote =	{Keywords: Scheduling, Delegation, Subreviews}
}