Search Results

Documents authored by Lumbroso, Jérémie


Document
Bit-Array-Based Alternatives to HyperLogLog

Authors: Svante Janson, Jérémie Lumbroso, and Robert Sedgewick

Published in: LIPIcs, Volume 302, 35th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2024)


Abstract
We present a family of algorithms for the problem of estimating the number of distinct items in an input stream that are simple to implement and are appropriate for practical applications. Our algorithms are a logical extension of the series of algorithms developed by Flajolet and his coauthors starting in 1983 that culminated in the widely used HyperLogLog algorithm. These algorithms divide the input stream into M substreams and lead to a time-accuracy tradeoff where a constant number of bits per substream are saved to achieve a relative accuracy proportional to 1/√M. Our algorithms use just one or two bits per substream. Their effectiveness is demonstrated by a proof of approximate normality, with explicit expressions for standard errors that inform parameter settings and allow proper quantitative comparisons with other methods. Hypotheses about performance are validated through experiments using a realistic input stream, with the conclusion that our algorithms are more accurate than HyperLogLog when using the same amount of memory, and they use two-thirds as much memory as HyperLogLog to achieve a given accuracy.

Cite as

Svante Janson, Jérémie Lumbroso, and Robert Sedgewick. Bit-Array-Based Alternatives to HyperLogLog. In 35th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 302, pp. 5:1-5:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{janson_et_al:LIPIcs.AofA.2024.5,
  author =	{Janson, Svante and Lumbroso, J\'{e}r\'{e}mie and Sedgewick, Robert},
  title =	{{Bit-Array-Based Alternatives to HyperLogLog}},
  booktitle =	{35th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2024)},
  pages =	{5:1--5:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-329-4},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{302},
  editor =	{Mailler, C\'{e}cile and Wild, Sebastian},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.AofA.2024.5},
  URN =		{urn:nbn:de:0030-drops-204402},
  doi =		{10.4230/LIPIcs.AofA.2024.5},
  annote =	{Keywords: Cardinality estimation, sketching, Hyperloglog}
}
Document
Affirmative Sampling: Theory and Applications

Authors: Jérémie Lumbroso and Conrado Martínez

Published in: LIPIcs, Volume 225, 33rd International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2022)


Abstract
Affirmative Sampling is a practical and efficient novel algorithm to obtain random samples of distinct elements from a data stream. Its most salient feature is that the size S of the sample will, on expectation, grow with the (unknown) number n of distinct elements in the data stream. As any distinct element has the same probability to be sampled, and the sample size is greater when the "diversity" (the number of distinct elements) is greater, the samples that Affirmative Sampling delivers are more representative than those produced by any scheme where the sample size is fixed a priori - hence its name. Our algorithm is straightforward to implement, and several implementations already exist.

Cite as

Jérémie Lumbroso and Conrado Martínez. Affirmative Sampling: Theory and Applications. In 33rd International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 225, pp. 12:1-12:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{lumbroso_et_al:LIPIcs.AofA.2022.12,
  author =	{Lumbroso, J\'{e}r\'{e}mie and Mart{\'\i}nez, Conrado},
  title =	{{Affirmative Sampling: Theory and Applications}},
  booktitle =	{33rd International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2022)},
  pages =	{12:1--12:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-230-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{225},
  editor =	{Ward, Mark Daniel},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.AofA.2022.12},
  URN =		{urn:nbn:de:0030-drops-160987},
  doi =		{10.4230/LIPIcs.AofA.2022.12},
  annote =	{Keywords: Data streams, Distinct sampling, Random sampling, Cardinality estimation, Analysis of algorithms}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail