Search Results

Documents authored by Karayel, Emin


Document
Verification of the CVM Algorithm with a Functional Probabilistic Invariant

Authors: Emin Karayel, Seng Joe Watt, Derek Khu, Kuldeep S. Meel, and Yong Kiam Tan

Published in: LIPIcs, Volume 352, 16th International Conference on Interactive Theorem Proving (ITP 2025)


Abstract
Estimating the number of distinct elements in a data stream is a classic problem with numerous applications in computer science. We formalize a recent, remarkably simple, randomized algorithm for this problem due to Chakraborty, Vinodchandran, and Meel (called the CVM algorithm). Their algorithm deviated considerably from the state of the art, due to its avoidance of intricate derandomization techniques, while still maintaining a close-to-optimal logarithmic space complexity. Central to our formalization is a new proof technique based on functional probabilistic invariants, which allows us to derive concentration bounds using the Cramér-Chernoff method without relying on independence. This simplifies the formal analysis considerably compared to the original proof by Chakraborty et al. Moreover, our technique opens up the possible algorithm design space; we demonstrate this by introducing and verifying a new variant of the CVM algorithm that is both total and unbiased - neither of which is a property of the original algorithm. In this paper, we introduce the proof technique, describe its use in mechanizing both versions of the CVM algorithm in Isabelle/HOL, and present a supporting formalized library on negatively associated random variables used to verify the latter variant.

Cite as

Emin Karayel, Seng Joe Watt, Derek Khu, Kuldeep S. Meel, and Yong Kiam Tan. Verification of the CVM Algorithm with a Functional Probabilistic Invariant. In 16th International Conference on Interactive Theorem Proving (ITP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 352, pp. 34:1-34:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{karayel_et_al:LIPIcs.ITP.2025.34,
  author =	{Karayel, Emin and Watt, Seng Joe and Khu, Derek and Meel, Kuldeep S. and Tan, Yong Kiam},
  title =	{{Verification of the CVM Algorithm with a Functional Probabilistic Invariant}},
  booktitle =	{16th International Conference on Interactive Theorem Proving (ITP 2025)},
  pages =	{34:1--34:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-396-6},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{352},
  editor =	{Forster, Yannick and Keller, Chantal},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITP.2025.34},
  URN =		{urn:nbn:de:0030-drops-246327},
  doi =		{10.4230/LIPIcs.ITP.2025.34},
  annote =	{Keywords: Verification, Isabelle/HOL, Randomized Algorithms, Distinct Elements}
}
Document
RANDOM
An Embarrassingly Parallel Optimal-Space Cardinality Estimation Algorithm

Authors: Emin Karayel

Published in: LIPIcs, Volume 275, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023)


Abstract
In 2020 Błasiok (ACM Trans. Algorithms 16(2) 3:1-3:28) constructed an optimal space streaming algorithm for the cardinality estimation problem with the space complexity of O(ε^{-2} ln(δ^{-1}) + ln n) where ε, δ and n denote the relative accuracy, failure probability and universe size, respectively. However, his solution requires the stream to be processed sequentially. On the other hand, there are algorithms that admit a merge operation; they can be used in a distributed setting, allowing parallel processing of sections of the stream, and are highly relevant for large-scale distributed applications. The best-known such algorithm, unfortunately, has a space complexity exceeding Ω(ln(δ^{-1}) (ε^{-2} ln ln n + ln n)). This work presents a new algorithm that improves on the solution by Błasiok, preserving its space complexity, but with the benefit that it admits such a merge operation, thus providing an optimal solution for the problem for both sequential and parallel applications. Orthogonally, the new algorithm also improves algorithmically on Błasiok’s solution (even in the sequential setting) by reducing its implementation complexity and requiring fewer distinct pseudo-random objects.

Cite as

Emin Karayel. An Embarrassingly Parallel Optimal-Space Cardinality Estimation Algorithm. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 275, pp. 35:1-35:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{karayel:LIPIcs.APPROX/RANDOM.2023.35,
  author =	{Karayel, Emin},
  title =	{{An Embarrassingly Parallel Optimal-Space Cardinality Estimation Algorithm}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023)},
  pages =	{35:1--35:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-296-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{275},
  editor =	{Megow, Nicole and Smith, Adam},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2023.35},
  URN =		{urn:nbn:de:0030-drops-188607},
  doi =		{10.4230/LIPIcs.APPROX/RANDOM.2023.35},
  annote =	{Keywords: Distinct Elements, Distributed Algorithms, Randomized Algorithms, Expander Graphs, Derandomization, Sketching}
}
Document
Formalization of Randomized Approximation Algorithms for Frequency Moments

Authors: Emin Karayel

Published in: LIPIcs, Volume 237, 13th International Conference on Interactive Theorem Proving (ITP 2022)


Abstract
In 1999 Alon et al. introduced the still active research topic of approximating the frequency moments of a data stream using randomized algorithms with minimal space usage. This includes the problem of estimating the cardinality of the stream elements - the zeroth frequency moment. Higher-order frequency moments provide information about the skew of the data stream which is, for example, critical information for parallel processing. (The k-th frequency moment of a data stream is the sum of the k-th powers of the occurrence counts of each element in the stream.) They introduce both lower bounds and upper bounds on the space complexity of the problems, which were later improved by newer publications. The algorithms have guaranteed success probabilities and accuracies without making any assumptions on the input distribution. They are an interesting use case for formal verification because their correctness proofs require a large body of deep results from algebra, analysis and probability theory. This work reports on the formal verification of three algorithms for the approximation of F₀, F₂ and F_k for k ≥ 3. The results include the identification of significantly simpler algorithms with the same runtime and space complexities as the previously known ones as well as the development of several reusable components, such as a formalization of universal hash families, amplification methods for randomized algorithms, a model for one-pass data stream algorithms or a generic flexible encoding library for the verification of space complexities.

Cite as

Emin Karayel. Formalization of Randomized Approximation Algorithms for Frequency Moments. In 13th International Conference on Interactive Theorem Proving (ITP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 237, pp. 21:1-21:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{karayel:LIPIcs.ITP.2022.21,
  author =	{Karayel, Emin},
  title =	{{Formalization of Randomized Approximation Algorithms for Frequency Moments}},
  booktitle =	{13th International Conference on Interactive Theorem Proving (ITP 2022)},
  pages =	{21:1--21:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-252-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{237},
  editor =	{Andronick, June and de Moura, Leonardo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITP.2022.21},
  URN =		{urn:nbn:de:0030-drops-167308},
  doi =		{10.4230/LIPIcs.ITP.2022.21},
  annote =	{Keywords: Formal Verification, Isabelle/HOL, Randomized Algorithms, Frequency Moments}
}
Any Issues?
X

Feedback on the Current Page

CAPTCHA

Thanks for your feedback!

Feedback submitted to Dagstuhl Publishing

Could not send message

Please try again later or send an E-mail