3 Search Results for "Katzman, Jonathan"


Document
DiVerG: Scalable Distance Index for Validation of Paired-End Alignments in Sequence Graphs

Authors: Ali Ghaffaari, Alexander Schönhuth, and Tobias Marschall

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)


Abstract
Determining the distance between two loci within a genomic region is a recurrent operation in various tasks in computational genomics. A notable example of this task arises in paired-end read mapping as a form of validation of distances between multiple alignments. While straightforward for a single genome, graph-based reference structures render the operation considerably more involved. Given the sheer number of such queries in a typical read mapping experiment, an efficient algorithm for answering distance queries is crucial. In this paper, we introduce DiVerG, a compact data structure as well as a fast and scalable algorithm, for constructing distance indexes for general sequence graphs on multi-core CPU and many-core GPU architectures. DiVerG is based on PairG [Jain et al., 2019], but overcomes the limitations of PairG by exploiting the extensive potential for improvements in terms of scalability and space efficiency. As a consequence, DiVerG can process substantially larger datasets, such as whole human genomes, which are unmanageable by PairG. DiVerG offers faster index construction time and consistently faster query time with gains proportional to the size of the underlying compact data structure. We demonstrate that our method performs favorably on multiple real datasets at various scales. DiVerG achieves superior performance over PairG; e.g. resulting to 2.5-4x speed-up in query time, 44-340x smaller index size, and 3-50x faster construction time for the genome graph of the MHC region, as a particularly variable region of the human genome. The implementation is available at: https://github.com/cartoonist/diverg

Cite as

Ali Ghaffaari, Alexander Schönhuth, and Tobias Marschall. DiVerG: Scalable Distance Index for Validation of Paired-End Alignments in Sequence Graphs. In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 10:1-10:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{ghaffaari_et_al:LIPIcs.WABI.2025.10,
  author =	{Ghaffaari, Ali and Sch\"{o}nhuth, Alexander and Marschall, Tobias},
  title =	{{DiVerG: Scalable Distance Index for Validation of Paired-End Alignments in Sequence Graphs}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{10:1--10:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.10},
  URN =		{urn:nbn:de:0030-drops-239369},
  doi =		{10.4230/LIPIcs.WABI.2025.10},
  annote =	{Keywords: Sequence graph, distance index, read mapping, sparse matrix}
}
Document
Sketching, Moment Estimation, and the Lévy-Khintchine Representation Theorem

Authors: Seth Pettie and Dingyu Wang

Published in: LIPIcs, Volume 325, 16th Innovations in Theoretical Computer Science Conference (ITCS 2025)


Abstract
In the d-dimensional turnstile streaming model, a frequency vector 𝐱 = (𝐱(1),…,𝐱(n)) ∈ (ℝ^d)ⁿ is updated entry-wisely over a stream. We consider the problem of f-moment estimation for which one wants to estimate f(𝐱)=∑_{v ∈ [n]}f(𝐱(v)) with a small-space sketch. A function f is tractable if the f-moment can be estimated to within a constant factor using polylog(n) space. The f-moment estimation problem has been intensively studied in the d = 1 case. Flajolet and Martin estimate the F₀-moment (f(x) = 1 (x > 0), incremental stream); Alon, Matias, and Szegedy estimate the L₂-moment (f(x) = x²); Indyk estimates the L_α-moment (f(x) = |x|^α), α ∈ (0,2]. For d ≥ 2, Ganguly, Bansal, and Dube estimate the L_{p,q} hybrid moment (f:ℝ^d → ℝ,f(x) = (∑_{j = 1}^d |x_j|^p)^q), p ∈ (0,2],q ∈ (0,1). For tractability, Bar-Yossef, Jayram, Kumar, and Sivakumar show that f(x) = |x|^α is not tractable for α > 2. Braverman, Chestnut, Woodruff, and Yang characterize the class of tractable one-variable functions except for a class of nearly periodic functions. In this work we present a simple and generic scheme to construct sketches with the novel idea of hashing indices to Lévy processes, from which one can estimate the f-moment f(𝐱) where f is the characteristic exponent of the Lévy process. The fundamental Lévy-Khintchine representation theorem completely characterizes the space of all possible characteristic exponents, which in turn characterizes the set of f-moments that can be estimated by this generic scheme. The new scheme has strong explanatory power. It unifies the construction of many existing sketches (F₀, L₀, L₂, L_α, L_{p,q}, etc.) and it implies the tractability of many nearly periodic functions that were previously unclassified. Furthermore, the scheme can be conveniently generalized to multidimensional cases (d ≥ 2) by considering multidimensional Lévy processes and can be further generalized to estimate heterogeneous moments by projecting different indices with different Lévy processes. We conjecture that the set of tractable functions can be characterized using the Lévy-Khintchine representation theorem via what we called the Fourier-Hahn-Lévy method.

Cite as

Seth Pettie and Dingyu Wang. Sketching, Moment Estimation, and the Lévy-Khintchine Representation Theorem. In 16th Innovations in Theoretical Computer Science Conference (ITCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 325, pp. 77:1-77:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{pettie_et_al:LIPIcs.ITCS.2025.77,
  author =	{Pettie, Seth and Wang, Dingyu},
  title =	{{Sketching, Moment Estimation, and the L\'{e}vy-Khintchine Representation Theorem}},
  booktitle =	{16th Innovations in Theoretical Computer Science Conference (ITCS 2025)},
  pages =	{77:1--77:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-361-4},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{325},
  editor =	{Meka, Raghu},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2025.77},
  URN =		{urn:nbn:de:0030-drops-227057},
  doi =		{10.4230/LIPIcs.ITCS.2025.77},
  annote =	{Keywords: Streaming Sketches, L\'{e}vy Processes}
}
Document
An Optimal Algorithm for Large Frequency Moments Using O(n^(1-2/k)) Bits

Authors: Vladimir Braverman, Jonathan Katzman, Charles Seidell, and Gregory Vorsanger

Published in: LIPIcs, Volume 28, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014)


Abstract
In this paper, we provide the first optimal algorithm for the remaining open question from the seminal paper of Alon, Matias, and Szegedy: approximating large frequency moments. We give an upper bound on the space required to find a k-th frequency moment of O(n^(1-2/k)) bits that matches, up to a constant factor, the lower bound of Woodruff et. al for constant epsilon and constant k. Our algorithm makes a single pass over the stream and works for any constant k > 3. It is based upon two major technical accomplishments: first, we provide an optimal algorithm for finding the heavy elements in a stream; and second, we provide a technique using Martingale Sketches which gives an optimal reduction of the large frequency moment problem to the all heavy elements problem. We also provide a polylogarithmic improvement for frequency moments, frequency based functions, spatial data streams, and measuring independence of data sets.

Cite as

Vladimir Braverman, Jonathan Katzman, Charles Seidell, and Gregory Vorsanger. An Optimal Algorithm for Large Frequency Moments Using O(n^(1-2/k)) Bits. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014). Leibniz International Proceedings in Informatics (LIPIcs), Volume 28, pp. 531-544, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)


Copy BibTex To Clipboard

@InProceedings{braverman_et_al:LIPIcs.APPROX-RANDOM.2014.531,
  author =	{Braverman, Vladimir and Katzman, Jonathan and Seidell, Charles and Vorsanger, Gregory},
  title =	{{An Optimal Algorithm for Large Frequency Moments Using O(n^(1-2/k)) Bits}},
  booktitle =	{Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2014)},
  pages =	{531--544},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-74-3},
  ISSN =	{1868-8969},
  year =	{2014},
  volume =	{28},
  editor =	{Jansen, Klaus and Rolim, Jos\'{e} and Devanur, Nikhil R. and Moore, Cristopher},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2014.531},
  URN =		{urn:nbn:de:0030-drops-47217},
  doi =		{10.4230/LIPIcs.APPROX-RANDOM.2014.531},
  annote =	{Keywords: Streaming Algorithms, Randomized Algorithms, Frequency Moments, Heavy Hitters}
}
  • Refine by Type
  • 3 Document/PDF
  • 2 Document/HTML

  • Refine by Publication Year
  • 2 2025
  • 1 2014

  • Refine by Author
  • 1 Braverman, Vladimir
  • 1 Ghaffaari, Ali
  • 1 Katzman, Jonathan
  • 1 Marschall, Tobias
  • 1 Pettie, Seth
  • Show More...

  • Refine by Series/Journal
  • 3 LIPIcs

  • Refine by Classification
  • 1 Applied computing → Computational genomics
  • 1 Mathematics of computing → Paths and connectivity problems
  • 1 Theory of computation → Sketching and sampling

  • Refine by Keyword
  • 1 Frequency Moments
  • 1 Heavy Hitters
  • 1 Lévy Processes
  • 1 Randomized Algorithms
  • 1 Sequence graph
  • Show More...

Any Issues?
X

Feedback on the Current Page

CAPTCHA

Thanks for your feedback!

Feedback submitted to Dagstuhl Publishing

Could not send message

Please try again later or send an E-mail