3 Search Results for "Wang, Huayi"


Document
Efficiency of Learned Indexes on Genome Spectra

Authors: Md. Hasin Abrar, Paul Medvedev, and Giorgio Vinciguerra

Published in: LIPIcs, Volume 351, 33rd Annual European Symposium on Algorithms (ESA 2025)


Abstract
Data structures on a multiset of genomic k-mers are at the heart of many bioinformatic tools. As genomic datasets grow in scale, the efficiency of these data structures increasingly depends on how well they leverage the inherent patterns in the data. One recent and effective approach is the use of learned indexes that approximate the rank function of a multiset using a piecewise linear function with very few segments. However, theoretical worst-case analysis struggles to predict the practical performance of these indexes. We address this limitation by developing a novel measure of piecewise-linear approximability of the data, called CaPLa (Canonical Piecewise Linear approximability). CaPLa builds on the empirical observation that a power-law model often serves as a reasonable proxy for piecewise linear-approximability, while explicitly accounting for deviations from a true power-law fit. We prove basic properties of CaPLa and present an efficient algorithm to compute it. We then demonstrate that CaPLa can accurately predict space bounds for data structures on real data. Empirically, we analyze over 500 genomes through the lens of CaPLa, revealing that it varies widely across the tree of life and even within individual genomes. Finally, we study the robustness of CaPLa as a measure and the factors that make genomic k-mer multisets different from random ones.

Cite as

Md. Hasin Abrar, Paul Medvedev, and Giorgio Vinciguerra. Efficiency of Learned Indexes on Genome Spectra. In 33rd Annual European Symposium on Algorithms (ESA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 351, pp. 18:1-18:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{abrar_et_al:LIPIcs.ESA.2025.18,
  author =	{Abrar, Md. Hasin and Medvedev, Paul and Vinciguerra, Giorgio},
  title =	{{Efficiency of Learned Indexes on Genome Spectra}},
  booktitle =	{33rd Annual European Symposium on Algorithms (ESA 2025)},
  pages =	{18:1--18:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-395-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{351},
  editor =	{Benoit, Anne and Kaplan, Haim and Wild, Sebastian and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2025.18},
  URN =		{urn:nbn:de:0030-drops-244865},
  doi =		{10.4230/LIPIcs.ESA.2025.18},
  annote =	{Keywords: Genome spectra, piecewise linear approximation, learned index, k-mers}
}
Document
On Efficient Range-Summability of IID Random Variables in Two or Higher Dimensions

Authors: Jingfan Meng, Huayi Wang, Jun Xu, and Mitsunori Ogihara

Published in: LIPIcs, Volume 255, 26th International Conference on Database Theory (ICDT 2023)


Abstract
d-dimensional (for d > 1) efficient range-summability (dD-ERS) of random variables (RVs) is a fundamental algorithmic problem that has applications to two important families of database problems, namely, fast approximate wavelet tracking (FAWT) on data streams and approximately answering range-sum queries over a data cube. Whether there are efficient solutions to the dD-ERS problem, or to the latter database problem, have been two long-standing open problems. Both are solved in this work. Specifically, we propose a novel solution framework to dD-ERS on RVs that have Gaussian or Poisson distribution. Our dD-ERS solutions are the first ones that have polylogarithmic time complexities. Furthermore, we develop a novel k-wise independence theory that allows our dD-ERS solutions to have both high computational efficiencies and strong provable independence guarantees. Finally, we show that under a sufficient and likely necessary condition, certain existing solutions for 1D-ERS can be generalized to higher dimensions.

Cite as

Jingfan Meng, Huayi Wang, Jun Xu, and Mitsunori Ogihara. On Efficient Range-Summability of IID Random Variables in Two or Higher Dimensions. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 21:1-21:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{meng_et_al:LIPIcs.ICDT.2023.21,
  author =	{Meng, Jingfan and Wang, Huayi and Xu, Jun and Ogihara, Mitsunori},
  title =	{{On Efficient Range-Summability of IID Random Variables in Two or Higher Dimensions}},
  booktitle =	{26th International Conference on Database Theory (ICDT 2023)},
  pages =	{21:1--21:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-270-9},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{255},
  editor =	{Geerts, Floris and Vandevoort, Brecht},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2023.21},
  URN =		{urn:nbn:de:0030-drops-177624},
  doi =		{10.4230/LIPIcs.ICDT.2023.21},
  annote =	{Keywords: fast range-summation, multidimensional data streams, Haar wavelet transform}
}
Document
A Dyadic Simulation Approach to Efficient Range-Summability

Authors: Jingfan Meng, Huayi Wang, Jun Xu, and Mitsunori Ogihara

Published in: LIPIcs, Volume 220, 25th International Conference on Database Theory (ICDT 2022)


Abstract
Efficient range-summability (ERS) of a long list of random variables is a fundamental algorithmic problem that has applications to three important database applications, namely, data stream processing, space-efficient histogram maintenance (SEHM), and approximate nearest neighbor searches (ANNS). In this work, we propose a novel dyadic simulation framework and develop three novel ERS solutions, namely Gaussian-dyadic simulation tree (DST), Cauchy-DST and Random Walk-DST, using it. We also propose novel rejection sampling techniques to make these solutions computationally efficient. Furthermore, we develop a novel k-wise independence theory that allows our ERS solutions to have both high computational efficiencies and strong provable independence guarantees.

Cite as

Jingfan Meng, Huayi Wang, Jun Xu, and Mitsunori Ogihara. A Dyadic Simulation Approach to Efficient Range-Summability. In 25th International Conference on Database Theory (ICDT 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 220, pp. 17:1-17:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{meng_et_al:LIPIcs.ICDT.2022.17,
  author =	{Meng, Jingfan and Wang, Huayi and Xu, Jun and Ogihara, Mitsunori},
  title =	{{A Dyadic Simulation Approach to Efficient Range-Summability}},
  booktitle =	{25th International Conference on Database Theory (ICDT 2022)},
  pages =	{17:1--17:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-223-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{220},
  editor =	{Olteanu, Dan and Vortmeier, Nils},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICDT.2022.17},
  URN =		{urn:nbn:de:0030-drops-158915},
  doi =		{10.4230/LIPIcs.ICDT.2022.17},
  annote =	{Keywords: fast range-summation, locality-sensitive hashing, rejection sampling}
}
  • Refine by Type
  • 3 Document/PDF
  • 1 Document/HTML

  • Refine by Publication Year
  • 1 2025
  • 1 2023
  • 1 2022

  • Refine by Author
  • 2 Meng, Jingfan
  • 2 Ogihara, Mitsunori
  • 2 Wang, Huayi
  • 2 Xu, Jun
  • 1 Abrar, Md. Hasin
  • Show More...

  • Refine by Series/Journal
  • 3 LIPIcs

  • Refine by Classification
  • 2 Theory of computation → Streaming, sublinear and near linear time algorithms
  • 1 Applied computing → Bioinformatics
  • 1 Applied computing → Computational biology
  • 1 Mathematics of computing → Random number generation
  • 1 Theory of computation → Data structures design and analysis

  • Refine by Keyword
  • 2 fast range-summation
  • 1 Genome spectra
  • 1 Haar wavelet transform
  • 1 k-mers
  • 1 learned index
  • Show More...

Any Issues?
X

Feedback on the Current Page

CAPTCHA

Thanks for your feedback!

Feedback submitted to Dagstuhl Publishing

Could not send message

Please try again later or send an E-mail