Search Results

Documents authored by Abrar, Md. Hasin


Artifact
Software
pla-index

Authors: Md. Hasin Abrar and Paul Medvedev


Abstract

Cite as

Md. Hasin Abrar, Paul Medvedev. pla-index (Software). Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@misc{dagstuhl-artifact-22506,
   title = {{pla-index}}, 
   author = {Abrar, Md. Hasin and Medvedev, Paul},
   note = {Software, swhId: \href{https://archive.softwareheritage.org/swh:1:dir:a5ea07d009da014aff392e5896ba14b7376eba13;origin=https://github.com/medvedevgroup/pla-index;visit=swh:1:snp:7244e0b6165e37aa5e4a617ddeff2bfac291bdd6;anchor=swh:1:rev:3702e31ecccb31ef2081066a59541b4cc33b9f74}{\texttt{swh:1:dir:a5ea07d009da014aff392e5896ba14b7376eba13}} (visited on 2024-11-28)},
   url = {https://github.com/medvedevgroup/pla-index},
   doi = {10.4230/artifacts.22506},
}
Document
PLA-index: A k-mer Index Exploiting Rank Curve Linearity

Authors: Md. Hasin Abrar and Paul Medvedev

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)


Abstract
Given a sorted list of k-mers S, the rank curve of S is the function mapping a k-mer from the k-mer universe to the location in S where it either first appears or would be inserted. An exciting recent development is the observation that, for certain datasets, the rank curve is predictable and can be exploited to create small search indices. In this paper, we develop a novel search index that first estimates a k-mer’s rank using a piece-wise linear approximation of the rank curve and then does a local search to determine the precise location of the k-mer in the list. We combine ideas from previous approaches and supplement them with an innovative data representation strategy that substantially reduces space usage. Our PLA-index uses an order of magnitude less space than Sapling and uses less than half the space of the PGM-index, for roughly the same query time. For example, using only 9 MiB of memory, it can narrow down the position of k-mer in the suffix array of the human genome to within 255 positions. Furthermore, we demonstrate the potential of our approach to impact a variety of downstream applications. First, the PLA-index halves the time of binary search on the suffix array of the human genome. Second, the PLA-index reduces the space of a direct-access lookup table by 76 percent, without increasing the run time. Third, we plug the PLA-index into a state-of-the-art read aligner Strobealign and replace a 2 GiB component with a PLA-index of size 1.5 MiB, without significantly effecting runtime. The software and reproducibility information is freely available at https://github.com/medvedevgroup/pla-index.

Cite as

Md. Hasin Abrar and Paul Medvedev. PLA-index: A k-mer Index Exploiting Rank Curve Linearity. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 13:1-13:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{abrar_et_al:LIPIcs.WABI.2024.13,
  author =	{Abrar, Md. Hasin and Medvedev, Paul},
  title =	{{PLA-index: A k-mer Index Exploiting Rank Curve Linearity}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{13:1--13:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.13},
  URN =		{urn:nbn:de:0030-drops-206578},
  doi =		{10.4230/LIPIcs.WABI.2024.13},
  annote =	{Keywords: K-mer index, Piece-wise linear approximation, Learned index}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail