MONI Can Find k-MEMs

Authors: Igor Tatarnikov, Ardavan Shahrabi Farahani, Sana Kashgouli, Travis Gagie



Author Details

Igor Tatarnikov
  • Dalhousie University, Halifax, Canada
Ardavan Shahrabi Farahani
  • Dalhousie University, Halifax, Canada
Sana Kashgouli
  • Dalhousie University, Halifax, Canada
Travis Gagie
  • Dalhousie University, Halifax, Canada

Acknowledgements

The authors thank Christina Boucher, Ben Langmead, Manuel Mattheisen and Massimiliano Rossi for helpful discussions.

Cite As

Igor Tatarnikov, Ardavan Shahrabi Farahani, Sana Kashgouli, and Travis Gagie. MONI Can Find k-MEMs. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 26:1-26:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.CPM.2023.26

Abstract

Suppose we are asked to index a text T[0..n-1] such that, given a pattern P[0..m-1], we can quickly report the maximal substrings of P that each occur in T at least k times. We first show how we can add O(r log n) bits to Rossi et al.’s recent MONI index, where r is the number of runs in the Burrows-Wheeler Transform of T, such that it supports such queries in O(km log n) time. We then show how, if we are given k at construction time, we can reduce the query time to O(m log n).
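
To make the query concrete, the following is a minimal brute-force sketch of what a k-MEM query returns. It is not the MONI-based method of the paper and uses no index: it counts occurrences by scanning T directly, so it is only suitable for tiny inputs. The function names count_occurrences and k_mems are hypothetical and chosen for illustration, and the sketch assumes that occurrences of a substring in T may overlap.

```python
def count_occurrences(text, s):
    """Count occurrences of the non-empty string s in text, overlaps included."""
    count = 0
    start = text.find(s)
    while start != -1:
        count += 1
        start = text.find(s, start + 1)
    return count


def k_mems(text, pattern, k):
    """Return (start, length) pairs for the maximal substrings of pattern
    that each occur at least k times in text (brute force, illustration only).
    """
    m = len(pattern)

    # longest[i] = length of the longest prefix of pattern[i:] that occurs
    # at least k times in text. Extending greedily is safe because every
    # prefix of a string with >= k occurrences also has >= k occurrences.
    longest = []
    for i in range(m):
        length = 0
        while (i + length < m
               and count_occurrences(text, pattern[i:i + length + 1]) >= k):
            length += 1
        longest.append(length)

    # pattern[i:i + longest[i]] cannot be extended to the right by definition.
    # It can be extended to the left exactly when longest[i - 1] > longest[i],
    # so we keep it only when that is not the case.
    result = []
    for i, length in enumerate(longest):
        if length > 0 and (i == 0 or longest[i - 1] <= length):
            result.append((i, length))
    return result


if __name__ == "__main__":
    # "abra" occurs twice in the text and cannot be extended within the pattern.
    print(k_mems("abracadabra", "cabra", 2))  # [(1, 4)]
```

With k = 1 the same call reports the ordinary MEMs, here (0, 2) for "ca" and (1, 4) for "abra". Raising k can only shorten or split the reported substrings, since fewer substrings of P reach the occurrence threshold.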

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • Compact data structures
  • Burrows-Wheeler Transform
  • run-length compression
  • maximal exact matches


References

  1. Hideo Bannai, Travis Gagie, and Tomohiro I. Refining the r-index. Theoretical Computer Science, 812:96-108, 2020.
  2. Christina Boucher, Travis Gagie, Tomohiro I, Dominik Köppl, Ben Langmead, Giovanni Manzini, Gonzalo Navarro, Alejandro Pacheco, and Massimiliano Rossi. PHONI: Streamed matching statistics with multi-genome references. In 2021 Data Compression Conference (DCC), pages 193-202. IEEE, 2021.
  3. Christina Boucher, Travis Gagie, Alan Kuhnle, Ben Langmead, Giovanni Manzini, and Taher Mun. Prefix-free parsing for building big BWTs. Algorithms for Molecular Biology, 14(1):1-15, 2019.
  4. Nathaniel K. Brown, Travis Gagie, and Massimiliano Rossi. RLBWT tricks. In 20th Symposium on Experimental Algorithms (SEA 2022), pages 16:1-16:16, 2022.
  5. Travis Gagie, Gonzalo Navarro, and Nicola Prezza. Fully functional suffix trees and optimal text searching in BWT-runs bounded space. Journal of the ACM (JACM), 67(1):1-54, 2020.
  6. Juha Kärkkäinen, Dominik Kempa, and Marcin Piątkowski. Tighter bounds for the sum of irreducible LCP values. Theoretical Computer Science, 656:265-278, 2016.
  7. Juha Kärkkäinen, Giovanni Manzini, and Simon J. Puglisi. Permuted longest-common-prefix array. In 20th Symposium on Combinatorial Pattern Matching (CPM), pages 181-192. Springer, 2009.
  8. Heng Li. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint, 2013. URL: https://arxiv.org/abs/1303.3997.
  9. Gonzalo Navarro. Compact Data Structures: A Practical Approach. Cambridge University Press, 2016.
  10. Gonzalo Navarro. Computing MEMs on repetitive text collections. arXiv preprint v3, 2022. Accepted to this conference. URL: https://arxiv.org/abs/2210.09914.
  11. Takaaki Nishimoto, Shunsuke Kanda, and Yasuo Tabei. An optimal-time RLBWT construction in BWT-runs bounded space. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022), pages 99:1-99:20, 2022.
  12. Takaaki Nishimoto and Yasuo Tabei. Optimal-time queries on BWT-runs compressed indexes. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021), 2021.
  13. Massimiliano Rossi, Marco Oliva, Ben Langmead, Travis Gagie, and Christina Boucher. MONI: A pangenomic index for finding maximal exact matches. Journal of Computational Biology, 29(2):169-187, 2022.