MONI Can Find k-MEMs

Tatarnikov, Igor; Shahrabi Farahani, Ardavan; Kashgouli, Sana; Gagie, Travis

doi:10.4230/LIPIcs.CPM.2023.26

File

LIPIcs.CPM.2023.26.pdf

Filesize: 0.68 MB
14 pages

Document Identifiers

DOI: 10.4230/LIPIcs.CPM.2023.26
URN: urn:nbn:de:0030-drops-179802

Author Details

Igor Tatarnikov

Dalhousie University, Halifax, Canada

Ardavan Shahrabi Farahani

Dalhousie University, Halifax, Canada

Sana Kashgouli

Dalhousie University, Halifax, Canada

Travis Gagie

Dalhousie University, Halifax, Canada

Acknowledgements

The authors thank Christina Boucher, Ben Langmead, Manuel Mattheisen and Massimiliano Rossi for helpful discussions.

Cite As

Igor Tatarnikov, Ardavan Shahrabi Farahani, Sana Kashgouli, and Travis Gagie. MONI Can Find k-MEMs. In 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 26:1-26:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.CPM.2023.26

Abstract

Suppose we are asked to index a text T[0..n-1] such that, given a pattern P[0..m-1], we can quickly report the maximal substrings of P that each occur in T at least k times. We first show how we can add O(r log n) bits to Rossi et al.’s recent MONI index, where r is the number of runs in the Burrows-Wheeler Transform of T, such that it supports such queries in O(km log n) time. We then show how, if we are given k at construction time, we can reduce the query time to O(m log n).
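
To make the query concrete, here is a minimal brute-force sketch in Python of what "maximal substrings of P that each occur in T at least k times" could mean. It is only an illustration of the problem statement, not the MONI-based method of the paper, and it makes no attempt to match the stated time or space bounds. The function names `count_occurrences` and `k_mems_naive` are hypothetical, and the sketch assumes occurrences are counted by starting position in T, so overlapping occurrences count separately.

```python
def count_occurrences(text, s):
    """Count starting positions in text at which the non-empty string s occurs."""
    return sum(1 for i in range(len(text) - len(s) + 1) if text.startswith(s, i))


def k_mems_naive(text, pattern, k):
    """Return (start, length) pairs for substrings of pattern that occur at
    least k times in text and cannot be extended left or right within pattern
    while still occurring at least k times."""
    m = len(pattern)

    # lengths[i] = length of the longest prefix of pattern[i:] that occurs
    # at least k times in text (0 if even pattern[i] is too rare).
    lengths = []
    for i in range(m):
        length = 0
        while (i + length < m
               and count_occurrences(text, pattern[i:i + length + 1]) >= k):
            length += 1
        lengths.append(length)

    # pattern[i:i + lengths[i]] is right-maximal by construction. It is also
    # left-maximal exactly when the match starting at i - 1 does not cover it,
    # i.e. when lengths[i - 1] <= lengths[i].
    result = []
    for i, length in enumerate(lengths):
        if length > 0 and (i == 0 or lengths[i - 1] <= length):
            result.append((i, length))
    return result


if __name__ == "__main__":
    T = "abracadabra"
    P = "cabrab"
    for k in (1, 2, 3):
        found = [(i, P[i:i + n]) for i, n in k_mems_naive(T, P, k)]
        print(k, found)
```

With k = 1 this reduces to ordinary maximal exact matches; raising k shortens or drops matches whose substrings are too rare in T.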

Subject Classification

ACM Subject Classification

Theory of computation → Pattern matching

Keywords

Compact data structures
Burrows-Wheeler Transform
run-length compression
maximal exact matches

References

Hideo Bannai, Travis Gagie, and Tomohiro I. Refining the r-index. Theoretical Computer Science, 812:96-108, 2020.
Christina Boucher, Travis Gagie, Tomohiro I, Dominik Köppl, Ben Langmead, Giovanni Manzini, Gonzalo Navarro, Alejandro Pacheco, and Massimiliano Rossi. PHONI: Streamed matching statistics with multi-genome references. In 2021 Data Compression Conference (DCC), pages 193-202. IEEE, 2021.
Christina Boucher, Travis Gagie, Alan Kuhnle, Ben Langmead, Giovanni Manzini, and Taher Mun. Prefix-free parsing for building big BWTs. Algorithms for Molecular Biology, 14(1):1-15, 2019.
Nathaniel K. Brown, Travis Gagie, and Massimiliano Rossi. RLBWT tricks. In 20th Symposium on Experimental Algorithms (SEA 2022), pages 16:1-16:16, 2022.
Travis Gagie, Gonzalo Navarro, and Nicola Prezza. Fully functional suffix trees and optimal text searching in BWT-runs bounded space. Journal of the ACM (JACM), 67(1):1-54, 2020.
Juha Kärkkäinen, Dominik Kempa, and Marcin Piątkowski. Tighter bounds for the sum of irreducible LCP values. Theoretical Computer Science, 656:265-278, 2016.
Juha Kärkkäinen, Giovanni Manzini, and Simon J Puglisi. Permuted longest-common-prefix array. In 20th Symposium on Combinatorial Pattern Matching (CPM), pages 181-192. Springer, 2009.
Heng Li. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint, 2013. URL: https://arxiv.org/abs/1303.3997.
Gonzalo Navarro. Compact Data Structures: A Practical Approach. Cambridge University Press, 2016.
Gonzalo Navarro. Computing MEMs on repetitive text collections. arXiv preprint v3, 2022. Accepted to this conference. URL: https://arxiv.org/abs/2210.09914.
Takaaki Nishimoto, Shunsuke Kanda, and Yasuo Tabei. An optimal-time RLBWT construction in BWT-runs bounded space. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022), pages 99:1-99:20, 2022.
Takaaki Nishimoto and Yasuo Tabei. Optimal-time queries on BWT-runs compressed indexes. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021), 2021.
Massimiliano Rossi, Marco Oliva, Ben Langmead, Travis Gagie, and Christina Boucher. MONI: A pangenomic index for finding maximal exact matches. Journal of Computational Biology, 29(2):169-187, 2022.
