# The k-Mappability Problem Revisited

## File

LIPIcs.CPM.2021.5.pdf
• Filesize: 0.73 MB
• 20 pages

## Acknowledgements

We warmly thank Tomasz Kociumaka for useful discussions.

## Cite As

Amihood Amir, Itai Boneh, and Eitan Kondratovsky. The k-Mappability Problem Revisited. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 5:1-5:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.CPM.2021.5

## Abstract

The k-mappability problem has two integer parameters m and k. For every subword of size m in a text S, we wish to report the number of indices in S at which the word occurs with at most k mismatches. The problem was recently tackled by Alzamel et al. [Mai Alzamel et al., 2018]. For a text over a constant-size alphabet Σ and k ∈ O(1), they present an algorithm with linear space and O(n log^{k+1} n) time. For the case in which k = 1 and the alphabet is of constant size, a faster algorithm with linear space and O(n log(n) log log(n)) time was presented in [Mai Alzamel et al., 2020]. In this work, we enhance the techniques of [Mai Alzamel et al., 2020] to obtain an algorithm with linear space and O(n log(n)) time for k = 1. Our algorithm removes the constraint of the alphabet being of constant size. We also present linear algorithms for the case of k = 1, |Σ| ∈ O(1) and m = Ω(√n).
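To make the problem statement concrete, the following is a minimal brute-force sketch of the quantity being computed. It is not the algorithm of this paper or of the cited works: it runs in O(n² m) time and serves only as a reference definition. The function names `hamming_at_most` and `naive_k_mappability` are hypothetical, and the sketch assumes the literal reading of the abstract, in which each subword's own starting index is counted as an occurrence; subtract one from each entry if the position itself should be excluded.

```python
from typing import List


def hamming_at_most(a: str, b: str, k: int) -> bool:
    """Return True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return False  # stop early once the budget is exceeded
    return True


def naive_k_mappability(s: str, m: int, k: int) -> List[int]:
    """For each start i of a length-m subword of s, count the starts j
    (including j == i, an assumption of this sketch) such that
    s[j:j+m] is within Hamming distance k of s[i:i+m]."""
    starts = range(len(s) - m + 1)  # empty when m > len(s)
    return [
        sum(hamming_at_most(s[i:i + m], s[j:j + m], k) for j in starts)
        for i in starts
    ]
```

For example, with S = "aab" and m = 2 the subwords are "aa" and "ab", which differ in one position, so each has one occurrence for k = 0 and two occurrences for k = 1.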

## Subject Classification

##### ACM Subject Classification
• Theory of computation → Pattern matching
• Theory of computation → Sorting and searching
##### Keywords
• Pattern Matching
• Hamming Distance
• Suffix Tree
• Suffix Array

## References

1. Mai Alzamel, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, and Juliusz Straszynski. Efficient computation of sequence mappability. In Travis Gagie, Alistair Moffat, Gonzalo Navarro, and Ernesto Cuadros-Vargas, editors, String Processing and Information Retrieval - 25th International Symposium, SPIRE 2018, Lima, Peru, October 9-11, 2018, Proceedings, volume 11147 of Lecture Notes in Computer Science, pages 12-26. Springer, 2018. URL: https://doi.org/10.1007/978-3-030-00479-8_2.
2. Mai Alzamel, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Solon P. Pissis, Jakub Radoszewski, and Wing-Kin Sung. Faster algorithms for 1-mappability of a sequence. Theor. Comput. Sci., 812:2-12, 2020. URL: https://doi.org/10.1016/j.tcs.2019.04.026.
3. S. Bagchi, E. Hung, A. Iyengar, N. G. Vogl, and N. Wadia. Capacity planning tools for web and grid environments. In Proc. 1st International Conference on Performance Evaluation Methodologies and Tools (VALUETOOLS), 2006. ISBN 1-59593-504-5, article number 25. URL: http://doi.acm.org/10.1145/1190095.1190127.
4. M. A. Bender and M. Farach-Colton. The level ancestor problem simplified. Theoretical Computer Science, 321(1):5-12, 2004. Latin American Theoretical Informatics. URL: https://doi.org/10.1016/j.tcs.2003.05.002.
5. L. Foschini, R. Grossi, A. Gupta, and J. S. Vitter. When indexing equals compression: Experiments with compressing suffix arrays and applications. ACM Transactions on Algorithms, 2(4):611-639, 2006.
6. J. Kärkkäinen and P. Sanders. Simple linear work suffix array construction. In Proc. 30th International Colloquium on Automata, Languages and Programming (ICALP 03), pages 943-955, 2003. LNCS 2719.
7. S. Ma and J.L. Hellerstein. Mining partially periodic event patterns with unknown periods. In Proc. 17th International Conference on Data Engineering (ICDE), pages 205-214. IEEE Computer Society, 2001.
8. U. Manber and G. Myers. Suffix arrays: A new method for on-line string searches. In Proc. 1st ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 319-327, 1990.
9. V. V. Panteleenko. Instantaneous Offloading of Web Server Loads. PhD thesis, University of Notre Dame, 2002.
10. G. Ritschard, R. Bürgin, and M. Studer. Exploratory mining of life event histories. In J. J. McArdle and G. Ritschard, editors, Contemporary Issues in Exploratory Data Mining in Behavioral Sciences, pages 221-253. Routledge, New York, 2013.
11. K. Sadakane. A fast algorithm for making suffix arrays and for Burrows-Wheeler transformation. In Proc. Data Compression Conference (DCC), pages 129-138, 1998.
12. Federal Highway Administration U.S. Department of Transportation. Congestion: a national issue. URL: http://www.ops.fhwa.dot.gov/aboutus/opstory.htm, August 2011.
13. P. Weiner. Linear pattern matching algorithms. In Proc. 14th IEEE Symposium on Switching and Automata Theory, pages 1-11, 1973.