The k-Mappability Problem Revisited

Authors Amihood Amir, Itai Boneh, Eitan Kondratovsky



File

LIPIcs.CPM.2021.5.pdf
  • Filesize: 0.73 MB
  • 20 pages

Author Details

Amihood Amir
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
  • Georgia Tech, Atlanta, GA, USA
Itai Boneh
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
Eitan Kondratovsky
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
  • Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada

Acknowledgements

We warmly thank Tomasz Kociumaka for useful discussions.

Cite As

Amihood Amir, Itai Boneh, and Eitan Kondratovsky. The k-Mappability Problem Revisited. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 5:1-5:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.CPM.2021.5

Abstract

The k-mappability problem has two integer parameters, m and k. For every subword of length m in a text S, we wish to report the number of indices in S at which that word occurs with at most k mismatches. The problem was recently tackled by Alzamel et al. [Mai Alzamel et al., 2018]. For a text over a constant-size alphabet Σ and k ∈ O(1), they present an algorithm with linear space and O(n log^{k+1} n) time. For the case in which k = 1 and the alphabet has constant size, a faster algorithm with linear space and O(n log n log log n) time was presented in [Mai Alzamel et al., 2020]. In this work, we enhance the techniques of [Mai Alzamel et al., 2020] to obtain an algorithm with linear space and O(n log n) time for k = 1. Our algorithm removes the constraint that the alphabet be of constant size. We also present linear-time algorithms for the case of k = 1, |Σ| ∈ O(1), and m = Ω(√n).
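To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not the paper's O(n log n) algorithm; it simply compares every pair of length-m windows in O(n²m) time and can serve as a correctness oracle on small inputs. It assumes the convention that, for a window starting at i, we count the other start positions j ≠ i whose window is within Hamming distance k (whether the window's own occurrence is counted is a convention choice). The function name `k_mappability` is hypothetical.

```python
def k_mappability(s: str, m: int, k: int) -> list[int]:
    """Naive k-mappability (hypothetical helper, illustration only).

    For each start i of a length-m window of s, count the other starts
    j != i whose window differs from s[i:i+m] in at most k positions.
    Runs in O(n^2 * m) time; assumes j == i is NOT counted.
    """
    n = len(s)
    windows = n - m + 1
    if m <= 0 or windows <= 0:
        return []
    counts = [0] * windows
    # Each unordered pair (i, j) is compared once; Hamming distance is symmetric.
    for i in range(windows):
        for j in range(i + 1, windows):
            mismatches = 0
            for t in range(m):
                if s[i + t] != s[j + t]:
                    mismatches += 1
                    if mismatches > k:  # early exit: already too many mismatches
                        break
            if mismatches <= k:
                counts[i] += 1
                counts[j] += 1
    return counts


if __name__ == "__main__":
    # Windows of "aabaab" with m = 2: "aa", "ab", "ba", "aa", "ab"
    print(k_mappability("aabaab", 2, 0))  # exact repeats only
    print(k_mappability("aabaab", 2, 1))  # at most one mismatch
```

For "aabaab" with m = 2, exact matching (k = 0) gives [1, 1, 0, 1, 1], while allowing one mismatch (k = 1) gives [4, 3, 2, 4, 3], since "aa" is then within distance 1 of both "ab" and "ba". The quadratic pairwise comparison is what the suffix-tree and suffix-array based methods cited in the abstract are designed to avoid.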

ACM Subject Classification
  • Theory of computation → Pattern matching
  • Theory of computation → Sorting and searching
Keywords
  • Pattern Matching
  • Hamming Distance
  • Suffix Tree
  • Suffix Array

References

  1. Mai Alzamel, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, and Juliusz Straszynski. Efficient computation of sequence mappability. In Travis Gagie, Alistair Moffat, Gonzalo Navarro, and Ernesto Cuadros-Vargas, editors, String Processing and Information Retrieval - 25th International Symposium, SPIRE 2018, Lima, Peru, October 9-11, 2018, Proceedings, volume 11147 of Lecture Notes in Computer Science, pages 12-26. Springer, 2018. URL: https://doi.org/10.1007/978-3-030-00479-8_2.
  2. Mai Alzamel, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Solon P. Pissis, Jakub Radoszewski, and Wing-Kin Sung. Faster algorithms for 1-mappability of a sequence. Theor. Comput. Sci., 812:2-12, 2020. URL: https://doi.org/10.1016/j.tcs.2019.04.026.
  3. S. Bagchi, E. Hung, A. Iyengar, N. G. Vogl, and N. Wadia. Capacity planning tools for web and grid environments. In Proc. 1st International Conference on Performance Evaluation Methodologies and Tools (VALUETOOLS), 2006. ISBN 1-59593-504-5, article number 25. URL: http://doi.acm.org/10.1145/1190095.1190127.
  4. Michael A. Bender and Martin Farach-Colton. The level ancestor problem simplified. Theoretical Computer Science, 321(1):5-12, 2004. Latin American Theoretical Informatics. URL: https://doi.org/10.1016/j.tcs.2003.05.002.
  5. L. Foschini, R. Grossi, A. Gupta, and J. S. Vitter. When indexing equals compression: Experiments with compressing suffix arrays and applications. ACM Transactions on Algorithms, 2(4):611-639, 2006.
  6. J. Kärkkäinen and P. Sanders. Simple linear work suffix array construction. In Proc. 30th International Colloquium on Automata, Languages and Programming (ICALP 03), pages 943-955, 2003. LNCS 2719.
  7. S. Ma and J. L. Hellerstein. Mining partially periodic event patterns with unknown periods. In Proc. 17th International Conference on Data Engineering (ICDE), pages 205-214. IEEE Computer Society, 2001.
  8. U. Manber and G. Myers. Suffix arrays: A new method for on-line string searches. In Proc. 1st ACM-SIAM Symp. on Discrete Algorithms (SODA), pages 319-327, 1990.
  9. V. V. Panteleenko. Instantaneous Offloading of Web Server Loads. PhD thesis, University of Notre Dame, 2002.
  10. G. Ritschard, R. Bürgin, and M. Studer. Exploratory mining of life event histories. In J. J. McArdle and G. Ritschard, editors, Contemporary Issues in Exploratory Data Mining in Behavioral Sciences, pages 221-253. Routledge, New York, 2013.
  11. K. Sadakane. A fast algorithm for making suffix arrays and for Burrows-Wheeler transformation. In Proc. Data Compression Conference (DCC), pages 129-138, 1998.
  12. Federal Highway Administration, U.S. Department of Transportation. Congestion: a national issue. URL: http://www.ops.fhwa.dot.gov/aboutus/opstory.htm, August 2011.
  13. P. Weiner. Linear pattern matching algorithms. In Proc. 14th IEEE Symposium on Switching and Automata Theory, pages 1-11, 1973.