Dynamic String Alignment

Authors Panagiotis Charalampopoulos , Tomasz Kociumaka , Shay Mozes



PDF
Thumbnail PDF

File

LIPIcs.CPM.2020.9.pdf
  • Filesize: 0.5 MB
  • 13 pages

Document Identifiers

Author Details

Panagiotis Charalampopoulos
  • Department of Informatics, King’s College London, UK
  • Institute of Informatics, University of Warsaw, Poland
Tomasz Kociumaka
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
Shay Mozes
  • Efi Arazi School of Computer Science, The Interdisciplinary Center Herzliya, Israel

Cite As Get BibTex

Panagiotis Charalampopoulos, Tomasz Kociumaka, and Shay Mozes. Dynamic String Alignment. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 9:1-9:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020) https://doi.org/10.4230/LIPIcs.CPM.2020.9

Abstract

We consider the problem of dynamically maintaining an optimal alignment of two strings, each of length at most n, as they undergo insertions, deletions, and substitutions of letters. The string alignment problem generalizes the longest common subsequence (LCS) problem and the edit distance problem (also with non-unit costs, as long as insertions and deletions cost the same). The conditional lower bound of Backurs and Indyk [J. Comput. 2018] for computing the LCS in the static case implies that strongly sublinear update time for the dynamic string alignment problem would refute the Strong Exponential Time Hypothesis. We essentially match this lower bound when the alignment weights are constants, by showing how to process each update in 𝒪̃(n) time. When the weights are integers bounded in absolute value by some w=n^{𝒪(1)}, we can maintain the alignment in 𝒪̃(n ⋅ min {√ n,w}) time per update. For the 𝒪̃(nw)-time algorithm, we heavily rely on Tiskin’s work on semi-local LCS, and in particular, in an implicit way, on his algorithm for computing the (min,+)-product of two simple unit-Monge matrices [Algorithmica 2015]. As for the 𝒪̃(n√n)-time algorithm, we employ efficient data structures for computing distances in planar graphs.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • string alignment
  • edit distance
  • longest common subsequence
  • (unit-)Monge matrices
  • (min,+)-product

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amir Abboud, Thomas Dueholm Hansen, Virginia Vassilevska Williams, and Ryan Williams. Simulating branching programs with edit distance and friends: or: a polylog shaved is a lower bound made. In 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, pages 375-388. ACM, 2016. URL: https://doi.org/10.1145/2897518.2897653.
  2. Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Polylogarithmic approximation for edit distance and the asymmetric query complexity. In 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, pages 377-386. IEEE Computer Society, 2010. URL: https://doi.org/10.1109/FOCS.2010.43.
  3. Alexandr Andoni and Krzysztof Onak. Approximating edit distance in near-linear time. SIAM Journal on Computing, 41(6):1635-1648, 2012. URL: https://doi.org/10.1137/090767182.
  4. Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). SIAM Journal on Computing, 47(3):1087-1097, 2018. URL: https://doi.org/10.1137/15M1053128.
  5. Philip Bille and Martin Farach-Colton. Fast and compact regular expression matching. Theoretical Computer Science, 409(3):486-496, 2008. URL: https://doi.org/10.1016/j.tcs.2008.08.042.
  6. Mahdi Boroujeni, Masoud Seddighin, and Saeed Seddighin. Improved algorithms for edit distance and LCS: beyond worst case. In 31st ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 1601-1620. SIAM, 2020. URL: https://doi.org/10.1137/1.9781611975994.99.
  7. Karl Bringmann and Marvin Künnemann. Quadratic conditional lower bounds for string problems and dynamic time warping. In 56th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2015, pages 79-97. IEEE Computer Society, 2015. URL: https://doi.org/10.1109/FOCS.2015.15.
  8. Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael E. Saks. Approximating edit distance within constant factor in truly sub-quadratic time. In 59th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2018, pages 979-990. IEEE Computer Society, 2018. URL: https://doi.org/10.1109/FOCS.2018.00096.
  9. Timothy M. Chan and Mihai Pătraşcu. Counting inversions, offline orthogonal range counting, and related problems. In 21st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, pages 161-173. SIAM, 2010. URL: https://doi.org/10.1137/1.9781611973075.15.
  10. Panagiotis Charalampopoulos, Shay Mozes, and Benjamin Tebeka. Exact distance oracles for planar graphs with failing vertices. In 30th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, pages 2110-2123. SIAM, 2019. URL: https://doi.org/10.1137/1.9781611975482.127.
  11. Maxime Crochemore, Gad M. Landau, and Michal Ziv-Ukelson. A subquadratic sequence alignment algorithm for unrestricted scoring matrices. SIAM Journal on Computing, 32(6):1654-1673, 2003. URL: https://doi.org/10.1137/S0097539702402007.
  12. Jittat Fakcharoenphol and Satish Rao. Planar graphs, negative weight edges, shortest paths, and near linear time. Journal of Computer and System Sciences, 72(5):868-889, 2006. URL: https://doi.org/10.1016/j.jcss.2005.05.007.
  13. Paweł Gawrychowski and Adam Karczmarz. Improved bounds for shortest paths in dense distance graphs. In 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, volume 107 of LIPIcs, pages 61:1-61:15. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.61.
  14. Elazar Goldenberg, Robert Krauthgamer, and Barna Saha. Sublinear algorithms for gap edit distance. In 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, pages 1101-1120. IEEE Computer Society, 2019. URL: https://doi.org/10.1109/FOCS.2019.00070.
  15. Szymon Grabowski. New tabulation and sparse dynamic programming based techniques for sequence similarity problems. Discrete Applied Mathematics, 212:96-103, 2016. URL: https://doi.org/10.1016/j.dam.2015.10.040.
  16. MohammadTaghi Hajiaghayi, Masoud Seddighin, Saeed Seddighin, and Xiaorui Sun. Approximating LCS in linear time: Beating the √n barrier. In 30th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, pages 1181-1200. SIAM, 2019. URL: https://doi.org/10.1137/1.9781611975482.72.
  17. Daniel S. Hirschberg. A linear space algorithm for computing maximal common subsequences. Communications of the ACM, 18(6):341-343, 1975. URL: https://doi.org/10.1145/360825.360861.
  18. Yusuke Ishida, Shunsuke Inenaga, Ayumi Shinohara, and Masayuki Takeda. Fully incremental LCS computation. In 15th International Symposium on Fundamentals of Computation Theory, FCT 2005, volume 3623 of LNCS, pages 563-574. Springer, 2005. URL: https://doi.org/10.1007/11537311_49.
  19. Giuseppe F. Italiano, Yahav Nussbaum, Piotr Sankowski, and Christian Wulff-Nilsen. Improved algorithms for min cut and max flow in undirected planar graphs. In 43rd ACM Symposium on Theory of Computing, STOC 2011, pages 313-322. ACM, 2011. URL: https://doi.org/10.1145/1993636.1993679.
  20. Haim Kaplan, Shay Mozes, Yahav Nussbaum, and Micha Sharir. Submatrix maximum queries in Monge matrices and partial Monge matrices, and their applications. ACM Transactions on Algorithms, 13(2):26:1-26:42, 2017. URL: https://doi.org/10.1145/3039873.
  21. Sung-Ryul Kim and Kunsoo Park. A dynamic edit distance table. Journal of Discrete Algorithms, 2(2):303-312, 2004. URL: https://doi.org/10.1016/S1570-8667(03)00082-0.
  22. Philip N. Klein. Multiple-source shortest paths in planar graphs. In 16th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2005, pages 146-155. SIAM, 2005. URL: http://dl.acm.org/citation.cfm?id=1070432.1070454.
  23. Gad M. Landau, Eugene W. Myers, and Jeanette P. Schmidt. Incremental string comparison. SIAM Journal on Computing, 27(2):557-582, 1998. URL: https://doi.org/10.1137/S0097539794264810.
  24. Vladimir I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics. Doklady, 10:707-710, 1966. Google Scholar
  25. William J. Masek and Mike Paterson. A faster algorithm computing string edit distances. Journal of Computer and System Sciences, 20(1):18-31, 1980. URL: https://doi.org/10.1016/0022-0000(80)90002-1.
  26. Gaspard Monge. Mémoire sur la théorie des déblais et des remblais. De l'Imprimerie Royale, 1781. Google Scholar
  27. Shay Mozes and Christian Wulff-Nilsen. Shortest paths in planar graphs with real lengths in O(n log² n / log log n) time. In 18th Annual European Symposium on Algorithms, ESA 2010, Part II, volume 6347 of LNCS, pages 206-217. Springer, 2010. URL: https://doi.org/10.1007/978-3-642-15781-3_18.
  28. Saul B. Needleman and Christian D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443-453, 1970. URL: https://doi.org/10.1016/0022-2836(70)90057-4.
  29. David Sankoff. Matching sequences under deletion/insertion constraints. Proceedings of the National Academy of Sciences of the United States of America, 69(1):4-6, 1972. URL: https://doi.org/10.1073/pnas.69.1.4.
  30. Peter H. Sellers. On the theory and computation of evolutionary distances. SIAM Journal on Applied Mathematics, 26(4):787-793, 1974. URL: https://doi.org/10.1137/0126070.
  31. Alexander Tiskin. Longest common subsequences in permutations and maximum cliques in circle graphs. In 17th Annual Symposium on Combinatorial Pattern Matching, CPM 2006, volume 4009 of LNCS, pages 270-281. Springer, 2006. URL: https://doi.org/10.1007/11780441_25.
  32. Alexander Tiskin. Semi-local string comparison: algorithmic techniques and applications, 2007. URL: http://arxiv.org/abs/0707.3619.
  33. Alexander Tiskin. Semi-local longest common subsequences in subquadratic time. Journal of Discrete Algorithms, 6(4):570-581, 2008. URL: https://doi.org/10.1016/j.jda.2008.07.001.
  34. Alexander Tiskin. Semi-local string comparison: Algorithmic techniques and applications. Mathematics in Computer Science, 1(4):571-603, 2008. URL: https://doi.org/10.1007/s11786-007-0033-3.
  35. Alexander Tiskin. Faster subsequence recognition in compressed strings. Journal of Mathematical Sciences, 158(5):759-769, 2009. URL: https://doi.org/10.1007/s10958-009-9396-0.
  36. Alexander Tiskin. Periodic string comparison. In 20th Annual Symposium on Combinatorial Pattern Matching, CPM 2009, volume 5577 of LNCS, pages 193-206. Springer, 2009. URL: https://doi.org/10.1007/978-3-642-02441-2_18.
  37. Alexander Tiskin. Fast distance multiplication of unit-Monge matrices. Algorithmica, 71(4):859-888, 2015. URL: https://doi.org/10.1007/s00453-013-9830-z.
  38. Taras K. Vintsyuk. Speech discrimination by dynamic programming. Cybernetics, 4:52-57, 1968. URL: https://doi.org/10.1007/BF01074755.
  39. Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. Journal of the ACM, 21(1):168-173, 1974. URL: https://doi.org/10.1145/321796.321811.
  40. Sun Wu, Udi Manber, and Eugene W. Myers. A subquadratic algorithm for approximate limited expression matching. Algorithmica, 15(1):50-67, 1996. URL: https://doi.org/10.1007/BF01942606.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail