Longest Common Substring Made Fully Dynamic

Authors: Amihood Amir, Panagiotis Charalampopoulos, Solon P. Pissis, Jakub Radoszewski

Author Details

Amihood Amir
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
Panagiotis Charalampopoulos
  • Department of Informatics, King’s College London, London, UK
  • Efi Arazi School of Computer Science, The Interdisciplinary Center Herzliya, Herzliya, Israel
Solon P. Pissis
  • CWI, Amsterdam, The Netherlands
Jakub Radoszewski
  • Institute of Informatics, University of Warsaw, Warsaw, Poland
  • Samsung R&D Institute, Warsaw, Poland

Cite As

Amihood Amir, Panagiotis Charalampopoulos, Solon P. Pissis, and Jakub Radoszewski. Longest Common Substring Made Fully Dynamic. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 6:1-6:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.ESA.2019.6

Abstract

Given two strings S and T, each of length at most n, the longest common substring (LCS) problem is to find a longest substring common to S and T. This is a classical problem in computer science with an O(n)-time solution. In the fully dynamic setting, edit operations are allowed in either of the two strings, and the problem is to find an LCS after each edit. We present the first solution to this problem requiring sublinear time in n per edit operation. In particular, we show how to find an LCS after each edit operation in Õ(n^(2/3)) time, after Õ(n)-time and space preprocessing.

This line of research has been recently initiated in a somewhat restricted dynamic variant by Amir et al. [SPIRE 2017]. More specifically, they presented an Õ(n)-sized data structure that returns an LCS of the two strings after a single edit operation (that is reverted afterwards) in Õ(1) time. At CPM 2018, three papers (Abedin et al., Funakoshi et al., and Urabe et al.) studied analogously restricted dynamic variants of problems on strings. We show that the techniques we develop can be applied to obtain fully dynamic algorithms for all of these variants. The only previously known sublinear-time dynamic algorithms for problems on strings were for maintaining a dynamic collection of strings for comparison queries and for pattern matching, with the most recent advances made by Gawrychowski et al. [SODA 2018] and by Clifford et al. [STACS 2018].

As an intermediate problem we consider computing the solution for a string with a given set of k edits, which leads us, in particular, to answering internal queries on a string. The input to such a query is specified by a substring (or substrings) of a given string. Data structures for answering internal string queries that were proposed by Kociumaka et al. [SODA 2015] and by Gagie et al. [CCCG 2013] are used, along with new ones, based on ingredients such as the suffix tree, heavy-path decomposition, orthogonal range queries, difference covers, and string periodicity.
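
The problem statement above can be made concrete with a small Python sketch of the static linear-time baseline and of the naive fully dynamic strategy it implies: apply each edit and recompute an LCS from scratch, at Θ(n) cost per edit. This is not the paper's Õ(n^(2/3))-time algorithm, which is not reproduced here. The sketch swaps in a suffix automaton of S (not the suffix-tree machinery named in the abstract), assumes dictionary transitions (so the O(n) bound holds in expectation), and the helper names `build_suffix_automaton`, `longest_common_substring` and `apply_edit` are hypothetical.

```python
def build_suffix_automaton(s: str):
    """Build a suffix automaton of s (hypothetical helper for this sketch).

    Returns parallel lists: length[v] (longest string in state v), link[v]
    (suffix link), nxt[v] (transition dict) and firstpos[v] (end index in s
    of the first occurrence of the strings represented by state v).
    """
    length, link, nxt, firstpos = [0], [-1], [{}], [-1]
    last = 0
    for i, c in enumerate(s):
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        nxt.append({})
        firstpos.append(i)
        p = last
        # Add a transition on c from every suffix state that lacks one.
        while p != -1 and c not in nxt[p]:
            nxt[p][c] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][c]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone keeps q's transitions but a shorter length.
                clone = len(length)
                length.append(length[p] + 1)
                link.append(link[q])
                nxt.append(dict(nxt[q]))
                firstpos.append(firstpos[q])
                while p != -1 and nxt[p].get(c) == q:
                    nxt[p][c] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return length, link, nxt, firstpos


def longest_common_substring(s: str, t: str) -> tuple:
    """Return (length, start in s, start in t) of one longest common substring."""
    length, link, nxt, firstpos = build_suffix_automaton(s)
    # v, cur_len: state and length of the longest suffix of t[:i+1] occurring in s.
    v, cur_len = 0, 0
    best_len, best_end_t, best_state = 0, -1, 0
    for i, c in enumerate(t):
        # Shorten the current match via suffix links until c can extend it.
        while v != 0 and c not in nxt[v]:
            v = link[v]
            cur_len = length[v]
        if c in nxt[v]:
            v = nxt[v][c]
            cur_len += 1
        if cur_len > best_len:
            best_len, best_end_t, best_state = cur_len, i, v
    if best_len == 0:
        return 0, 0, 0
    start_s = firstpos[best_state] - best_len + 1
    return best_len, start_s, best_end_t - best_len + 1


def apply_edit(x: str, op: str, i: int, c: str = "") -> str:
    """Apply one edit at index i: substitution ("sub"), insertion ("ins") or deletion ("del")."""
    if op == "sub":
        return x[:i] + c + x[i + 1:]
    if op == "ins":
        return x[:i] + c + x[i:]
    if op == "del":
        return x[:i] + x[i + 1:]
    raise ValueError(f"unknown edit operation: {op}")


# Naive fully dynamic LCS: recompute from scratch after every edit.
s, t = "abcde", "xbcdy"
print(longest_common_substring(s, t))  # (3, 1, 1): "bcd"
s = apply_edit(s, "sub", 4, "y")
print(longest_common_substring(s, t))  # (4, 1, 1): "bcdy"
```

Here S = "abcde" and T = "xbcdy" share "bcd"; substituting y at position 4 of S yields "abcdy", and the recomputed LCS becomes "bcdy". Each query rebuilds the automaton, which is exactly the linear per-edit cost that a sublinear fully dynamic algorithm avoids.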

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • longest common substring
  • string algorithms
  • dynamic algorithms

References

  1. Amir Abboud, Richard Ryan Williams, and Huacheng Yu. More Applications of the Polynomial Method to Algorithm Design. In Piotr Indyk, editor, Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 218-230. SIAM, 2015. URL: https://doi.org/10.1137/1.9781611973730.17.
  2. Amir Abboud, Virginia Vassilevska Williams, and Oren Weimann. Consequences of Faster Alignment of Sequences. In Javier Esparza, Pierre Fraigniaud, Thore Husfeldt, and Elias Koutsoupias, editors, Automata, Languages, and Programming - 41st International Colloquium, ICALP 2014, Copenhagen, Denmark, July 8-11, 2014, Proceedings, Part I, volume 8572 of Lecture Notes in Computer Science, pages 39-51. Springer, 2014. URL: https://doi.org/10.1007/978-3-662-43948-7_4.
  3. Paniz Abedin, Sahar Hooshmand, Arnab Ganguly, and Sharma V. Thankachan. The Heaviest Induced Ancestors Problem Revisited. In Gonzalo Navarro, David Sankoff, and Binhai Zhu, editors, Annual Symposium on Combinatorial Pattern Matching, CPM 2018, July 2-4, 2018 - Qingdao, China, volume 105 of LIPIcs, pages 20:1-20:13. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.CPM.2018.20.
  4. Stephen Alstrup, Gerth Stølting Brodal, and Theis Rauhe. Pattern Matching in Dynamic Texts. In Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '00, pages 819-828, Philadelphia, PA, USA, 2000. Society for Industrial and Applied Mathematics. URL: http://dl.acm.org/citation.cfm?id=338219.338645.
  5. Amihood Amir and Itai Boneh. Locally Maximal Common Factors as a Tool for Efficient Dynamic String Algorithms. In Gonzalo Navarro, David Sankoff, and Binhai Zhu, editors, Annual Symposium on Combinatorial Pattern Matching, CPM 2018, July 2-4, 2018 - Qingdao, China, volume 105 of LIPIcs, pages 11:1-11:13. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.CPM.2018.11.
  6. Amihood Amir and Itai Boneh. Dynamic Palindrome Detection. CoRR, abs/1906.09732, 2019. URL: http://arxiv.org/abs/1906.09732.
  7. Amihood Amir, Itai Boneh, Panagiotis Charalampopoulos, and Eitan Kondratovsky. Repetition Detection in a Dynamic String. In Michael A. Bender, Ola Svensson, and Grzegorz Herman, editors, 27th Annual European Symposium on Algorithms, ESA 2019, September 9-13, 2019, Munich, Germany, LIPIcs. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2019.
  8. Amihood Amir, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Solon P. Pissis, and Jakub Radoszewski. Longest Common Factor After One Edit Operation. In Gabriele Fici, Marinella Sciortino, and Rossano Venturini, editors, String Processing and Information Retrieval: 24th International Symposium, SPIRE 2017, Palermo, Italy, September 26-29, 2017, Proceedings, volume 10508 of Lecture Notes in Computer Science, pages 14-26. Springer International Publishing, 2017. URL: https://doi.org/10.1007/978-3-319-67428-5_2.
  9. Amihood Amir, Moshe Lewenstein, and Ely Porat. Faster algorithms for string matching with k mismatches. Journal of Algorithms, 50(2):257-275, 2004. URL: https://doi.org/10.1016/S0196-6774(03)00097-X.
  10. Amihood Amir, Moshe Lewenstein, and Sharma V. Thankachan. Range LCP Queries Revisited. In Costas S. Iliopoulos, Simon J. Puglisi, and Emine Yilmaz, editors, String Processing and Information Retrieval - 22nd International Symposium, SPIRE 2015, London, UK, September 1-4, 2015, Proceedings, volume 9309 of Lecture Notes in Computer Science, pages 350-361. Springer, 2015. URL: https://doi.org/10.1007/978-3-319-23826-5_33.
  11. Alberto Apostolico, Maxime Crochemore, Martin Farach-Colton, Zvi Galil, and S. Muthukrishnan. Forty Years of Text Indexing. In Johannes Fischer and Peter Sanders, editors, Combinatorial Pattern Matching, 24th Annual Symposium, CPM 2013, Bad Herrenalb, Germany, June 17-19, 2013. Proceedings, volume 7922 of Lecture Notes in Computer Science, pages 1-10. Springer, 2013. URL: https://doi.org/10.1007/978-3-642-38905-4_1.
  12. Lorraine A. K. Ayad, Carl Barton, Panagiotis Charalampopoulos, Costas S. Iliopoulos, and Solon P. Pissis. Longest Common Prefixes with k-Errors and Applications. In Travis Gagie, Alistair Moffat, Gonzalo Navarro, and Ernesto Cuadros-Vargas, editors, String Processing and Information Retrieval - 25th International Symposium, SPIRE 2018, Lima, Peru, October 9-11, 2018, Proceedings, volume 11147 of Lecture Notes in Computer Science, pages 27-41. Springer, 2018. URL: https://doi.org/10.1007/978-3-030-00479-8_3.
  13. Hélène Barcelo. On the action of the symmetric group on the Free Lie Algebra and the partition lattice. Journal of Combinatorial Theory, Series A, 55(1):93-129, 1990. URL: https://doi.org/10.1016/0097-3165(90)90050-7.
  14. Michael A. Bender and Martin Farach-Colton. The LCA Problem Revisited. In Gaston H. Gonnet, Daniel Panario, and Alfredo Viola, editors, LATIN 2000: Theoretical Informatics, 4th Latin American Symposium, Punta del Este, Uruguay, April 10-14, 2000, Proceedings, volume 1776 of Lecture Notes in Computer Science, pages 88-94. Springer, 2000. URL: https://doi.org/10.1007/10719839_9.
  15. Stefan Burkhardt and Juha Kärkkäinen. Fast Lightweight Suffix Array Construction and Checking. In Ricardo A. Baeza-Yates, Edgar Chávez, and Maxime Crochemore, editors, Combinatorial Pattern Matching, CPM 2003, Morelia, Michoacán, Mexico, June 25-27, 2003, volume 2676 of Lecture Notes in Computer Science, pages 55-69. Springer, 2003. URL: https://doi.org/10.1007/3-540-44888-8_5.
  16. Panagiotis Charalampopoulos, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Linear-Time Algorithm for Long LCF with k Mismatches. In Gonzalo Navarro, David Sankoff, and Binhai Zhu, editors, Annual Symposium on Combinatorial Pattern Matching, CPM 2018, July 2-4, 2018 - Qingdao, China, volume 105 of LIPIcs, pages 23:1-23:16. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.CPM.2018.23.
  17. Peter Clifford and Raphaël Clifford. Simple deterministic wildcard matching. Information Processing Letters, 101(2):53-54, 2007. URL: https://doi.org/10.1016/j.ipl.2006.08.002.
  18. Raphaël Clifford, Allan Grønlund, Kasper Green Larsen, and Tatiana A. Starikovskaya. Upper and Lower Bounds for Dynamic Data Structures on Strings. In Rolf Niedermeier and Brigitte Vallée, editors, 35th Symposium on Theoretical Aspects of Computer Science, STACS 2018, February 28 to March 3, 2018, Caen, France, volume 96 of LIPIcs, pages 22:1-22:14. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.STACS.2018.22.
  19. Maxime Crochemore, Christophe Hancart, and Thierry Lecroq. Algorithms on Strings. Cambridge University Press, 2007.
  20. Maxime Crochemore, Costas S. Iliopoulos, Manal Mohamed, and Marie-France Sagot. Longest repeats with a block of k don't cares. Theoretical Computer Science, 362(1-3):248-254, 2006. URL: https://doi.org/10.1016/j.tcs.2006.06.029.
  21. Martin Farach. Optimal Suffix Tree Construction with Large Alphabets. In 38th Annual Symposium on Foundations of Computer Science, FOCS '97, Miami Beach, Florida, USA, October 19-22, 1997, pages 137-143. IEEE Computer Society, 1997. URL: https://doi.org/10.1109/SFCS.1997.646102.
  22. Paolo Ferragina. Dynamic Text Indexing under String Updates. Journal of Algorithms, 22(2):296-328, 1997. URL: https://doi.org/10.1006/jagm.1996.0814.
  23. Paolo Ferragina and Roberto Grossi. Optimal On-Line Search and Sublinear Time Update in String Matching. SIAM Journal on Computing, 27(3):713-736, 1998. URL: https://doi.org/10.1137/S0097539795286119.
  24. Johannes Fischer, Dominik Köppl, and Florian Kurpicz. On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching. In Roberto Grossi and Moshe Lewenstein, editors, 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016, June 27-29, 2016, Tel Aviv, Israel, volume 54 of LIPIcs, pages 26:1-26:11. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016. URL: https://doi.org/10.4230/LIPIcs.CPM.2016.26.
  25. Tomás Flouri, Emanuele Giaquinta, Kassian Kobert, and Esko Ukkonen. Longest common substrings with k mismatches. Information Processing Letters, 115(6-8):643-647, 2015.
  26. Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Longest substring palindrome after edit. In Gonzalo Navarro, David Sankoff, and Binhai Zhu, editors, Annual Symposium on Combinatorial Pattern Matching, CPM 2018, July 2-4, 2018 - Qingdao, China, volume 105 of LIPIcs, pages 12:1-12:14. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.CPM.2018.12.
  27. Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster Queries for Longest Substring Palindrome After Block Edit. In Nadia Pisanti and Solon P. Pissis, editors, 30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019, June 18-20, 2019, Pisa, Italy, volume 128 of LIPIcs, pages 27:1-27:13. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2019. URL: https://doi.org/10.4230/LIPIcs.CPM.2019.27.
  28. Travis Gagie, Paweł Gawrychowski, and Yakov Nekrich. Heaviest Induced Ancestors and Longest Common Substrings. In Proceedings of the 25th Canadian Conference on Computational Geometry, CCCG 2013, Waterloo, Ontario, Canada, August 8-10, 2013. Carleton University, Ottawa, Canada, 2013. URL: http://cccg.ca/proceedings/2013/papers/paper_29.pdf.
  29. Paweł Gawrychowski, Adam Karczmarz, Tomasz Kociumaka, Jakub Łącki, and Piotr Sankowski. Optimal Dynamic Strings. CoRR, abs/1511.02612, 2015. URL: http://arxiv.org/abs/1511.02612.
  30. Paweł Gawrychowski, Adam Karczmarz, Tomasz Kociumaka, Jakub Łącki, and Piotr Sankowski. Optimal Dynamic Strings. In Artur Czumaj, editor, Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pages 1509-1528. SIAM, 2018. URL: https://doi.org/10.1137/1.9781611975031.99.
  31. Ming Gu, Martin Farach, and Richard Beigel. An Efficient Algorithm for Dynamic Text Indexing. In Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '94, pages 697-704, Philadelphia, PA, USA, 1994. Society for Industrial and Applied Mathematics. URL: http://dl.acm.org/citation.cfm?id=314464.314675.
  32. Russell Impagliazzo and Ramamohan Paturi. On the Complexity of k-SAT. Journal of Computer and System Sciences, 62(2):367-375, 2001. URL: https://doi.org/10.1006/jcss.2000.1727.
  33. Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which Problems Have Strongly Exponential Complexity? Journal of Computer and System Sciences, 63(4):512-530, December 2001. URL: https://doi.org/10.1006/jcss.2001.1774.
  34. Tomasz Kociumaka. Efficient Data Structures for Internal Queries in Texts. PhD thesis, University of Warsaw, October 2018. URL: https://www.mimuw.edu.pl/~kociumaka/files/phd.pdf.
  35. Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Internal Pattern Matching Queries in a Text and Applications. In Piotr Indyk, editor, Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 532-551. SIAM, 2015. URL: https://doi.org/10.1137/1.9781611973730.36.
  36. Tomasz Kociumaka, Jakub Radoszewski, and Tatiana A. Starikovskaya. Longest Common Substring with Approximately k Mismatches. Algorithmica, 81(6):2633-2652, 2019. URL: https://doi.org/10.1007/s00453-019-00548-x.
  37. Tomasz Kociumaka, Tatiana A. Starikovskaya, and Hjalte Wedel Vildhøj. Sublinear Space Algorithms for the Longest Common Substring Problem. In Andreas S. Schulz and Dorothea Wagner, editors, Algorithms - ESA 2014 - 22th Annual European Symposium, Wroclaw, Poland, September 8-10, 2014. Proceedings, volume 8737 of Lecture Notes in Computer Science, pages 605-617. Springer, 2014. URL: https://doi.org/10.1007/978-3-662-44777-2_50.
  38. Roger C. Lyndon. On Burnside’s problem. Transactions of the American Mathematical Society, 77:202-215, 1954.
  39. Mamoru Maekawa. A √n Algorithm for Mutual Exclusion in Decentralized Systems. ACM Transactions on Computer Systems, 3(2):145-159, 1985. URL: https://doi.org/10.1145/214438.214445.
  40. Kurt Mehlhorn, R. Sundar, and Christian Uhrig. Maintaining Dynamic Sequences under Equality Tests in Polylogarithmic Time. Algorithmica, 17(2):183-198, 1997. URL: https://doi.org/10.1007/BF02522825.
  41. Süleyman Cenk Sahinalp and Uzi Vishkin. Efficient Approximate and Dynamic Matching of Patterns Using a Labeling Paradigm (extended abstract). In 37th Annual Symposium on Foundations of Computer Science, FOCS '96, Burlington, Vermont, USA, 14-16 October, 1996, pages 320-328. IEEE Computer Society, 1996. URL: https://doi.org/10.1109/SFCS.1996.548491.
  42. Daniel D. Sleator and Robert Endre Tarjan. A Data Structure for Dynamic Trees. Journal of Computer and System Sciences, 26(3):362-391, June 1983. URL: https://doi.org/10.1016/0022-0000(83)90006-5.
  43. Tatiana A. Starikovskaya. Longest Common Substring with Approximately k Mismatches. In Roberto Grossi and Moshe Lewenstein, editors, 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016, June 27-29, 2016, Tel Aviv, Israel, volume 54 of LIPIcs, pages 21:1-21:11. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016. URL: https://doi.org/10.4230/LIPIcs.CPM.2016.21.
  44. Tatiana A. Starikovskaya and Hjalte Wedel Vildhøj. Time-Space Trade-Offs for the Longest Common Substring Problem. In Johannes Fischer and Peter Sanders, editors, Combinatorial Pattern Matching, 24th Annual Symposium, CPM 2013, Bad Herrenalb, Germany, June 17-19, 2013. Proceedings, volume 7922 of Lecture Notes in Computer Science, pages 223-234. Springer, 2013. URL: https://doi.org/10.1007/978-3-642-38905-4_22.
  45. Rajamani Sundar and Robert Endre Tarjan. Unique Binary-Search-Tree Representations and Equality Testing of Sets and Sequences. SIAM Journal on Computing, 23(1):24-44, 1994. URL: https://doi.org/10.1137/S0097539790189733.
  46. Sharma V. Thankachan, Chaitanya Aluru, Sriram P. Chockalingam, and Srinivas Aluru. Algorithmic Framework for Approximate Matching Under Bounded Edits with Applications to Sequence Analysis. In Benjamin J. Raphael, editor, Research in Computational Molecular Biology - 22nd Annual International Conference, RECOMB 2018, Paris, France, April 21-24, 2018, Proceedings, volume 10812 of Lecture Notes in Computer Science, pages 211-224. Springer, 2018. URL: https://doi.org/10.1007/978-3-319-89929-9_14.
  47. Sharma V. Thankachan, Alberto Apostolico, and Srinivas Aluru. A Provably Efficient Algorithm for the k-Mismatch Average Common Substring Problem. Journal of Computational Biology, 23(6):472-482, 2016. URL: https://doi.org/10.1089/cmb.2015.0235.
  48. Yuki Urabe, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Longest Lyndon Substring After Edit. In Gonzalo Navarro, David Sankoff, and Binhai Zhu, editors, Annual Symposium on Combinatorial Pattern Matching, CPM 2018, July 2-4, 2018 - Qingdao, China, volume 105 of LIPIcs, pages 19:1-19:10. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018. URL: https://doi.org/10.4230/LIPIcs.CPM.2018.19.
  49. Peter Weiner. Linear Pattern Matching Algorithms. In 14th Annual Symposium on Switching and Automata Theory, Iowa City, Iowa, USA, October 15-17, 1973, pages 1-11. IEEE Computer Society, 1973. URL: https://doi.org/10.1109/SWAT.1973.13.