Sketching, Streaming, and Fine-Grained Complexity of (Weighted) LCS

Authors Karl Bringmann, Bhaskar Ray Chaudhury



PDF
Thumbnail PDF

File

LIPIcs.FSTTCS.2018.40.pdf
  • Filesize: 0.5 MB
  • 16 pages

Document Identifiers

Author Details

Karl Bringmann
  • Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
Bhaskar Ray Chaudhury
  • Max Planck Institute for Informatics, Saarland Informatics Campus, Graduate School of Computer Science, Saarbrücken, Germany

Cite AsGet BibTex

Karl Bringmann and Bhaskar Ray Chaudhury. Sketching, Streaming, and Fine-Grained Complexity of (Weighted) LCS. In 38th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 122, pp. 40:1-40:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.FSTTCS.2018.40

Abstract

We study sketching and streaming algorithms for the Longest Common Subsequence problem (LCS) on strings of small alphabet size |Sigma|. For the problem of deciding whether the LCS of strings x,y has length at least L, we obtain a sketch size and streaming space usage of O(L^{|Sigma| - 1} log L). We also prove matching unconditional lower bounds. As an application, we study a variant of LCS where each alphabet symbol is equipped with a weight that is given as input, and the task is to compute a common subsequence of maximum total weight. Using our sketching algorithm, we obtain an O(min{nm, n + m^{|Sigma|}})-time algorithm for this problem, on strings x,y of length n,m, with n >= m. We prove optimality of this running time up to lower order factors, assuming the Strong Exponential Time Hypothesis.

Subject Classification

ACM Subject Classification
  • Theory of computation → Communication complexity
  • Theory of computation → Design and analysis of algorithms
Keywords
  • algorithms
  • SETH
  • communication complexity
  • run-length encoding

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Quadratic-Time Hardness of LCS and other Sequence Similarity Measures. In Proc. 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS'15), pages 59-78, 2015. Google Scholar
  2. Amir Abboud, Ryan Williams, and Huacheng Yu. More Applications of the Polynomial Method to Algorithm Design. In Proc. 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'15), pages 218-230, 2015. Google Scholar
  3. Jochen Alber, Jens Gramm, Jiong Guo, and Rolf Niedermeier. Towards optimally solving the longest common subsequence problem for sequences with nested arc annotations in linear time. In Proc. 13th Annual Symposium on Combinatorial Pattern Matching (CPM'02), pages 99-114, 2002. Google Scholar
  4. Helmut Alt and Michael Godau. Computing the Fréchet distance between two polygonal curves. Internat. J. Comput. Geom. Appl., 5(1-2):78-99, 1995. Google Scholar
  5. Stephen F. Altschul, Warren Gish, Webb Miller, Eugene W. Myers, and David J. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403-410, 1990. Google Scholar
  6. Amihood Amir, Zvi Gotthilf, and B. Riva Shalom. Weighted LCS. Journal of Discrete Algorithms, 8(3):273-281, 2010. Google Scholar
  7. Hsing-Yen Ann, Chang-Biau Yang, Chiou-Ting Tseng, and Chiou-Yi Hor. A fast and simple algorithm for computing the longest common subsequence of run-length encoded strings. Information Processing Letters, 108(6):360-364, 2008. Google Scholar
  8. Alberto Apostolico. Improving the Worst-Case Performance of the Hunt-Szymanski Strategy for the Longest Common Subsequence of Two Strings. Inf. Process. Lett., 23(2):63-69, 1986. URL: http://dx.doi.org/10.1016/0020-0190(86)90044-X.
  9. Alberto Apostolico and Concettina Guerra. The Longest Common Subsequence Problem Revisited. Algorithmica, 2:316-336, 1987. URL: http://dx.doi.org/10.1007/BF01840365.
  10. Arturs Backurs and Piotr Indyk. Which Regular Expression Patterns are Hard to Match? In Proc. 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS'16), pages 457-466, 2016. Google Scholar
  11. Djamal Belazzougui and Qin Zhang. Edit Distance: Sketching, Streaming, and Document Exchange. In Proc. 57th Annual Symposium on Foundations of Computer Science, pages 51-60, 2016. Google Scholar
  12. Karl Bringmann. Why Walking the Dog Takes Time: Frechet Distance Has No Strongly Subquadratic Algorithms Unless SETH Fails. In Proc. 55th Annual IEEE Symposium on Foundations of Computer Science (FOCS'14), pages 661-670, 2014. URL: http://dx.doi.org/10.1109/FOCS.2014.76.
  13. Karl Bringmann and Marvin Künnemann. Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping. In Proc. 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS'15), pages 79-97, 2015. Google Scholar
  14. Karl Bringmann and Marvin Künnemann. Multivariate Fine-Grained Complexity of Longest Common Subsequence. In Proc. 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'18), 2018. To appear. Google Scholar
  15. Mauro Castelli, Riccardo Dondi, Giancarlo Mauri, and Italo Zoppis. The Longest Filled Common Subsequence Problem. In Proc. 28th Annual Symposium on Combinatorial Pattern Matching (CPM'17), 2017. To appear. Google Scholar
  16. Diptarka Chakraborty, Elazar Goldenberg, and Michal Kouckỳ. Streaming algorithms for computing edit distance without exploiting suffix trees. ArXiv:1607.03718, 2016. Google Scholar
  17. Wun-Tat Chan, Yong Zhang, Stanley P.Y. Fung, Deshi Ye, and Hong Zhu. Efficient algorithms for finding a longest common increasing subsequence. Journal of Combinatorial Optimization, 13(3):277-288, 2007. Google Scholar
  18. Marek Cygan, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Polynomial-time approximation algorithms for weighted LCS problem. Discrete Applied Mathematics, 204:38-48, 2016. Google Scholar
  19. David Eppstein, Zvi Galil, Raffaele Giancarlo, and Giuseppe F. Italiano. Sparse Dynamic Programming I: Linear Cost Functions. J. ACM, 39(3):519-545, July 1992. URL: http://dx.doi.org/10.1145/146637.146650.
  20. Valerio Freschi and Alessandro Bogliolo. Longest common subsequence between run-length-encoded strings: a new algorithm with improved parallelism. Information Processing Letters, 90(4):167-173, 2004. Google Scholar
  21. Zvi Gotthilf, Danny Hermelin, Gad M. Landau, and Moshe Lewenstein. Restricted LCS. In Proc. 17th International Conference on String Processing and Information Retrieval (SPIRE'10), pages 250-257, 2010. Google Scholar
  22. Daniel S. Hirschberg. Algorithms for the Longest Common Subsequence Problem. J. ACM, 24(4):664-675, 1977. URL: http://dx.doi.org/10.1145/322033.322044.
  23. J. W. Hunt and M. D. McIlroy. An algorithm for differential file comparison. Computing Science Technical Report 41, Bell Laboratories, 1975. Google Scholar
  24. James W. Hunt and Thomas G. Szymanski. A Fast Algorithm for Computing Longest Subsequences. Commun. ACM, 20(5):350-353, 1977. URL: http://dx.doi.org/10.1145/359581.359603.
  25. Costas S Iliopoulos, Marcin Kubica, M Sohel Rahman, and Tomasz Waleń. Algorithms for computing the longest parameterized common subsequence. In Proc. 18th Annual Conference on Combinatorial Pattern Matching (CPM'07), pages 265-273, 2007. Google Scholar
  26. Costas S. Iliopoulos and M. Sohel Rahman. A New Efficient Algorithm for Computing the Longest Common Subsequence. Theory of Computing Systems, 45(2):355-371, 2009. URL: http://dx.doi.org/10.1007/s00224-008-9101-6.
  27. Russell Impagliazzo, Ramamohan Paturi, and Francis Zane. Which Problems Have Strongly Exponential Complexity? J. Computer and System Sciences, 63(4):512-530, 2001. Google Scholar
  28. Tao Jiang, Guohui Lin, Bin Ma, and Kaizhong Zhang. The longest common subsequence problem for arc-annotated sequences. Journal of Discrete Algorithms, 2(2):257-270, 2004. Google Scholar
  29. Hossein Jowhari. Efficient communication protocols for deciding edit distance. In Proc. European Symposium on Algorithms, pages 648-658, 2012. Google Scholar
  30. Orgad Keller, Tsvi Kopelowitz, and Moshe Lewenstein. On the longest common parameterized subsequence. Theoretical Computer Science, 410(51):5347-5353, 2009. Google Scholar
  31. Ilan Kremer, Noam Nisan, and Dana Ron. On Randomized One-Round Communication Complexity. Computational Complexity, 8(1):21-49, 1999. Google Scholar
  32. Keita Kuboi, Yuta Fujishige, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster STR-IC-LCS computation via RLE. In Proc. 28th Annual Symposium on Combinatorial Pattern Matching (CPM'17), 2017. To appear, arXiv:1703.04954. Google Scholar
  33. Martin Kutz, Gerth Stølting Brodal, Kanela Kaligosi, and Irit Katriel. Faster algorithms for computing longest common increasing subsequences. Journal of Discrete Algorithms, 9(4):314-325, 2011. Google Scholar
  34. Gad M. Landau, Baruch Schieber, and Michal Ziv-Ukelson. Sparse LCS common substring alignment. Information Processing Letters, 88(6):259-270, 2003. Google Scholar
  35. David Liben-Nowell, Erik Vee, and An Zhu. Finding longest increasing and common subsequences in streaming data. Journal of Combinatorial Optimization, 11(2):155-175, 2006. Google Scholar
  36. Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. Text Summarization Branches Out, 2004. Google Scholar
  37. Jia Jie Liu, Yue-Li Wang, and Richard CT Lee. Finding a longest common subsequence between a run-length-encoded string and an uncompressed string. Journal of Complexity, 24(2):173-184, 2008. Google Scholar
  38. Webb Miller and Eugene W. Myers. A File Comparison Program. Softw., Pract. Exper., 15(11):1025-1040, 1985. URL: http://dx.doi.org/10.1002/spe.4380151102.
  39. Johra Muhammad Moosa, M. Sohel Rahman, and Fatema Tuz Zohora. Computing a longest common subsequence that is almost increasing on sequences having no repeated elements. Journal of Discrete Algorithms, 20:12-20, 2013. Google Scholar
  40. Howard L. Morgan. Spelling correction in systems programs. Communications of the ACM, 13(2):90-94, 1970. Google Scholar
  41. Shay Mozes, Dekel Tsur, Oren Weimann, and Michal Ziv-Ukelson. Fast algorithms for computing tree LCS. Theoretical Computer Science, 410(43):4303-4314, 2009. Google Scholar
  42. Eugene W. Myers. An O(ND) Difference Algorithm and Its Variations. Algorithmica, 1(2):251-266, 1986. URL: http://dx.doi.org/10.1007/BF01840446.
  43. Gene Myers. A four russians algorithm for regular expression pattern matching. Journal of the ACM (JACM), 39(2):432-448, 1992. Google Scholar
  44. Narao Nakatsu, Yahiko Kambayashi, and Shuzo Yajima. A Longest Common Subsequence Algorithm Suitable for Similar Text Strings. Acta Inf., 18:171-179, 1982. URL: http://dx.doi.org/10.1007/BF00264437.
  45. Pavel Pevzner and Michael Waterman. Matrix longest common subsequence problem, duality and Hilbert bases. In Proc. 3th Annual Symposium on Combinatorial Pattern Matching (CPM'92), pages 79-89, 1992. Google Scholar
  46. Michael Saks and C. Seshadhri. Space efficient streaming algorithms for the distance to monotonicity and asymmetric edit distance. In Proc. 24th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1698-1709, 2013. Google Scholar
  47. Xiaoming Sun and David P. Woodruff. The communication and streaming complexity of computing the longest common and increasing subsequences. In Proc. 18th Annual ACM-SIAM Aymposium on Discrete Algorithms, pages 336-345, 2007. Google Scholar
  48. Alexander Tiskin. Longest common subsequences in permutations and maximum cliques in circle graphs. In Proc. 17th Annual Symposium on Combinatorial Pattern Matching (CPM'06), pages 270-281, 2006. Google Scholar
  49. Robert A. Wagner and Michael J. Fischer. The String-to-String Correction Problem. J. ACM, 21(1):168-173, 1974. URL: http://dx.doi.org/10.1145/321796.321811.
  50. Ryan Williams. A new algorithm for optimal 2-constraint satisfaction and its implications. Theoretical Computer Science, 348(2):357-365, 2005. Google Scholar
  51. Sun Wu, Udi Manber, Gene Myers, and Webb Miller. An O(NP) Sequence Comparison Algorithm. Inf. Process. Lett., 35(6):317-323, 1990. URL: http://dx.doi.org/10.1016/0020-0190(90)90035-V.
  52. I-Hsuan Yang, Chien-Pin Huang, and Kun-Mao Chao. A fast algorithm for computing a longest common increasing subsequence. Information Processing Letters, 93(5):249-253, 2005. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail