Longest Common Subsequence on Weighted Sequences

Authors Evangelos Kipouridis , Kostas Tsichlas



PDF
Thumbnail PDF

File

LIPIcs.CPM.2020.19.pdf
  • Filesize: 475 kB
  • 15 pages

Document Identifiers

Author Details

Evangelos Kipouridis
  • Basic Algorithms Research Copenhagen (BARC), University of Copenhagen, Denmark
Kostas Tsichlas
  • Computer Engineering and Informatics Department, University of Patras, Greece

Acknowledgements

We would like to thank the anonymous reviewers for their careful reading of our paper and their many insightful comments and suggestions.

Cite As Get BibTex

Evangelos Kipouridis and Kostas Tsichlas. Longest Common Subsequence on Weighted Sequences. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 19:1-19:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020) https://doi.org/10.4230/LIPIcs.CPM.2020.19

Abstract

We consider the general problem of the Longest Common Subsequence (LCS) on weighted sequences. Weighted sequences are an extension of classical strings, where in each position every letter of the alphabet may occur with some probability. Previous results presented a PTAS and noticed that no FPTAS is possible unless P=NP. In this paper we essentially close the gap between upper and lower bounds by improving both. First of all, we provide an EPTAS for bounded alphabets (which is the most natural case), and prove that there does not exist any EPTAS for unbounded alphabets unless FPT=W[1]. Furthermore, under the Exponential Time Hypothesis, we provide a lower bound which shows that no significantly better PTAS can exist for unbounded alphabets. As a side note, we prove that it is sufficient to work with only one threshold in the general variant of the problem.

Subject Classification

ACM Subject Classification
  • Theory of computation → Approximation algorithms analysis
  • Theory of computation → W hierarchy
  • Theory of computation → Problems, reductions and completeness
Keywords
  • WLCS
  • LCS
  • weighted sequences
  • approximation algorithms
  • lower bound

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 59-78, 2015. URL: https://doi.org/10.1109/FOCS.2015.14.
  2. Amihood Amir, Eran Chencinski, Costas S. Iliopoulos, Tsvi Kopelowitz, and Hui Zhang. Property matching and weighted matching. Theoretical Computer Science, 395(2-3):298-310, 2008. URL: https://doi.org/10.1016/j.tcs.2008.01.006.
  3. Amihood Amir, Zvi Gotthilf, and B. Riva Shalom. Weighted LCS. Journal of Discrete Algorithms, 8(3):273-281, 2010. URL: https://doi.org/10.1016/j.jda.2010.02.001.
  4. Amihood Amir, Zvi Gotthilf, and B. Riva Shalom. Weighted shortest common supersequence. In String Processing and Information Retrieval, 18th International Symposium, SPIRE 2011, Pisa, Italy, October 17-21, 2011. Proceedings, pages 44-54, 2011. URL: https://doi.org/10.1007/978-3-642-24583-1_6.
  5. Amihood Amir, Tzvika Hartman, Oren Kapah, B. Riva Shalom, and Dekel Tsur. Generalized LCS. Theoretical Computer Science, 409(3):438-449, 2008. URL: https://doi.org/10.1016/j.tcs.2008.08.037.
  6. Amihood Amir, Costas S. Iliopoulos, Oren Kapah, and Ely Porat. Approximate matching in weighted sequences. In Combinatorial Pattern Matching, 17th Annual Symposium, CPM 2006, Barcelona, Spain, July 5-7, 2006, Proceedings, pages 365-376, 2006. URL: https://doi.org/10.1007/11780441_33.
  7. Pavlos Antoniou, Costas S. Iliopoulos, Laurent Mouchard, and Solon P. Pissis. Algorithms for mapping short degenerate and weighted sequences to a reference genome. International Journal of Computational Biology and Drug Design, 2(4):385-397, 2009. URL: https://doi.org/10.1504/IJCBDD.2009.030768.
  8. Alberto Apostolico, Gad M. Landau, and Steven Skiena. Matching for run-length encoded strings. Journal of Complexity, 15(1):4-16, 1999. URL: https://doi.org/10.1006/jcom.1998.0493.
  9. Carl Barton, Costas S. Iliopoulos, and Solon P. Pissis. Optimal computation of all tandem repeats in a weighted sequence. Algorithms for Molecular Biology, 9:21, 2014. URL: https://doi.org/10.1186/s13015-014-0021-5.
  10. Carl Barton, Tomasz Kociumaka, Chang Liu, Solon P. Pissis, and Jakub Radoszewski. Indexing weighted sequences: Neat and efficient. Information and Computation, 270, 2020. URL: https://doi.org/10.1016/j.ic.2019.104462.
  11. Carl Barton and Solon P. Pissis. Crochemore’s partitioning on weighted strings and applications. Algorithmica, 80(2):496-514, 2018. URL: https://doi.org/10.1007/s00453-016-0266-0.
  12. Jon Louis Bentley, Dorothea Haken, and James B. Saxe. A general method for solving divide-and-conquer recurrences. SIGACT News, 12(3):36-44, September 1980. URL: https://doi.org/10.1145/1008861.1008865.
  13. Lasse Bergroth, Harri Hakonen, and Timo Raita. A survey of longest common subsequence algorithms. In Seventh International Symposium on String Processing and Information Retrieval, SPIRE 2000, A Coru~na, Spain, September 27-29, 2000, pages 39-48, 2000. URL: https://doi.org/10.1109/SPIRE.2000.878178.
  14. Karl Bringmann and Marvin Künnemann. Multivariate fine-grained complexity of longest common subsequence. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pages 1216-1235, 2018. URL: https://doi.org/10.1137/1.9781611975031.79.
  15. Marco Cesati. Perfect code is W[1]-complete. Information Processing Letters, 81(3):163-168, 2002. URL: https://doi.org/10.1016/S0020-0190(01)00207-1.
  16. Panagiotis Charalampopoulos, Costas S. Iliopoulos, Solon P. Pissis, and Jakub Radoszewski. On-line weighted pattern matching. Information and Computation, 266:49-59, 2019. URL: https://doi.org/10.1016/j.ic.2019.01.001.
  17. Panagiotis Charalampopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszynski, Tomasz Walen, and Wiktor Zuba. Weighted shortest common supersequence problem revisited. In Nieves R. Brisaboa and Simon J. Puglisi, editors, String Processing and Information Retrieval - 26th International Symposium, SPIRE 2019, Segovia, Spain, October 7-9, 2019, Proceedings, volume 11811 of Lecture Notes in Computer Science, pages 221-238. Springer, 2019. URL: https://doi.org/10.1007/978-3-030-32686-9_16.
  18. Manolis Christodoulakis, Costas S. Iliopoulos, Laurent Mouchard, Katerina Perdikuri, Athanasios K. Tsakalidis, and Kostas Tsichlas. Computation of repetitions and regularities of biologically weighted sequences. Journal of Computational Biology, 13(6):1214-1231, 2006. URL: https://doi.org/10.1089/cmb.2006.13.1214.
  19. Marek Cygan, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. Polynomial-time approximation algorithms for weighted LCS problem. Discrete Applied Mathematics, 204:38-48, 2016. URL: https://doi.org/10.1016/j.dam.2015.11.011.
  20. Michael L. Fredman and Dan E. Willard. BLASTING through the information theoretic barrier with FUSION TREES. In Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, May 13-17, 1990, Baltimore, Maryland, USA, pages 1-7, 1990. URL: https://doi.org/10.1145/100216.100217.
  21. Martin Fürer. Faster integer multiplication. SIAM Journal on Computing, 39(3):979-1005, 2009. URL: https://doi.org/10.1137/070711761.
  22. M. R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979. Google Scholar
  23. Dan Gusfield. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press, 1997. URL: https://doi.org/10.1017/cbo9780511574931.
  24. David Harvey and Joris Van Der Hoeven. Integer multiplication in time O(n log n). working paper or preprint, March 2019. URL: https://hal.archives-ouvertes.fr/hal-02070778.
  25. Daniel S. Hirschberg. A linear space algorithm for computing maximal common subsequences. Communications of the ACM, 18(6):341-343, 1975. URL: https://doi.org/10.1145/360825.360861.
  26. Costas S. Iliopoulos, Christos Makris, Yannis Panagis, Katerina Perdikuri, Evangelos Theodoridis, and Athanasios K. Tsakalidis. Efficient algorithms for handling molecular weighted sequences. In Exploring New Frontiers of Theoretical Informatics, IFIP 18th World Computer Congress, TC1 3rd International Conference on Theoretical Computer Science (TCS2004), 22-27 August 2004, Toulouse, France, pages 265-278, 2004. URL: https://doi.org/10.1007/1-4020-8141-3_22.
  27. Costas S. Iliopoulos, Mirka Miller, and Solon P. Pissis. Parallel algorithms for mapping short degenerate and weighted DNA sequences to a reference genome. International Journal of Foundations of Computer Science, 23(2):249-259, 2012. URL: https://doi.org/10.1142/S0129054112400114.
  28. Costas S. Iliopoulos, Katerina Perdikuri, Evangelos Theodoridis, Athanasios K. Tsakalidis, and Kostas Tsichlas. Motif extraction from weighted sequences. In String Processing and Information Retrieval, 11th International Conference, SPIRE 2004, Padova, Italy, October 5-8, 2004, Proceedings, pages 286-297, 2004. URL: https://doi.org/10.1007/978-3-540-30213-1_41.
  29. Evangelos Kipouridis and Kostas Tsichlas. Longest common subsequence on weighted sequences. CoRR, abs/1901.04068, 2019. URL: http://arxiv.org/abs/1901.04068.
  30. Tomasz Kociumaka, Solon P. Pissis, and Jakub Radoszewski. Pattern matching and consensus problems on weighted sequences and profiles. Theory of Computing Systems, 63(3):506-542, 2019. URL: https://doi.org/10.1007/s00224-018-9881-2.
  31. Dániel Marx. Parameterized complexity and approximation algorithms. The Computer Journal, 51(1):60-78, 2008. URL: https://doi.org/10.1093/comjnl/bxm048.
  32. Gene Myers. The whole genome assembly of drosophila. In Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms, January 9-11, 2000, San Francisco, CA, USA, page 753, 2000. URL: http://dl.acm.org/citation.cfm?id=338219.338635.
  33. Mihai Patrascu and Ryan Williams. On the possibility of faster SAT algorithms. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 17-19, 2010, pages 1065-1075, 2010. URL: https://doi.org/10.1137/1.9781611973075.86.
  34. Jakub Radoszewski and Tatiana Starikovskaya. Streaming k-mismatch with error correcting and applications. Information and Computation, 271:104513, 2020. URL: https://doi.org/10.1016/j.ic.2019.104513.
  35. Arnold Schönhage and Volker Strassen. Schnelle multiplikation großer zahlen. Computing, 7(3-4):281-292, 1971. URL: https://doi.org/10.1007/BF02242355.
  36. Julie Thompson, Desmond G. Higgins, and Toby J. Gibson. W: Clustal. improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucleic acids research, 22:4673-80, December 1994. URL: https://doi.org/10.1093/nar/22.22.4673.
  37. J. Craig Venter. Sequencing the human genome. In RECOMB, pages 309-309. ACM, 2002. Google Scholar
  38. Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. Journal of the ACM, 21(1):168-173, 1974. Google Scholar
  39. Hui Zhang, Qing Guo, Jing Fan, and Costas S. Iliopoulos. Loose and strict repeats in weighted sequences of proteins. Protein and Peptide Letters, 17(9):1136-1142, 2010. Google Scholar
  40. Hui Zhang, Qing Guo, and Costas S. Iliopoulos. String matching with swaps in a weighted sequence. In CIS, volume 3314 of Lecture Notes in Computer Science, pages 698-704. Springer, 2004. Google Scholar
  41. Hui Zhang, Qing Guo, and Costas S. Iliopoulos. An algorithmic framework for motif discovery problems in weighted sequences. In Algorithms and Complexity, 7th International Conference, CIAC 2010, Rome, Italy, May 26-28, 2010. Proceedings, pages 335-346, 2010. URL: https://doi.org/10.1007/978-3-642-13073-1_30.
  42. Hui Zhang, Qing Guo, and Costas S. Iliopoulos. Varieties of regularities in weighted sequences. In AAIM, volume 6124 of Lecture Notes in Computer Science, pages 271-280. Springer, 2010. Google Scholar
  43. Hui Zhang, Qing Guo, and Costas S. Iliopoulos. Locating tandem repeats in weighted sequences in proteins. BMC Bioinformatics, 14(S-8):S2, 2013. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail