Faster STR-IC-LCS Computation via RLE

Authors Keita Kuboi, Yuta Fujishige, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda



PDF
Thumbnail PDF

File

LIPIcs.CPM.2017.20.pdf
  • Filesize: 0.55 MB
  • 12 pages

Document Identifiers

Author Details

Keita Kuboi
Yuta Fujishige
Shunsuke Inenaga
Hideo Bannai
Masayuki Takeda

Cite As Get BibTex

Keita Kuboi, Yuta Fujishige, Shunsuke Inenaga, Hideo Bannai, and Masayuki Takeda. Faster STR-IC-LCS Computation via RLE. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 20:1-20:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.CPM.2017.20

Abstract

The constrained LCS problem asks one to find a longest common subsequence of two input strings A and B with some constraints. The STR-IC-LCS problem is a variant of the constrained LCS problem, where the solution must include a given constraint string C as a substring. Given two strings A and B of respective lengths M and N, and a constraint string C of length at most min{M, N}, the best known algorithm for the STR-IC-LCS problem, proposed by Deorowicz (Inf. Process. Lett., 11:423-426, 2012), runs in O(MN) time. In this work, we present an O(mN + nM)-time solution to the STR-IC-LCS problem, where m and n denote the sizes of the run-length encodings of A and B, respectively. Since m <= M and n <= N always hold, our algorithm is always as fast as Deorowicz's algorithm, and is faster when input strings are compressible via RLE.

Subject Classification

Keywords
  • longest common subsequence
  • STR-IC-LCS
  • run-length encoding

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Shegufta Bakht Ahsan, Syeda Persia Aziz, and M. Sohel Rahman. Longest common subsequence problem for run-length-encoded strings. J. Comput., 9(8):1769-1775, 2014. URL: http://dx.doi.org/10.4304/jcp.9.8.1769-1775.
  2. Hsing-Yen Ann, Chang-Biau Yang, Chiou-Ting Tseng, and Chiou-Yi Hor. Fast algorithms for computing the constrained LCS of run-length encoded strings. Theor. Comput. Sci., 432:1-9, 2012. URL: http://dx.doi.org/10.1016/j.tcs.2012.01.038.
  3. Anthony Bagnall, Chotirat "Ann" Ratanamahatana, Eamonn Keogh, Stefano Lonardi, and Gareth Janacek. A bit level representation for time series data mining with shape based similarity. Data Min. Knowl. Discov., 13(1):11-40, 2006. URL: http://dx.doi.org/10.1007/s10618-005-0028-0.
  4. Horst Bunke and János Csirik. An improved algorithm for computing the edit distance of run-length coded strings. Inf. Process. Lett., 54(2):93-96, 1995. URL: http://dx.doi.org/10.1016/0020-0190(95)00005-W.
  5. Yi-Ching Chen and Kun-Mao Chao. On the generalized constrained longest common subsequence problems. J. Comb. Optim., 21(3):383-392, 2011. URL: http://dx.doi.org/10.1007/s10878-009-9262-5.
  6. Francis Y. L. Chin, Alfredo De Santis, Anna Lisa Ferrara, N. L. Ho, and S. K. Kim. A simple algorithm for the constrained sequence problems. Inf. Process. Lett., 90(4):175-179, 2004. URL: http://dx.doi.org/10.1016/j.ipl.2004.02.008.
  7. Sebastian Deorowicz. Quadratic-time algorithm for a string constrained LCS problem. Inf. Process. Lett., 112(11):423-426, 2012. URL: http://dx.doi.org/10.1016/j.ipl.2012.02.007.
  8. Paul Heckel. A technique for isolating differences between files. Commun. ACM, 21(4):264-268, 1978. URL: http://dx.doi.org/10.1145/359460.359467.
  9. James W. Hunt and Malcolm Douglas McIlroy. An algorithm for differential file comparison. Technical Report 41, Bell Laboratories, 1976. URL: http://www.cs.dartmouth.edu/~doug/diff.pdf.
  10. Dmitry Korkin and Lev Goldfarb. Multiple genome rearrangement: a general approach via the evolutionary genome graph. Bioinformatics, 18(suppl_1):S303-S311, 2002. URL: http://dx.doi.org/10.1093/bioinformatics/18.suppl_1.s303.
  11. Jessica Lin, Eamonn Keogh, Li Wei, and Stefano Lonardi. Experiencing SAX: a novel symbolic representation of time series. Data Min. Knowl. Discov., 15(2):107-144, 2007. URL: http://dx.doi.org/10.1007/s10618-007-0064-z.
  12. Jia-Jie Liu, Yue-Li Wang, and Yu-Shan Chiu. Constrained longest common subsequences with run-length-encoded strings. Comput. J., 58(5):1074-1084, 2015. URL: http://dx.doi.org/10.1093/comjnl/bxu012.
  13. Helman Stern, Merav Shmueli, and Sigal Berman. Most discriminating segment - longest common subsequence (MDSLCS) algorithm for dynamic hand gesture classification. Pattern Recognit. Lett., 34(15):1980-1989, 2013. URL: http://dx.doi.org/10.1016/j.patrec.2013.02.007.
  14. Yin-Te Tsai. The constrained longest common subsequence problem. Inf. Process. Lett., 88(4):173-176, 2003. URL: http://dx.doi.org/10.1016/j.ipl.2003.07.001.
  15. Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. J. ACM, 21(1):168-173, 1974. URL: http://dx.doi.org/10.1145/321796.321811.
  16. Congmao Wang and Dabing Zhang. A novel compression tool for efficient storage of genome resequencing data. Nucleic Acids Res., 39(7):e45, 2011. URL: http://dx.doi.org/10.1093/nar/gkr009.
  17. Lei Wang, Xiaodong Wang, Yingjie Wu, and Daxin Zhu. A dynamic programming solution to a generalized LCS problem. Inf. Process. Lett., 113(19-21):723-728, 2013. URL: http://dx.doi.org/10.1016/j.ipl.2013.07.005.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail