A Compact DAG for Storing and Searching Maximal Common Subsequences

Conte, Alessio; Grossi, Roberto; Punzi, Giulia; Uno, Takeaki

doi:10.4230/LIPIcs.ISAAC.2023.21

File

LIPIcs.ISAAC.2023.21.pdf

Filesize: 0.83 MB
15 pages

Document Identifiers

DOI: 10.4230/LIPIcs.ISAAC.2023.21
URN: urn:nbn:de:0030-drops-193231

Author Details

Alessio Conte

Università di Pisa, Italy

Roberto Grossi

Università di Pisa, Italy

Giulia Punzi

National Institute of Informatics, Tokyo, Japan

Takeaki Uno

National Institute of Informatics, Tokyo, Japan

Acknowledgements

We thank the anonymous Referees for their comments, leading us to the current version of Theorem 13.

Cite As

Alessio Conte, Roberto Grossi, Giulia Punzi, and Takeaki Uno. A Compact DAG for Storing and Searching Maximal Common Subsequences. In 34th International Symposium on Algorithms and Computation (ISAAC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 283, pp. 21:1-21:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.ISAAC.2023.21

Abstract

Maximal Common Subsequences (MCSs) between two strings X and Y are subsequences of both X and Y that are maximal under inclusion. MCSs relax and generalize the well-known and widely used concept of Longest Common Subsequences (LCSs), which can be seen as MCSs of maximum length. Although the number of both LCSs and MCSs can be exponential in the length of the strings, LCSs have long been exploited for string and text analysis, because simple compact representations of all LCSs between two strings, built via dynamic programming or automata, have been known since the '70s. MCSs appear to have a more challenging structure: even listing them efficiently was an open problem until recently. Its solution narrowed the complexity difference between the two problems, but the gap remained significant. In this paper we close the complexity gap: we show how to build, in polynomial time, a DAG of polynomial size that supports efficient operations on the set of all MCSs, such as enumeration in Constant Amortized Time per solution (CAT), counting, and random access to the i-th element (i.e., rank and select operations). Besides improving known algorithmic results, this work paves the way for new sequence analysis methods based on MCSs.
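To make the definitions concrete, here is a minimal Python sketch. It is not the authors' construction: it finds the MCSs of two tiny strings by brute force, then stores them in a plain trie, the simplest DAG whose root-to-sink paths spell the stored strings. Counting, select (the i-th string) and rank follow from per-node path counts. The names `brute_force_mcs` and `PathDAG` are hypothetical, the MCSs are assumed to be ordered lexicographically, and the enumeration is exponential while the trie shares no subtrees. The sketch therefore only illustrates the kind of interface a compact polynomial-size DAG can offer, not how such a DAG is built.

```python
from itertools import combinations


def is_subsequence(z: str, s: str) -> bool:
    """True if z can be obtained from s by deleting characters."""
    it = iter(s)
    return all(ch in it for ch in z)  # each `in` advances the shared iterator past the match


def brute_force_mcs(x: str, y: str) -> list[str]:
    """All maximal common subsequences of x and y, in lexicographic order.

    Exponential in len(x): only meant to illustrate the definition on tiny inputs.
    """
    subseqs = {"".join(x[i] for i in idx)
               for k in range(len(x) + 1)
               for idx in combinations(range(len(x)), k)}
    common = {z for z in subseqs if is_subsequence(z, y)}
    # z is maximal if no strictly longer common subsequence contains it.
    return sorted(z for z in common
                  if not any(len(w) > len(z) and is_subsequence(z, w)
                             for w in common))


class PathDAG:
    """Edge-labeled DAG in which every root-to-sink path spells one stored string.

    Built here as a plain trie (no subtree sharing); rank/select only rely on
    per-node path counts, so they would work unchanged on a compacted DAG.
    """
    SINK, ROOT = 0, 1

    def __init__(self, words):
        self.out = {self.SINK: [], self.ROOT: []}  # node -> [(label, child)]
        for w in words:
            node = self.ROOT
            for ch in w:
                nxt = next((c for lab, c in self.out[node] if lab == ch), None)
                if nxt is None:
                    nxt = len(self.out)
                    self.out[nxt] = []
                    self.out[node].append((ch, nxt))
                node = nxt
            self.out[node].append(("", self.SINK))  # "" labels end-of-string
        for edges in self.out.values():
            edges.sort()  # "" sorts first, so a prefix precedes its extensions
        self.count = {}

    def paths(self, node):
        """Number of paths from node to the sink (memoized)."""
        if node not in self.count:
            self.count[node] = 1 if node == self.SINK else sum(
                self.paths(child) for _, child in self.out[node])
        return self.count[node]

    def select(self, i):
        """The i-th stored string (0-based) in lexicographic order."""
        if not 0 <= i < self.paths(self.ROOT):
            raise IndexError(i)
        node, chars = self.ROOT, []
        while node != self.SINK:
            for label, child in self.out[node]:
                if i < self.paths(child):
                    chars.append(label)
                    node = child
                    break
                i -= self.paths(child)  # skip all strings below this child
        return "".join(chars)

    def rank(self, z):
        """Lexicographic position (0-based) of the stored string z."""
        node, r = self.ROOT, 0
        for ch in list(z) + [""]:
            for label, child in self.out[node]:
                if label == ch:
                    node = child
                    break
                r += self.paths(child)  # strings branching off with a smaller label
            else:
                raise KeyError(z)
        return r


if __name__ == "__main__":
    mcs = brute_force_mcs("aab", "baa")
    print(mcs)                           # ['aa', 'b']
    dag = PathDAG(mcs)
    print(dag.paths(PathDAG.ROOT))       # 2
    print(dag.select(1), dag.rank("b"))  # b 1
```

On X = "aab" and Y = "baa" the MCSs are "aa" and "b". Here "b" is maximal even though it is shorter than the LCS "aa", which is how MCSs generalize LCSs. Since no MCS can be a proper prefix of another (a prefix is also a subsequence), every stored string ends at its own leaf of the trie.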

Subject Classification

ACM Subject Classification

Mathematics of computing → Combinatorial algorithms
Information systems → Structured text search

Keywords

Maximal common subsequence
DAG
Compact data structures
Enumeration
Constant amortized time
Random access
