Even Faster Elastic-Degenerate String Matching via Fast Matrix Multiplication
An elastic-degenerate (ED) string is a sequence of n sets of strings of total length N, which was recently proposed to model a set of similar sequences. The ED string matching (EDSM) problem is to find all occurrences of a pattern of length m in an ED text. The EDSM problem has recently received some attention in the combinatorial pattern matching community, and an O(nm^{1.5}\sqrt{\log m} + N)-time algorithm is known [Aoyama et al., CPM 2018]. The standard assumption in prior work on this problem is that N is substantially larger than both n and m, and thus we would like to have a linear dependency on N. Under this assumption, the natural open problem is whether we can decrease the 1.5 exponent in the time complexity, as was done for the related (but, to the best of our knowledge, not equivalent) word break problem [Backurs and Indyk, FOCS 2016].
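To make the problem statement concrete, the following is a minimal sketch of a naive EDSM procedure. It is not the algorithm of this paper or of the cited prior work, and it makes no attempt to meet the stated time bounds. It assumes the ED text is given as a list of sets of strings, that the pattern is non-empty, and that the output is the list of 0-based set indices at which at least one occurrence of the pattern ends; the function name `edsm_naive` is hypothetical.

```python
def edsm_naive(text, pattern):
    """Return the 0-based indices of the sets where an occurrence of pattern ends.

    text    : list of sets of strings (the ED text); "" is allowed in a set
    pattern : non-empty string
    """
    m = len(pattern)
    # active holds every k with 0 < k < m such that pattern[:k] matches
    # some choice of strings ending exactly at the previous set boundary.
    active = set()
    ends = []
    for i, segment in enumerate(text):
        new_active = set()
        found = False
        for s in segment:
            # Case 1: the whole pattern lies inside one string of this set.
            if pattern in s:
                found = True
            # Case 2: extend a partial match carried over from earlier sets.
            for k in active:
                rest = m - k
                if len(s) >= rest:
                    if s.startswith(pattern[k:]):
                        found = True
                elif pattern.startswith(s, k):
                    # s is consumed entirely; the match continues in the next set.
                    new_active.add(k + len(s))
            # Case 3: a proper prefix of the pattern starts inside s
            # and reaches the end of s.
            for j in range(1, min(len(s), m - 1) + 1):
                if s.endswith(pattern[:j]):
                    new_active.add(j)
        if found:
            ends.append(i)
        active = new_active
    return ends


if __name__ == "__main__":
    ed_text = [{"ab"}, {"c", "cd", ""}, {"da", "a"}]
    print(edsm_naive(ed_text, "bcda"))
```

In this example the pattern "bcda" is spelled by choosing "ab", "c" and "da", so an occurrence ends in the third set. The set of active prefix lengths has size at most m - 1, which is the state that the faster algorithms update more cleverly.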
Our starting point is a conditional lower bound for the EDSM problem. We use the popular combinatorial Boolean matrix multiplication (BMM) conjecture stating that there is no truly subcubic combinatorial algorithm for BMM [Abboud and Williams, FOCS 2014]. By designing an appropriate reduction we show that a combinatorial algorithm solving the EDSM problem in O(nm^{1.5-\epsilon} + N) time, for any \epsilon > 0, refutes this conjecture. Of course, the notion of combinatorial algorithms is not clearly defined, so our reduction should be understood as an indication that decreasing the exponent requires fast matrix multiplication.
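For reference, the Boolean matrix product that the conjecture concerns can be sketched with the textbook cubic-time procedure below. This is only an illustration of the problem itself, assuming matrices are given as lists of lists of booleans; it is not the reduction described above, and the function name `boolean_matrix_product` is hypothetical.

```python
def boolean_matrix_product(a, b):
    """Return c with c[i][j] = OR over l of (a[i][l] AND b[l][j])."""
    n = len(a)
    inner = len(b)
    p = len(b[0]) if b else 0
    c = [[False] * p for _ in range(n)]
    for i in range(n):
        for l in range(inner):
            if a[i][l]:
                # Row l of b contributes to row i of c.
                for j in range(p):
                    if b[l][j]:
                        c[i][j] = True
    return c


if __name__ == "__main__":
    a = [[True, True], [False, False]]
    b = [[False, False], [False, True]]
    print(boolean_matrix_product(a, b))
```

For n x n inputs this performs on the order of n^3 elementary steps; the conjecture states that no combinatorial algorithm improves on this by a polynomial factor.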
Two standard tools used in algorithms on strings are string periodicity and the fast Fourier transform. Our main technical contribution is that we successfully combine these tools with fast matrix multiplication to design a non-combinatorial O(nm^{1.381} + N)-time algorithm for EDSM. To the best of our knowledge, we are the first to do so.
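As a small illustration of the first of these tools, the sketch below computes the smallest period of a string from its border array (the failure function of Knuth, Morris and Pratt). It only shows what string periodicity means and is not part of the algorithm summarized above; the function names `border_array` and `smallest_period` are hypothetical.

```python
def border_array(s):
    """border[i] = length of the longest proper border of s[:i + 1]."""
    border = [0] * len(s)
    k = 0
    for i in range(1, len(s)):
        # Fall back to shorter borders until the next character matches.
        while k > 0 and s[i] != s[k]:
            k = border[k - 1]
        if s[i] == s[k]:
            k += 1
        border[i] = k
    return border


def smallest_period(s):
    """Smallest p >= 1 with s[i] == s[i + p] for all valid i (0 for the empty string)."""
    if not s:
        return 0
    return len(s) - border_array(s)[-1]


if __name__ == "__main__":
    print(smallest_period("abaabaab"))
```

The string "abaabaab" has longest border "abaab" of length 5, so its smallest period is 8 - 5 = 3.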
string algorithms
pattern matching
elastic-degenerate string
matrix multiplication
fast Fourier transform
Theory of computation → Pattern matching
21:1-21:15
Track A: Algorithms, Complexity and Games
GR and NP are partially supported by MIUR-SIR project CMACBioSeq "Combinatorial methods for analysis and compression of biological sequences" grant n. RBSI146R5L.
A full version of the paper is available at https://arxiv.org/abs/1905.02298.
Giulia Bernardini
Department of Informatics, Systems and Communication, University of Milano - Bicocca, Italy
Paweł Gawrychowski
Institute of Computer Science, University of Wrocław, Poland
Nadia Pisanti
Department of Computer Science, University of Pisa, Italy
ERABLE Team, INRIA, France
Solon P. Pissis
CWI, Amsterdam, The Netherlands
Giovanna Rosone
Department of Computer Science, University of Pisa, Italy
10.4230/LIPIcs.ICALP.2019.21
A. Abboud, A. Backurs, and V.V. Williams. If the Current Clique Algorithms are Optimal, So is Valiant’s Parser. In 56th IEEE Symposium on Foundations Of Computer Science (FOCS), pages 98-117, 2015.
A. Abboud and V.V. Williams. Popular Conjectures Imply Strong Lower Bounds for Dynamic Problems. In 55th IEEE Symposium on Foundations Of Computer Science (FOCS), pages 434-443, 2014.
K. Abrahamson. Generalized String Matching. SIAM J. Comput., 16(6):1039-1051, 1987.
M. Alzamel, L.A.K. Ayad, G. Bernardini, R. Grossi, C.S. Iliopoulos, N. Pisanti, S.P. Pissis, and G. Rosone. Degenerate string comparison and applications. In 18th International Workshop on Algorithms in Bioinformatics (WABI), volume 113 of LIPIcs, pages 21:1-21:14, 2018.
A. Amir, M. Lewenstein, and E. Porat. Faster algorithms for string matching with k mismatches. J. Algorithms, 50(2):257-275, 2004.
K. Aoyama, Y. Nakashima, T. I, S. Inenaga, H. Bannai, and M. Takeda. Faster Online Elastic Degenerate String Matching. In 29th Symposium on Combinatorial Pattern Matching (CPM), volume 105 of LIPIcs, pages 9:1-9:10, 2018.
V.L. Arlazarov, E.A. Dinic, M.A. Kronrod, and I.A. Faradžev. On economical construction of the transitive closure of a directed graph. Soviet Mathematics Doklady, 11(5):1209-1210, 1970.
A. Backurs and P. Indyk. Which Regular Expression Patterns Are Hard to Match? In 57th IEEE Symposium on Foundations Of Computer Science (FOCS), pages 457-466, 2016.
N. Bansal and R. Williams. Regularity Lemmas and Combinatorial Algorithms. In 50th IEEE Symposium on Foundations Of Computer Science (FOCS), pages 745-754, 2009.
M.A. Bender and M. Farach-Colton. The LCA Problem Revisited. In 4th Latin American symposium on Theoretical INformatics (LATIN), volume 1776 of Springer LNCS, pages 88-94, 2000.
J.L. Bentley. Multidimensional Binary Search Trees Used for Associative Searching. Commun. ACM, 18(9):509-517, 1975.
G. Bernardini, N. Pisanti, S.P. Pissis, and G. Rosone. Pattern Matching on Elastic-Degenerate Text with Errors. In 24th International Symposium on String Processing and Information Retrieval (SPIRE), pages 74-90, 2017.
K. Bringmann, F. Grandoni, B. Saha, and V.V. Williams. Truly Sub-cubic Algorithms for Language Edit Distance and RNA-Folding via Fast Bounded-Difference Min-Plus Product. In 57th IEEE Symposium on Foundations Of Computer Science (FOCS), pages 375-384, 2016.
K. Bringmann, A. Grønlund, and K.G. Larsen. A Dichotomy for Regular Expression Membership Testing. In 58th IEEE Symposium on Foundations Of Computer Science (FOCS), pages 307-318, 2017.
T.M. Chan. Speeding Up the Four Russians Algorithm by About One More Logarithmic Factor. In 26th ACM-SIAM Symposium On Discrete Algorithms (SODA), pages 212-217, 2015.
Y.-J. Chang. Hardness of RNA Folding Problem With Four Symbols. In 27th Symposium on Combinatorial Pattern Matching (CPM), volume 54 of LIPIcs, pages 13:1-13:12, 2016.
K. Chatterjee, B. Choudhary, and A. Pavlogiannis. Optimal Dyck Reachability for Data-dependence and Alias Analysis. In 45th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL), pages 30:1-30:30, 2018.
A. Cisłak, S. Grabowski, and J. Holub. SOPanG: online text searching over a pan-genome. Bioinformatics, page bty506, 2018.
P. Clifford and R. Clifford. Simple deterministic wildcard matching. Inf. Process. Lett., 101(2):53-54, 2007.
R. Cole and R. Hariharan. Verifying candidate matches in sparse and wildcard matching. In 34th ACM Symposium on Theory Of Computing (STOC), pages 592-601, 2002.
The Computational Pan-Genomics Consortium. Computational pan-genomics: status, promises and challenges. Briefings in Bioinformatics, 19(1):118-135, 2018.
M. Crochemore, C. Hancart, and T. Lecroq. Algorithms on strings. Cambridge University Press, 2007.
M. Crochemore and D. Perrin. Two-Way String Matching. J. ACM, 38(3):651-675, 1991.
A. Czumaj and A. Lingas. Finding a Heaviest Vertex-Weighted Triangle Is not Harder than Matrix Multiplication. SIAM J. Comput., 39(2):431-444, 2009.
M. Farach and S. Muthukrishnan. Perfect Hashing for Strings: formalization and Algorithms. In 7th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 1075 of Springer LNCS, pages 130-140, 1996.
N.J. Fine and H.S. Wilf. Uniqueness Theorems for Periodic Functions. Proceedings of the American Mathematical Society, 16(1):109-114, 1965.
M.J. Fischer and A.R. Meyer. Boolean Matrix Multiplication and Transitive Closure. In 12th IEEE Symposium on Switching and Automata Theory (SWAT/FOCS), pages 129-131, 1971.
M.J. Fischer and M.S. Paterson. String matching and other products. In 7th SIAM-AMS Complexity of Computation, pages 113-125, 1974.
M.E. Furman. Application of a method of fast multiplication of matrices in the problem of finding the transitive closure of a graph. Soviet Mathematics Doklady, 11(5):1252, 1970.
F. Le Gall. Powers of tensors and fast matrix multiplication. In 39th International Symposium on Symbolic and Algebraic Computation (ISSAC), pages 296-303, 2014.
P. Gawrychowski and P. Uznański. Towards Unified Approximate Pattern Matching for Hamming and L₁ Distance. In 45th International Colloquium on Automata, Languages and Programming (ICALP), volume 107 of LIPIcs, pages 62:1-62:13, 2018.
R. Grossi, C.S. Iliopoulos, C. Liu, N. Pisanti, S.P. Pissis, A. Retha, G. Rosone, F. Vayani, and L. Versari. On-Line Pattern Matching on Similar Texts. In 28th Symposium on Combinatorial Pattern Matching (CPM), volume 78 of LIPIcs, pages 9:1-9:14, 2017.
M. Henzinger, S. Krinninger, D. Nanongkai, and T. Saranurak. Unifying and Strengthening Hardness for Dynamic Problems via the Online Matrix-Vector Multiplication Conjecture. In 47th ACM Symposium on Theory Of Computing (STOC), pages 21-30, 2015.
J. Holub, W.F. Smyth, and S. Wang. Fast pattern-matching on indeterminate strings. J. Discrete Algorithms, 6(1):37-50, 2008.
C.S. Iliopoulos, R. Kundu, and S.P. Pissis. Efficient Pattern Matching in Elastic-Degenerate Texts. In 11th International Conference on Language and Automata Theory and Applications (LATA), volume 10168 of Springer LNCS, pages 131-142, 2017.
P. Indyk. Faster Algorithms for String Matching Problems: Matching the Convolution Bound. In 39th Symposium on Foundations Of Computer Science (FOCS), pages 166-173, 1998.
A. Itai and M. Rodeh. Finding a Minimum Circuit in a Graph. In 9th ACM Symposium on Theory Of Computing (STOC), pages 1-10, 1977.
IUPAC-IUB Commission on Biochemical Nomenclature. Abbreviations and symbols for nucleic acids, polynucleotides, and their constituents. Biochemistry, 9(20):4022-4027, 1970.
A. Kalai. Efficient pattern-matching with don't cares. In 13th ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 655-656, 2002.
D.E. Knuth, J.H. Morris Jr., and V.R. Pratt. Fast Pattern Matching in Strings. SIAM J. Comput., 6(2):323-350, 1977.
T. Kociumaka, J. Radoszewski, W. Rytter, and T. Waleń. Internal Pattern Matching Queries in a Text and Applications. In 26th ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 532-551, 2015.
T. Kopelowitz and R. Krauthgamer. Color-distance oracles and snippets. In 27th Symposium on Combinatorial Pattern Matching (CPM), volume 54 of LIPIcs, pages 24:1-24:10, 2016.
K.G. Larsen, I. Munro, J.S. Nielsen, and S.V. Thankachan. On hardness of several string indexing problems. Theor. Comput. Sci., 582:74-82, 2015.
L. Lee. Fast Context-free Grammar Parsing Requires Fast Boolean Matrix Multiplication. J. ACM, 49(1):1-15, 2002.
J. Matoušek. Computing Dominances in Eⁿ. Inf. Process. Lett., 38(5):277-278, 1991.
I. Munro. Efficient Determination of the Transitive Closure of a Directed Graph. Inf. Process. Lett., 1(2):56-58, 1971.
G. Navarro. NR-grep: a fast and flexible pattern-matching tool. Softw., Pract. Exper., 31(13):1265-1312, 2001.
S.P. Pissis and A. Retha. Dictionary Matching in Elastic-Degenerate Texts with Applications in Searching VCF Files On-line. In 17th International Symposium on Experimental Algorithms (SEA), volume 103 of LIPIcs, pages 16:1-16:14, 2018.
L. Roditty and U. Zwick. On Dynamic Shortest Paths Problems. In 12th European Symposium on Algorithms (ESA), volume 3221 of Springer LNCS, pages 580-591, 2004.
M. Ružić. Constructing Efficient Dictionaries in Close to Sorting Time. In 35th International Colloquium on Automata, Languages and Programming (ICALP), volume 5125 of Springer LNCS, pages 84-95, 2008.
D.D. Sleator and R.E. Tarjan. A Data Structure for Dynamic Trees. J. Comput. Syst. Sci., 26(3):362-391, 1983.
L.G. Valiant. General Context-free Recognition in Less Than Cubic Time. J. Comput. Syst. Sci., 10(2):308-315, April 1975.
P. Weiner. Linear Pattern Matching Algorithms. In 14th IEEE Annual Symposium on Switching and Automata Theory (SWAT/FOCS), pages 1-11, 1973.
V.V. Williams. Multiplying matrices faster than Coppersmith-Winograd. In 44th ACM Symposium on Theory Of Computing Conference (STOC), pages 887-898, 2012.
V.V. Williams and R. Williams. Finding a maximum weight triangle in n^{3-δ} time, with applications. In 38th ACM Symposium on Theory Of Computing Conference (STOC), pages 225-231, 2006.
V.V. Williams and R. Williams. Subcubic equivalences between path, matrix and triangle problems. In 51st IEEE Symposium on Foundations Of Computer Science (FOCS), pages 645-654, 2010.
S. Wu and U. Manber. Agrep - A Fast Approximate Pattern-Matching Tool. In USENIX Technical Conference, pages 153-162, 1992.
H. Yu. An improved combinatorial algorithm for Boolean matrix multiplication. Inf. Comput., 261(Part):240-247, 2018.
U. Zwick. All pairs shortest paths using bridging sets and rectangular matrix multiplication. J. ACM, 49(3):289-317, 2002.
Giulia Bernardini, Paweł Gawrychowski, Nadia Pisanti, Solon P. Pissis, and Giovanna Rosone
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode