Even Faster Elastic-Degenerate String Matching via Fast Matrix Multiplication

Authors: Giulia Bernardini, Paweł Gawrychowski, Nadia Pisanti, Solon P. Pissis, Giovanna Rosone



Author Details

Giulia Bernardini
  • Department of Informatics, Systems and Communication, University of Milano - Bicocca, Italy
Paweł Gawrychowski
  • Institute of Computer Science, University of Wrocław, Poland
Nadia Pisanti
  • Department of Computer Science, University of Pisa, Italy
  • ERABLE Team, INRIA, France
Solon P. Pissis
  • CWI, Amsterdam, The Netherlands
Giovanna Rosone
  • Department of Computer Science, University of Pisa, Italy

Cite As

Giulia Bernardini, Paweł Gawrychowski, Nadia Pisanti, Solon P. Pissis, and Giovanna Rosone. Even Faster Elastic-Degenerate String Matching via Fast Matrix Multiplication. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 21:1-21:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.ICALP.2019.21

Abstract

An elastic-degenerate (ED) string is a sequence of n sets of strings of total length N, which was recently proposed to model a set of similar sequences. The ED string matching (EDSM) problem is to find all occurrences of a pattern of length m in an ED text. The EDSM problem has recently received some attention in the combinatorial pattern matching community, and an O(nm^{1.5}\sqrt{\log m} + N)-time algorithm is known [Aoyama et al., CPM 2018]. The standard assumption in prior work on this question is that N is substantially larger than both n and m, and thus we would like to have a linear dependency on N. Under this assumption, the natural open problem is whether we can decrease the exponent 1.5 in the time complexity, similarly to the related (but, to the best of our knowledge, not equivalent) word break problem [Backurs and Indyk, FOCS 2016]. Our starting point is a conditional lower bound for the EDSM problem. We use the popular combinatorial Boolean matrix multiplication (BMM) conjecture, which states that there is no truly subcubic combinatorial algorithm for BMM [Abboud and Williams, FOCS 2014]. By designing an appropriate reduction, we show that a combinatorial algorithm solving the EDSM problem in O(nm^{1.5-\epsilon} + N) time, for any \epsilon > 0, refutes this conjecture. Of course, the notion of combinatorial algorithms is not clearly defined, so our reduction should be understood as an indication that decreasing the exponent requires fast matrix multiplication. Two standard tools used in algorithms on strings are string periodicity and the fast Fourier transform. Our main technical contribution is that we successfully combine these tools with fast matrix multiplication to design a non-combinatorial O(nm^{1.381} + N)-time algorithm for EDSM. To the best of our knowledge, we are the first to do so.
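
To make the problem statement concrete, the following is a minimal brute-force sketch of EDSM in Python. It is not the algorithm of the paper and uses none of its techniques (no periodicity, FFT, or matrix multiplication). The function name `edsm_naive`, the representation of the ED text as a list of sets of strings, and the convention of reporting the 0-based indices of the segments in which an occurrence ends are all assumptions made for illustration.

```python
def edsm_naive(text, pattern):
    """Brute-force EDSM sketch (illustrative only, not the paper's method).

    text    -- ED text as a list of sets of strings (a set may contain "")
    pattern -- non-empty string of length m
    Returns the 0-based indices of segments in which an occurrence ends
    (an assumed reporting convention).
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    # active holds every k (0 < k < m) such that pattern[:k] matches a
    # suffix of some concatenation of choices ending at the previous segment.
    active = set()
    ends = []
    for i, segment in enumerate(text):
        new_active = set()
        found = False
        for s in segment:
            # Case 1: the whole pattern lies inside a single string.
            if pattern in s:
                found = True
            # Case 2: extend a partial match carried over from earlier segments.
            for k in active:
                rest = pattern[k:]
                if len(s) >= len(rest):
                    if s.startswith(rest):
                        found = True
                elif rest.startswith(s):
                    # s is consumed entirely (this also covers s == "").
                    new_active.add(k + len(s))
            # Case 3: start a new partial match at a suffix of s.
            for l in range(1, min(len(s), m - 1) + 1):
                if s[-l:] == pattern[:l]:
                    new_active.add(l)
        if found:
            ends.append(i)
        active = new_active
    return ends
```

For example, with the ED text `[{"AB"}, {"C", ""}, {"D", "CD"}]` and the pattern `"BCD"`, the sketch reports segment 2, since the pattern can be spelled as `B` + `C` + `D` or as `B` + the empty string + `CD`. Each segment is processed against up to m active prefixes with explicit string comparisons, so this sketch is far slower than the bounds discussed in the abstract; it only illustrates what an occurrence in an ED text means.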

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • string algorithms
  • pattern matching
  • elastic-degenerate string
  • matrix multiplication
  • fast Fourier transform


References

  1. A. Abboud, A. Backurs, and V.V. Williams. If the Current Clique Algorithms are Optimal, So is Valiant’s Parser. In 56th IEEE Symposium on Foundations Of Computer Science (FOCS), pages 98-117, 2015.
  2. A. Abboud and V.V. Williams. Popular Conjectures Imply Strong Lower Bounds for Dynamic Problems. In 55th IEEE Symposium on Foundations Of Computer Science (FOCS), pages 434-443, 2014.
  3. K. Abrahamson. Generalized String Matching. SIAM J. Comput., 16(6):1039-1051, 1987.
  4. M. Alzamel, L.A.K. Ayad, G. Bernardini, R. Grossi, C.S. Iliopoulos, N. Pisanti, S.P. Pissis, and G. Rosone. Degenerate string comparison and applications. In 18th International Workshop on Algorithms in Bioinformatics (WABI), volume 113 of LIPIcs, pages 21:1-21:14, 2018.
  5. A. Amir, M. Lewenstein, and E. Porat. Faster algorithms for string matching with k mismatches. J. Algorithms, 50(2):257-275, 2004.
  6. K. Aoyama, Y. Nakashima, T. I, S. Inenaga, H. Bannai, and M. Takeda. Faster Online Elastic Degenerate String Matching. In 29th Symposium on Combinatorial Pattern Matching (CPM), volume 105 of LIPIcs, pages 9:1-9:10, 2018.
  7. V.L. Arlazarov, E.A. Dinic, M.A. Kronrod, and I.A. Faradžev. On economical construction of the transitive closure of a directed graph. Soviet Mathematics Doklady, 11(5):1209-1210, 1970.
  8. A. Backurs and P. Indyk. Which Regular Expression Patterns Are Hard to Match? In 57th IEEE Symposium on Foundations Of Computer Science (FOCS), pages 457-466, 2016.
  9. N. Bansal and R. Williams. Regularity Lemmas and Combinatorial Algorithms. In 50th IEEE Symposium on Foundations Of Computer Science (FOCS), pages 745-754, 2009.
  10. M.A. Bender and M. Farach-Colton. The LCA Problem Revisited. In 4th Latin American symposium on Theoretical INformatics (LATIN), volume 1776 of Springer LNCS, pages 88-94, 2000.
  11. J.L. Bentley. Multidimensional Binary Search Trees Used for Associative Searching. Commun. ACM, 18(9):509-517, 1975.
  12. G. Bernardini, N. Pisanti, S.P. Pissis, and G. Rosone. Pattern Matching on Elastic-Degenerate Text with Errors. In 24th International Symposium on String Processing and Information Retrieval (SPIRE), pages 74-90, 2017.
  13. K. Bringmann, F. Grandoni, B. Saha, and V.V. Williams. Truly Sub-cubic Algorithms for Language Edit Distance and RNA-Folding via Fast Bounded-Difference Min-Plus Product. In 56th IEEE Symposium on Foundations Of Computer Science (FOCS), pages 375-384, 2016.
  14. K. Bringmann, A. Grønlund, and K.G. Larsen. A Dichotomy for Regular Expression Membership Testing. In 58th IEEE Symposium on Foundations Of Computer Science (FOCS), pages 307-318, 2017.
  15. T.M. Chan. Speeding Up the Four Russians Algorithm by About One More Logarithmic Factor. In 26th ACM-SIAM Symposium On Discrete Algorithms (SODA), pages 212-217, 2015.
  16. Y.-J. Chang. Hardness of RNA Folding Problem With Four Symbols. In 27th Symposium on Combinatorial Pattern Matching (CPM), volume 54 of LIPIcs, pages 13:1-13:12, 2016.
  17. K. Chatterjee, B. Choudhary, and A. Pavlogiannis. Optimal Dyck Reachability for Data-dependence and Alias Analysis. In 45th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL), pages 30:1-30:30, 2018.
  18. A. Cisłak, S. Grabowski, and J. Holub. SOPanG: online text searching over a pan-genome. Bioinformatics, page bty506, 2018.
  19. P. Clifford and R. Clifford. Simple deterministic wildcard matching. Inf. Process. Lett., 101(2):53-54, 2007.
  20. R. Cole and R. Hariharan. Verifying candidate matches in sparse and wildcard matching. In 34th ACM Symposium on Theory Of Computing (STOC), pages 592-601, 2002.
  21. The Computational Pan-Genomics Consortium. Computational pan-genomics: status, promises and challenges. Briefings in Bioinformatics, 19(1):118-135, 2018.
  22. M. Crochemore, C. Hancart, and T. Lecroq. Algorithms on strings. Cambridge University Press, 2007.
  23. M. Crochemore and D. Perrin. Two-Way String Matching. J. ACM, 38(3):651-675, 1991.
  24. A. Czumaj and A. Lingas. Finding a Heaviest Vertex-Weighted Triangle Is not Harder than Matrix Multiplication. SIAM J. Comput., 39(2):431-444, 2009.
  25. M. Farach and S. Muthukrishnan. Perfect Hashing for Strings: formalization and Algorithms. In 7th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 1075 of Springer LNCS, pages 130-140, 1996.
  26. N.J. Fine and H.S. Wilf. Uniqueness Theorems for Periodic Functions. Proceedings of the American Mathematical Society, 16(1):109-114, 1965.
  27. M.J. Fischer and A.R. Meyer. Boolean Matrix Multiplication and Transitive Closure. In 12th IEEE Symposium on Switching and Automata Theory (SWAT/FOCS), pages 129-131, 1971.
  28. M.J. Fischer and M.S. Paterson. String matching and other products. In 7th SIAM-AMS Complexity of Computation, pages 113-125, 1974.
  29. M.E. Furman. Application of a method of fast multiplication of matrices in the problem of finding the transitive closure of a graph. Soviet Mathematics Doklady, 11(5):1252, 1970.
  30. F. Le Gall. Powers of tensors and fast matrix multiplication. In 39th International Symposium on Symbolic and Algebraic Computation (ISSAC), pages 296-303, 2014.
  31. P. Gawrychowski and P. Uznański. Towards Unified Approximate Pattern Matching for Hamming and L₁ Distance. In 45th International Colloquium on Automata, Languages and Programming (ICALP), volume 107 of LIPIcs, pages 62:1-62:13, 2018.
  32. R. Grossi, C.S. Iliopoulos, C. Liu, N. Pisanti, S.P. Pissis, A. Retha, G. Rosone, F. Vayani, and L. Versari. On-Line Pattern Matching on Similar Texts. In 28th Symposium on Combinatorial Pattern Matching (CPM), volume 78 of LIPIcs, pages 9:1-9:14, 2017.
  33. M. Henzinger, S. Krinninger, D. Nanongkai, and T. Saranurak. Unifying and Strengthening Hardness for Dynamic Problems via the Online Matrix-Vector Multiplication Conjecture. In 47th ACM Symposium on Theory Of Computing (STOC), pages 21-30, 2015.
  34. J. Holub, W.F. Smyth, and S. Wang. Fast pattern-matching on indeterminate strings. J. Discrete Algorithms, 6(1):37-50, 2008.
  35. C.S. Iliopoulos, R. Kundu, and S.P. Pissis. Efficient Pattern Matching in Elastic-Degenerate Texts. In 11th International Conference on Language and Automata Theory and Applications (LATA), volume 10168 of Springer LNCS, pages 131-142, 2017.
  36. P. Indyk. Faster Algorithms for String Matching Problems: Matching the Convolution Bound. In 39th Symposium on Foundations Of Computer Science (FOCS), pages 166-173, 1998.
  37. A. Itai and M. Rodeh. Finding a Minimum Circuit in a Graph. In 9th ACM Symposium on Theory Of Computing (STOC), pages 1-10, 1977.
  38. IUPAC-IUB Commission on Biochemical Nomenclature. Abbreviations and symbols for nucleic acids, polynucleotides, and their constituents. Biochemistry, 9(20):4022-4027, 1970.
  39. A. Kalai. Efficient pattern-matching with don't cares. In 13th ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 655-656, 2002.
  40. D.E. Knuth, J.H. Morris Jr., and V.R. Pratt. Fast Pattern Matching in Strings. SIAM J. Comput., 6(2):323-350, 1977.
  41. T. Kociumaka, J. Radoszewski, W. Rytter, and T. Waleń. Internal Pattern Matching Queries in a Text and Applications. In 26th ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 532-551, 2015.
  42. T. Kopelowitz and R. Krauthgamer. Color-distance oracles and snippets. In 27th Symposium on Combinatorial Pattern Matching (CPM), volume 54 of LIPIcs, pages 24:1-24:10, 2016.
  43. K.G. Larsen, I. Munro, J.S. Nielsen, and S.V. Thankachan. On hardness of several string indexing problems. Theor. Comput. Sci., 582:74-82, 2015.
  44. L. Lee. Fast Context-free Grammar Parsing Requires Fast Boolean Matrix Multiplication. J. ACM, 49(1):1-15, 2002.
  45. J. Matoušek. Computing Dominances in Eⁿ. Inf. Process. Lett., 38(5):277-278, 1991.
  46. I. Munro. Efficient Determination of the Transitive Closure of a Directed Graph. Inf. Process. Lett., 1(2):56-58, 1971.
  47. G. Navarro. NR-grep: a fast and flexible pattern-matching tool. Softw., Pract. Exper., 31(13):1265-1312, 2001.
  48. S.P. Pissis and A. Retha. Dictionary Matching in Elastic-Degenerate Texts with Applications in Searching VCF Files On-line. In 17th International Symposium on Experimental Algorithms (SEA), volume 103 of LIPIcs, pages 16:1-16:14, 2018.
  49. L. Roditty and U. Zwick. On Dynamic Shortest Paths Problems. In 12th European Symposium on Algorithms (ESA), volume 3221 of Springer LNCS, pages 580-591, 2004.
  50. M. Ružić. Constructing Efficient Dictionaries in Close to Sorting Time. In 35th International Colloquium on Automata, Languages and Programming (ICALP), volume 5125 of Springer LNCS, pages 84-95, 2008.
  51. D.D. Sleator and R.E. Tarjan. A Data Structure for Dynamic Trees. J. Comput. Syst. Sci., 26(3):362-391, 1983.
  52. L.G. Valiant. General Context-free Recognition in Less Than Cubic Time. J. Comput. Syst. Sci., 10(2):308-315, April 1975.
  53. P. Weiner. Linear Pattern Matching Algorithms. In 14th IEEE Annual Symposium on Switching and Automata Theory (SWAT/FOCS), pages 1-11, 1973.
  54. V.V. Williams. Multiplying matrices faster than Coppersmith-Winograd. In 44th ACM Symposium on Theory Of Computing Conference (STOC), pages 887-898, 2012.
  55. V.V. Williams and R. Williams. Finding a maximum weight triangle in n^{3-δ} time, with applications. In 38th ACM Symposium on Theory Of Computing Conference (STOC), pages 225-231, 2006.
  56. V.V. Williams and R. Williams. Subcubic equivalences between path, matrix and triangle problems. In 51st IEEE Symposium on Foundations Of Computer Science (FOCS), pages 645-654, 2010.
  57. S. Wu and U. Manber. Agrep - A Fast Approximate Pattern-Matching Tool. In USENIX Technical Conference, pages 153-162, 1992.
  58. H. Yu. An improved combinatorial algorithm for Boolean matrix multiplication. Inf. Comput., 261(Part):240-247, 2018.
  59. U. Zwick. All pairs shortest paths using bridging sets and rectangular matrix multiplication. J. ACM, 49(3):289-317, 2002.