Generalised Pattern Matching Revisited

Authors Bartłomiej Dudek, Paweł Gawrychowski, Tatiana Starikovskaya

File

LIPIcs.STACS.2020.18.pdf
  • Filesize: 0.64 MB
  • 18 pages

Author Details

Bartłomiej Dudek
  • Institute of Computer Science, University of Wrocław, Poland
Paweł Gawrychowski
  • Institute of Computer Science, University of Wrocław, Poland
Tatiana Starikovskaya
  • DIENS, École normale supérieure, PSL Research University, Paris, France

Cite As

Bartłomiej Dudek, Paweł Gawrychowski, and Tatiana Starikovskaya. Generalised Pattern Matching Revisited. In 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 154, pp. 18:1-18:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.STACS.2020.18

Abstract

In the problem of Generalised Pattern Matching (GPM) [STOC'94, Muthukrishnan and Palem], we are given a text T of length n over an alphabet Σ_T, a pattern P of length m over an alphabet Σ_P, and a matching relationship ⊆ Σ_T × Σ_P, and must return all substrings of T that match P (reporting) or the number of mismatches between each substring of T of length m and P (counting). In this work, we improve over all previously known algorithms for this problem:

  • For 𝒟 being the maximum number of characters that match a fixed character, we show two new Monte Carlo algorithms, a reporting algorithm with time 𝒪(𝒟 n log n log m) and a (1-ε)-approximation counting algorithm with time 𝒪(ε^{-1} 𝒟 n log n log m). We then derive a (1-ε)-approximation deterministic counting algorithm for GPM with 𝒪(ε^{-2} 𝒟 n log⁶ n) time.
  • For 𝒮 being the number of pairs of matching characters, we demonstrate Monte Carlo algorithms for reporting and (1-ε)-approximate counting with running time 𝒪(√{𝒮} n log m √{log n}) and 𝒪(√{ε^{-1} 𝒮} n log m √{log n}), respectively, as well as a (1-ε)-approximation deterministic algorithm for the counting variant of GPM with 𝒪(ε^{-1} √{𝒮} n log^{7/2} n) time.
  • Finally, for ℐ being the total number of disjoint intervals of characters that match the m characters of the pattern P, we show that both the reporting and the counting variants of GPM can be solved exactly and deterministically in 𝒪(n√{ℐ log m} + n log n) time.

At the heart of our new deterministic upper bounds for 𝒟 and 𝒮 lies a faster construction of superimposed codes, which solves an open problem posed in [FOCS'97, Indyk] and can be of independent interest. To conclude, we demonstrate first lower bounds for GPM. We start by showing that any deterministic or Monte Carlo algorithm for GPM must use Ω(𝒮) time, and then proceed to show higher lower bounds for combinatorial algorithms. These bounds show that our algorithms are almost optimal, unless a radically new approach is developed.
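
To make the problem statement concrete, the following is a minimal sketch of a naive baseline for both variants, running in time proportional to n·m. It is not one of the algorithms from the paper; it only illustrates the input and output of the reporting and counting variants. The function names, the representation of the matching relationship as a Python set of (text character, pattern character) pairs, and the choice to take the maximum degree over both alphabets are assumptions of this sketch.

```python
from collections import Counter
from typing import List, Set, Tuple

# Assumed representation: the matching relationship is a set of
# (text_char, pattern_char) pairs that are allowed to match.
Relation = Set[Tuple[str, str]]


def mismatch_counts(text: str, pattern: str, rel: Relation) -> List[int]:
    """Counting variant (naive): for every length-m substring of the text,
    count the positions where the text character does not match the
    corresponding pattern character under rel."""
    n, m = len(text), len(pattern)
    return [
        sum((text[i + j], pattern[j]) not in rel for j in range(m))
        for i in range(n - m + 1)
    ]


def occurrences(text: str, pattern: str, rel: Relation) -> List[int]:
    """Reporting variant (naive): starting positions with zero mismatches."""
    return [i for i, c in enumerate(mismatch_counts(text, pattern, rel)) if c == 0]


def relation_parameters(rel: Relation) -> Tuple[int, int]:
    """Return (maximum number of characters matching a fixed character,
    number of pairs of matching characters).
    Assumption: the maximum is taken over characters of both alphabets."""
    by_text = Counter(t for t, _ in rel)
    by_pattern = Counter(p for _, p in rel)
    max_degree = max(list(by_text.values()) + list(by_pattern.values()), default=0)
    return max_degree, len(rel)
```

For example, with the relation {("a","x"), ("b","y"), ("c","x"), ("c","y")}, the text "abcab" and the pattern "xy", the pattern matches at positions 0 and 3, and the two windows in between each have one mismatch.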

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • pattern matching
  • superimposed codes
  • conditional lower bounds

References

  1. Amir Abboud, Loukas Georgiadis, Giuseppe F. Italiano, Robert Krauthgamer, Nikos Parotsidis, Ohad Trabelsi, Przemyslaw Uznanski, and Daniel Wolleb-Graf. Faster algorithms for all-pairs bounded min-cuts. In Proceedings of the International Colloquium on Automata, Languages, and Programming, ICALP, pages 7:1-7:15, 2019. URL: https://doi.org/10.4230/LIPIcs.ICALP.2019.7.
  2. Amir Abboud and Virginia Vassilevska Williams. Popular conjectures imply strong lower bounds for dynamic problems. In Proceedings of the 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, FOCS, pages 434-443. IEEE Computer Society, 2014. URL: https://doi.org/10.1109/FOCS.2014.53.
  3. Karl R. Abrahamson. Generalized string matching. SIAM J. Comput., 16(6):1039-1051, 1987. URL: https://doi.org/10.1137/0216067.
  4. Mikhail J. Atallah and Timothy W. Duket. Pattern matching in the Hamming distance with thresholds. Information Processing Letters, 111(14):674-677, 2011. URL: https://doi.org/10.1016/j.ipl.2011.04.004.
  5. Nikhil Bansal. Constructive algorithms for discrepancy minimization. In Proceedings of the Annual IEEE Symposium on Foundations of Computer Science, FOCS, pages 3-10, 2010. URL: https://doi.org/10.1109/FOCS.2010.7.
  6. Nikhil Bansal, Moses Charikar, Ravishankar Krishnaswamy, and Shi Li. Better algorithms and hardness for broadcast scheduling via a discrepancy approach. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 55-71, 2014. URL: https://doi.org/10.1137/1.9781611973402.5.
  7. Nikhil Bansal, Daniel Dadush, and Shashwat Garg. An algorithm for Komlós conjecture matching Banaszczyk’s bound. SIAM J. Comput., 48(2):534-553, 2019. URL: https://doi.org/10.1137/17M1126795.
  8. Nikhil Bansal, Daniel Dadush, Shashwat Garg, and Shachar Lovett. The Gram-Schmidt walk: A cure for the Banaszczyk blues. In Proceedings of the Annual ACM SIGACT Symposium on Theory of Computing, STOC, pages 587-597, 2018. URL: https://doi.org/10.1145/3188745.3188850.
  9. Nikhil Bansal and Shashwat Garg. Algorithmic discrepancy beyond partial coloring. In Proceedings of the Annual ACM SIGACT Symposium on Theory of Computing, STOC, pages 914-926, 2017. URL: https://doi.org/10.1145/3055399.3055490.
  10. Nikhil Bansal and Joel Spencer. Deterministic discrepancy minimization. Algorithmica, 67(4):451-471, 2013. URL: https://doi.org/10.1007/s00453-012-9728-1.
  11. Emilios Cambouropoulos, Maxime Crochemore, Costas S. Iliopoulos, Laurent Mouchard, and Yoan J. Pinzon. Algorithms for computing approximate repetitions in musical sequences. International Journal of Computer Mathematics, 79(11):1135-1146, 2002. URL: https://doi.org/10.1080/00207160213939.
  12. Domenico Cantone, Salvatore Cristofaro, and Simone Faro. An efficient algorithm for δ-approximate matching with α-bounded gaps in musical sequences. In Proceedings of the International Conference on Experimental and Efficient Algorithms, WEA, pages 428-439, 2005. URL: https://doi.org/10.1007/11427186_37.
  13. Bernard Chazelle. The discrepancy method - randomness and complexity. Cambridge University Press, 2001. URL: https://doi.org/10.1017/CBO9780511626371.
  14. Sunil Chebolu and Jan Minac. Counting irreducible polynomials over finite fields using the inclusion-exclusion principle. Mathematics Magazine, 84(5):369-371, 2011. URL: https://doi.org/10.4169/math.mag.84.5.369.
  15. Peter Clifford and Raphaël Clifford. Simple deterministic wildcard matching. Information Processing Letters, 101(2):53-54, 2007. URL: https://doi.org/10.1016/j.ipl.2006.08.002.
  16. Peter Clifford, Raphaël Clifford, and Costas Iliopoulos. Faster algorithms for δ,γ-matching and related problems. In Proceedings of the Annual Symposium on Combinatorial Pattern Matching, CPM, pages 68-78, 2005. URL: https://doi.org/10.1007/11496656_7.
  17. Raphaël Clifford and Ely Porat. A filtering algorithm for k-mismatch with don't cares. Information Processing Letters, 110(22):1021-1025, 2010. URL: https://doi.org/10.1016/j.ipl.2010.08.012.
  18. Richard Cole and Ramesh Hariharan. Verifying candidate matches in sparse and wildcard matching. In Proceedings of the Annual ACM Symposium on Theory of Computing, STOC, pages 592-601, 2002. URL: https://doi.org/10.1145/509907.509992.
  19. Richard Cole, Costas Iliopoulos, Thierry Lecroq, Wojciech Plandowski, and Wojciech Rytter. On special families of morphisms related to δ-matching and don't care symbols. Information Processing Letters, 85(5):227-233, 2003. URL: https://doi.org/10.1016/S0020-0190(02)00430-1.
  20. Maxime Crochemore, Costas S. Iliopoulos, Thierry Lecroq, Yoan J. Pinzon, Wojciech Plandowski, and Wojciech Rytter. Occurrence and substring heuristics for δ-matching. Fundamenta Informaticae, 56(1,2):1-21, October 2002.
  21. Michael John Fischer and Michael Stewart Paterson. String-matching and other products. Technical report, Massachusetts Institute of Technology, 1974.
  22. Kimmo Fredriksson and Szymon Grabowski. Efficient algorithms for (δ,γ,α) and (δ, k_δ, α)-matching. International Journal of Foundations of Computer Science, 19(01):163-183, 2008. URL: https://doi.org/10.1142/S0129054108005607.
  23. Carl Friedrich Gauss. Untersuchungen über höhere Arithmetik. (Disquisitiones arithmeticae. Theorematis arithmetici demonstratio nova. Summatio quarundam serierum singularium ó.). Deutsch hrsg. von H. Mas, Berlin, 1889.
  24. Paweł Gawrychowski and Przemysław Uznański. Towards unified approximate pattern matching for Hamming and L₁ distance. In Proceedings of the International Colloquium on Automata, Languages and Programming, ICALP, pages 62:1-62:13, 2018. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.62.
  25. Loukas Georgiadis, Daniel Graf, Giuseppe F. Italiano, Nikos Parotsidis, and Przemyslaw Uznanski. All-pairs 2-reachability in O(n^w log n) time. In Proceedings of the 44th International Colloquium on Automata, Languages, and Programming, ICALP, pages 74:1-74:14, 2017. URL: https://doi.org/10.4230/LIPIcs.ICALP.2017.74.
  26. Jan Holub, William F. Smyth, and Shu Wang. Fast pattern-matching on indeterminate strings. J. of Discrete Algorithms, 6(1):37-50, March 2008. URL: https://doi.org/10.1016/j.jda.2006.10.003.
  27. Piotr Indyk. Deterministic superimposed coding with applications to pattern matching. In Proceedings of the Annual Symposium on Foundations of Computer Science, FOCS, pages 127-136, 1997. URL: https://doi.org/10.1109/SFCS.1997.646101.
  28. Piotr Indyk. Faster algorithms for string matching problems: Matching the convolution bound. In Proceedings of the Annual Symposium on Foundations of Computer Science, FOCS, pages 166-173, 1998. URL: https://doi.org/10.1109/SFCS.1998.743440.
  29. Adam Kalai. Efficient pattern-matching with don't cares. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 655-656, 2002.
  30. William Kautz and Richard Singleton. Nonrandom binary superimposed codes. IEEE Trans. Inf. Theor., 10(4):363-377, 1964. URL: https://doi.org/10.1109/TIT.1964.1053689.
  31. Tsvi Kopelowitz and Ely Porat. A simple algorithm for approximating the text-to-pattern Hamming distance. In Proceedings of the SIAM Symposium on Simplicity in Algorithms, volume 61 of OASICS, pages 10:1-10:5, 2018. URL: https://doi.org/10.4230/OASIcs.SOSA.2018.10.
  32. Kasper Green Larsen. Constructive discrepancy minimization with hereditary L2 guarantees. In Proceedings of the International Symposium on Theoretical Aspects of Computer Science, STACS, pages 48:1-48:13, 2019. URL: https://doi.org/10.4230/LIPIcs.STACS.2019.48.
  33. Shachar Lovett and Raghu Meka. Constructive discrepancy minimization by walking on the edges. SIAM Journal on Computing, 44(5):1573-1582, 2015. URL: https://doi.org/10.1137/130929400.
  34. Shan Muthukrishnan. New results and open problems related to non-standard stringology. In Proceedings of the Annual Symposium on Combinatorial Pattern Matching, CPM, pages 298-317, 1995. URL: https://doi.org/10.1007/3-540-60044-2_50.
  35. Shan Muthukrishnan and Krishna Palem. Non-standard stringology: Algorithms and complexity. In Proceedings of the Annual ACM Symposium on Theory of Computing, STOC, pages 770-779. ACM, 1994. URL: https://doi.org/10.1145/195058.195457.
  36. Shan Muthukrishnan and Hariharan Ramesh. String matching under a general matching relation. Information and Computation, 122(1):140-148, 1995. URL: https://doi.org/10.1007/3-540-56287-7_118.
  37. Gonzalo Navarro. NR-grep: A fast and flexible pattern-matching tool. Softw. Pract. Exper., 31(13):1265-1312, October 2001. URL: https://doi.org/10.1002/spe.411.
  38. Peng Zhang and Mikhail J. Atallah. On approximate pattern matching with thresholds. Information Processing Letters, 123:21-26, 2017. URL: https://doi.org/10.1016/j.ipl.2017.03.001.