Matching Patterns with Variables Under Hamming Distance

Authors: Paweł Gawrychowski, Florin Manea, Stefan Siemer



Author Details

Paweł Gawrychowski
  • Faculty of Mathematics and Computer Science, University of Wrocław, Poland
Florin Manea
  • Computer Science Department and Campus-Institut Data Science, Göttingen University, Germany
Stefan Siemer
  • Computer Science Department, Göttingen University, Germany

Cite As

Paweł Gawrychowski, Florin Manea, and Stefan Siemer. Matching Patterns with Variables Under Hamming Distance. In 46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 202, pp. 48:1-48:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.MFCS.2021.48

Abstract

A pattern α is a string of variables and terminal letters. We say that α matches a word w, consisting only of terminal letters, if w can be obtained by replacing the variables of α by terminal words. The matching problem, i.e., deciding whether a given pattern matches a given word, has been heavily investigated: it is NP-complete in general, but can be solved efficiently for classes of patterns with restricted structure. In this paper, we approach this problem in a generalized setting, by considering approximate pattern matching under Hamming distance. More precisely, we ask for the minimum Hamming distance between w and any word u obtained by replacing the variables of α by terminal words. First, we address the class of regular patterns (in which no variable occurs twice) and propose efficient algorithms for this problem, as well as matching conditional lower bounds. Second, we show that the problem can still be solved efficiently if we allow repeated variables, but restrict the way the different variables can be interleaved according to a locality parameter. However, as soon as we allow a variable to occur more than once and its occurrences can be interleaved arbitrarily with those of other variables, the problem becomes intractable, even if none of those other variables occurs more than once.
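
To make the problem statement concrete, the following Python sketch computes the minimum Hamming distance in two simple ways. It is an illustration only and is not the algorithm from the paper. Several conventions are assumptions made here: a pattern is a list of tokens, uppercase tokens are variables, every other token is a single terminal letter, and a variable may be replaced by the empty word. The names `min_distance_regular` and `min_distance_brute` are hypothetical. The first function is a straightforward dynamic program for regular patterns; the second enumerates all variable lengths (exponential in the number of variables) and picks each variable's image by a position-wise plurality vote.

```python
from collections import Counter
from itertools import product
from math import inf


def is_variable(token: str) -> bool:
    # Assumption of this sketch: uppercase tokens are variables.
    return token.isupper()


def min_distance_regular(pattern, w):
    """Minimum Hamming distance for a regular pattern (each variable once).

    Returns None if no substitution yields a word of length len(w).
    """
    variables = [t for t in pattern if is_variable(t)]
    if len(variables) != len(set(variables)):
        raise ValueError("pattern is not regular: a variable repeats")
    n = len(w)
    # row[j] = fewest mismatches aligning the pattern prefix read so far with w[:j]
    row = [0] + [inf] * n
    for tok in pattern:
        new = [inf] * (n + 1)
        if is_variable(tok):
            # A variable that occurs once can copy any factor of w at cost 0,
            # so new[j] is the minimum of row[0..j].
            best = inf
            for j in range(n + 1):
                best = min(best, row[j])
                new[j] = best
        else:
            # A terminal consumes exactly one letter of w.
            for j in range(n):
                if row[j] != inf:
                    new[j + 1] = row[j] + (tok != w[j])
        row = new
    return None if row[n] == inf else row[n]


def min_distance_brute(pattern, w):
    """Exhaustive search over variable lengths; works for any pattern."""
    names = sorted({t for t in pattern if is_variable(t)})
    n_terminals = sum(not is_variable(t) for t in pattern)
    best = None
    for lens in product(range(len(w) + 1), repeat=len(names)):
        length = dict(zip(names, lens))
        total = n_terminals + sum(length[t] for t in pattern if is_variable(t))
        if total != len(w):
            continue
        cost, pos = 0, 0
        columns = {}  # (variable, offset) -> letters of w aligned with it
        for t in pattern:
            if is_variable(t):
                for k in range(length[t]):
                    columns.setdefault((t, k), []).append(w[pos + k])
                pos += length[t]
            else:
                cost += (t != w[pos])
                pos += 1
        # The best letter at each offset is the most frequent aligned letter.
        for letters in columns.values():
            cost += len(letters) - max(Counter(letters).values())
        if best is None or cost < best:
            best = cost
    return best
```

For example, with the regular pattern `["X", "a", "b", "Y"]` and the word `"ccacc"`, both functions return 1, because the block `ab` is best aligned with the factor `ac`. With the non-regular pattern `["X", "X"]` and the word `"abcb"`, only the brute-force function applies, and it also returns 1.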

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
  • Theory of computation → Formal languages and automata theory
Keywords
  • Pattern with variables
  • Matching algorithms
  • Hamming distance
  • Conditional lower bounds
  • Patterns with structural restrictions
