On the Complexity of String Matching for Graphs

Authors Massimo Equi, Roberto Grossi, Veli Mäkinen, Alexandru I. Tomescu



PDF
Thumbnail PDF

File

LIPIcs.ICALP.2019.55.pdf
  • Filesize: 0.68 MB
  • 15 pages

Document Identifiers

Author Details

Massimo Equi
  • Department of Computer Science, University of Helsinki, Finland
Roberto Grossi
  • Dipartimento di Informatica, Università di Pisa, Italy
Veli Mäkinen
  • Department of Computer Science, University of Helsinki, Finland
Alexandru I. Tomescu
  • Department of Computer Science, University of Helsinki, Finland

Cite AsGet BibTex

Massimo Equi, Roberto Grossi, Veli Mäkinen, and Alexandru I. Tomescu. On the Complexity of String Matching for Graphs. In 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 132, pp. 55:1-55:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.ICALP.2019.55

Abstract

Exact string matching in labeled graphs is the problem of searching paths of a graph G=(V,E) such that the concatenation of their node labels is equal to the given pattern string P[1..m]. This basic problem can be found at the heart of more complex operations on variation graphs in computational biology, of query operations in graph databases, and of analysis operations in heterogeneous networks. We prove a conditional lower bound stating that, for any constant epsilon>0, an O(|E|^{1 - epsilon} m)-time, or an O(|E| m^{1 - epsilon})-time algorithm for exact string matching in graphs, with node labels and patterns drawn from a binary alphabet, cannot be achieved unless the Strong Exponential Time Hypothesis (SETH) is false. This holds even if restricted to undirected graphs with maximum node degree two, i.e. to zig-zag matching in bidirectional strings, or to deterministic directed acyclic graphs whose nodes have maximum sum of indegree and outdegree three. These restricted cases make the lower bound stricter than what can be directly derived from related bounds on regular expression matching (Backurs and Indyk, FOCS'16). In fact, our bounds are tight in the sense that lowering the degree or the alphabet size yields linear-time solvable problems. An interesting corollary is that exact and approximate matching are equally hard (quadratic time) in graphs under SETH. In comparison, the same problems restricted to strings have linear-time vs quadratic-time solutions, respectively (approximate pattern matching having also a matching SETH lower bound (Backurs and Indyk, STOC'15)).

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • exact pattern matching
  • graph query
  • graph search
  • labeled graphs
  • string matching
  • string search
  • strong exponential time hypothesis
  • heterogeneous networks
  • variation graphs

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight Hardness Results for LCS and Other Sequence Similarity Measures. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 59-78, 2015. Google Scholar
  2. Tatsuya Akutsu. A linear time pattern matching algorithm between a string and a tree. In 4th Symposium on Combinatorial Pattern Matching, Padova, Italy, pages 1-10, 1993. Google Scholar
  3. Jarno Alanko, Alberto Policriti, and Nicola Prezza. On Prefix-Sorting Finite Automata. arXiv e-prints, page arXiv:1902.01088, February 2019. URL: http://arxiv.org/abs/1902.01088.
  4. Amihood Amir, Moshe Lewenstein, and Noa Lewenstein. Pattern matching in hypertext. In WADS'97, Halifax, LNCS 1272, pages 160-173, 1997. Google Scholar
  5. Amihood Amir, Moshe Lewenstein, and Noa Lewenstein. Pattern Matching in Hypertext. J. Algorithms, 35(1):82-99, 2000. Google Scholar
  6. Renzo Angles and Claudio Gutierrez. Survey of Graph Database Models. ACM Comput. Surv., 40(1):1:1-1:39, February 2008. URL: http://dx.doi.org/10.1145/1322432.1322433.
  7. Arturs Backurs and Piotr Indyk. Edit Distance Cannot Be Computed in Strongly Subquadratic Time (Unless SETH is False). In Proceedings of the Forty-seventh Annual ACM Symposium on Theory of Computing, STOC '15, pages 51-58, New York, NY, USA, 2015. ACM. URL: http://dx.doi.org/10.1145/2746539.2746612.
  8. Arturs Backurs and Piotr Indyk. Which Regular Expression Patterns Are Hard to Match? In IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9-11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA, pages 457-466, 2016. Google Scholar
  9. Arturs Backurs and Christos Tzamos. Improving Viterbi is Hard: Better Runtimes Imply Faster Clique Algorithms. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70, pages 311-321. PMLR, 2017. Google Scholar
  10. Karl Bringmann and Marvin Kunnemann. Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping. In Proceedings of the 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), FOCS '15, pages 79-97, Washington, DC, USA, 2015. IEEE Computer Society. URL: http://dx.doi.org/10.1109/FOCS.2015.15.
  11. The Computational Pan-Genomics Consortium. Computational pan-genomics: status, promises and challenges. Briefings in Bioinformatics, 19(1):118-135, 2018. URL: http://dx.doi.org/10.1093/bib/bbw089.
  12. Alessio Conte, Gaspare Ferraro, Roberto Grossi, Andrea Marino, Kunihiko Sadakane, and Takeaki Uno. Node Similarity with q -Grams for Real-World Labeled Networks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, August 19-23, 2018, pages 1282-1291, 2018. URL: http://dx.doi.org/10.1145/3219819.3220085.
  13. Mateus de Oliveira Oliveira and Michael Wehar. Intersection Non-emptiness and Hardness Within Polynomial Time. In DLT, volume 11088 of Lecture Notes in Computer Science, pages 282-290. Springer, 2018. Google Scholar
  14. Massimo Equi, Roberto Grossi, and Veli Mäkinen. On the Complexity of Exact Pattern Matching in Graphs: Binary Strings and Bounded Degree. arXiv e-prints, page arXiv:1901.05264, January 2019. URL: http://arxiv.org/abs/1901.05264.
  15. Massimo Equi, Roberto Grossi, Alexandru I. Tomescu, and Veli Mäkinen. On the Complexity of Exact Pattern Matching in Graphs: Determinism and Zig-Zag Matching. arXiv e-prints, page arXiv:1902.03560, February 2019. URL: http://arxiv.org/abs/1902.03560.
  16. Nadime Francis, Alastair Green, Paolo Guagliardo, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow, Mats Rydberg, Petra Selmer, and Andrés Taylor. Cypher: An Evolving Query Language for Property Graphs. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018, pages 1433-1445, 2018. URL: http://dx.doi.org/10.1145/3183713.3190657.
  17. Travis Gagie, Giovanni Manzini, and Jouni Sirén. Wheeler graphs: A framework for BWT-based data structures. Theor. Comput. Sci., 698:67-78, 2017. URL: http://dx.doi.org/10.1016/j.tcs.2017.06.016.
  18. Daniel Gibney and Sharma V. Thankachan. On the Hardness and Inapproximability of Recognizing Wheeler Graphs. arXiv e-prints, page arXiv:1902.01960, February 2019. URL: http://arxiv.org/abs/1902.01960.
  19. Shohei Hido and Hisashi Kashima. A Linear-Time Graph Kernel. In Wei Wang 0010, Hillol Kargupta, Sanjay Ranka, Philip S. Yu, and Xindong Wu, editors, ICDM 2009, The Ninth IEEE International Conference on Data Mining, Miami, Florida, USA, 6-9 December 2009, pages 179-188. IEEE Computer Society, 2009. Google Scholar
  20. Russell Impagliazzo and Ramamohan Paturi. On the Complexity of k-SAT. Journal of Computer and System Sciences, 62(2):367-375, 2001. URL: http://dx.doi.org/10.1006/jcss.2000.1727.
  21. Chirag Jain, Haowen Zhang, Yu Gao, and Srinivas Aluru. On the Complexity of Sequence to Graph Alignment. bioRxiv, 2019. To appear in RECOMB 2019. URL: http://dx.doi.org/10.1101/522912.
  22. Donald E. Knuth, James H. Morris, and Vaughan R. Pratt. Fast Pattern Matching in Strings. SIAM Journal on Computing, 6(2):323-350, March 1977. Google Scholar
  23. Antoine Limasset, Bastien Cazaux, Eric Rivals, and Pierre Peterlongo. Read mapping on de Bruijn graphs. BMC Bioinformatics, 17:237, 2016. URL: http://dx.doi.org/10.1186/s12859-016-1103-9.
  24. U. Manber and S. Wu. Approximate string matching with arbitrary costs for text and hypertext. In IAPR Workshop on Structural and Syntactic Pattern Recognition, Bern, Switzerland, pages 22-33, 1992. Google Scholar
  25. Gonzalo Navarro. Improved approximate pattern matching on hypertext. Theoretical Computer Science, 237(1-2):455-463, 2000. Google Scholar
  26. K. Park and D. Kim. String matching in hypertext. In 6th Symposium on Combinatorial Pattern Matching, Espoo, Finland, page 318, 1995. Google Scholar
  27. Eric Prud'hommeaux and Andy Seaborne. SPARQL query language for RDF. World Wide Web Consortium, Recommendation REC-rdf-sparql-query-20080115, January 2008. Google Scholar
  28. M. O. Rabin and D. Scott. Finite Automata and Their Decision Problems. IBM Journal of Research and Development, 3(2):114-125, April 1959. URL: http://dx.doi.org/10.1147/rd.32.0114.
  29. Mikko Rautiainen and Tobias Marschall. Aligning sequences to general graphs in O(V+ mE) time. bioRxiv, pages 216-127, 2017. Google Scholar
  30. Marko A. Rodriguez. The Gremlin graph traversal machine and language (invited talk). In Proceedings of the 15th Symposium on Database Programming Languages, Pittsburgh, PA, USA, October 25-30, 2015, pages 1-10, 2015. URL: http://dx.doi.org/10.1145/2815072.2815073.
  31. Jouni Sirén, Niko Välimäki, and Veli Mäkinen. Indexing Graphs for Path Queries with Applications in Genome Research. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 11(2):375-388, March 2014. URL: http://dx.doi.org/10.1109/TCBB.2013.2297101.
  32. Chris Thachuk. Indexing hypertext. Journal of Discrete Algorithms, 18:113-122, 2013. Selected papers from the 18th International Symposium on String Processing and Information Retrieval (SPIRE 2011). URL: http://dx.doi.org/10.1016/j.jda.2012.10.001.
  33. Michael Wehar. Hardness Results for Intersection Non-Emptiness. In ICALP (2), volume 8573 of Lecture Notes in Computer Science, pages 354-362. Springer, 2014. Google Scholar
  34. Ryan Williams. A new algorithm for optimal 2-constraint satisfaction and its implications. Theoretical Computer Science, 348(2):357-365, 2005. URL: http://dx.doi.org/10.1016/j.tcs.2005.09.023.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail