Document Open Access Logo

An Almost Optimal Edit Distance Oracle

Authors Panagiotis Charalampopoulos , Paweł Gawrychowski , Shay Mozes , Oren Weimann



PDF
Thumbnail PDF

File

LIPIcs.ICALP.2021.48.pdf
  • Filesize: 1.1 MB
  • 20 pages

Document Identifiers

Author Details

Panagiotis Charalampopoulos
  • The Interdisciplinary Center Herzliya, Israel
Paweł Gawrychowski
  • University of Wrocław, Poland
Shay Mozes
  • The Interdisciplinary Center Herzliya, Israel
Oren Weimann
  • University of Haifa, Israel

Cite AsGet BibTex

Panagiotis Charalampopoulos, Paweł Gawrychowski, Shay Mozes, and Oren Weimann. An Almost Optimal Edit Distance Oracle. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 48:1-48:20, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ICALP.2021.48

Abstract

We consider the problem of preprocessing two strings S and T, of lengths m and n, respectively, in order to be able to efficiently answer the following queries: Given positions i,j in S and positions a,b in T, return the optimal alignment score of S[i..j] and T[a..b]. Let N = mn. We present an oracle with preprocessing time N^{1+o(1)} and space N^{1+o(1)} that answers queries in log^{2+o(1)}N time. In other words, we show that we can efficiently query for the alignment score of every pair of substrings after preprocessing the input for almost the same time it takes to compute just the alignment of S and T. Our oracle uses ideas from our distance oracle for planar graphs [STOC 2019] and exploits the special structure of the alignment graph. Conditioned on popular hardness conjectures, this result is optimal up to subpolynomial factors. Our results apply to both edit distance and longest common subsequence (LCS). The best previously known oracle with construction time and size 𝒪(N) has slow Ω(√N) query time [Sakai, TCS 2019], and the one with size N^{1+o(1)} and query time log^{2+o(1)}N (using a planar graph distance oracle) has slow Ω(N^{3/2}) construction time [Long & Pettie, SODA 2021]. We improve both approaches by roughly a √ N factor.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
  • Theory of computation → Shortest paths
Keywords
  • longest common subsequence
  • edit distance
  • planar graphs
  • Voronoi diagrams

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS, pages 59-78, 2015. Google Scholar
  2. Amir Abboud, Pawel Gawrychowski, Shay Mozes, and Oren Weimann. Near-optimal compression for the planar graph metric. In 29th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, pages 530-549. SIAM, 2018. Google Scholar
  3. Amihood Amir, Panagiotis Charalampopoulos, Solon P. Pissis, and Jakub Radoszewski. Dynamic and internal longest common substring. Algorithmica, 82(12):3707-3743, 2020. Google Scholar
  4. Maxim Babenko, Paweł Gawrychowski, Tomasz Kociumaka, and Tatiana Starikovskaya. Wavelet trees meet suffix trees. In 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, pages 572-591, 2015. Google Scholar
  5. Maxim A. Babenko, Pawel Gawrychowski, Tomasz Kociumaka, Ignat I. Kolesnichenko, and Tatiana Starikovskaya. Computing minimal and maximal suffixes of a substring. Theoretical Computer Science, 638:112-121, 2016. Google Scholar
  6. Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). SIAM Journal on Computing, 47(3):1087-1097, 2018. Google Scholar
  7. Jon Louis Bentley. Decomposable searching problems. Information Processing Letters, 8(5):244-251, 1979. Google Scholar
  8. Karl Bringmann and Marvin Künnemann. Quadratic conditional lower bounds for string problems and dynamic time warping. In 56th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2015, pages 79-97, 2015. Google Scholar
  9. Gerth Stølting Brodal. Partially persistent data structures of bounded degree with constant update time. Nordic Journal of Computing, 3(3):238-255, 1996. Google Scholar
  10. Sergio Cabello. Subquadratic algorithms for the diameter and the sum of pairwise distances in planar graphs. ACM Transactions on Algorithms, 15(2):21:1-21:38, 2019. Google Scholar
  11. Sergio Cabello, Erin W. Chambers, and Jeff Erickson. Multiple-source shortest paths in embedded graphs. SIAM Journal on Computing, 42(4):1542-1571, 2013. Google Scholar
  12. Panagiotis Charalampopoulos. Data Structures for Strings in the Internal and Dynamic Settings. PhD thesis, King’s College London, 2020. Google Scholar
  13. Panagiotis Charalampopoulos, Paweł Gawrychowski, Shay Mozes, and Oren Weimann. Almost optimal distance oracles for planar graphs. In 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, pages 138-151, 2019. Google Scholar
  14. Panagiotis Charalampopoulos, Tomasz Kociumaka, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, and Wiktor Zuba. Counting distinct patterns in internal dictionary matching. In 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, pages 8:1-8:15, 2020. Google Scholar
  15. Panagiotis Charalampopoulos, Tomasz Kociumaka, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Internal dictionary matching. In 30th International Symposium on Algorithms and Computation, ISAAC 2019, pages 22:1-22:17, 2019. Google Scholar
  16. Panagiotis Charalampopoulos, Tomasz Kociumaka, and Shay Mozes. Dynamic string alignment. In 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, pages 9:1-9:13, 2020. Google Scholar
  17. Panagiotis Charalampopoulos, Tomasz Kociumaka, and Philip Wellnitz. Faster approximate pattern matching: A unified approach. In 61st Annual IEEE Symposium on Foundations of Computer Science, FOCS 2020, pages 978-989, 2020. Google Scholar
  18. Hagai Cohen and Ely Porat. On the hardness of distance oracle for sparse graph. CoRR, 2010. URL: http://arxiv.org/abs/1006.1117.
  19. Vincent Cohen-Addad, Søren Dahlgaard, and Christian Wulff-Nilsen. Fast and compact exact distance oracle for planar graphs. In 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, pages 962-973, 2017. Google Scholar
  20. Jittat Fakcharoenphol and Satish Rao. Planar graphs, negative weight edges, shortest paths, and near linear time. Journal of Computer and System Sciences, 72(5):868-889, 2006. Google Scholar
  21. Paweł Gawrychowski, Shay Mozes, Oren Weimann, and Christian Wulff-Nilsen. Better tradeoffs for exact distance oracles in planar graphs. In 29th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, pages 515-529, 2018. Google Scholar
  22. Isaac Goldstein, Tsvi Kopelowitz, Moshe Lewenstein, and Ely Porat. Conditional lower bounds for space/time tradeoffs. In 15th International Symposium Algorithms and Data Structures, WADS 2017, pages 421-436, 2017. Google Scholar
  23. Haim Kaplan, Shay Mozes, Yahav Nussbaum, and Micha Sharir. Submatrix maximum queries in Monge matrices and partial Monge matrices, and their applications. ACM Transactions on Algorithms, 13(2):26:1-26:42, 2017. Google Scholar
  24. Orgad Keller, Tsvi Kopelowitz, Shir Landau Feibish, and Moshe Lewenstein. Generalized substring compression. Theoretical Computer Science, 525:42-54, 2014. Google Scholar
  25. Philip N. Klein. Multiple-source shortest paths in planar graphs. In 16th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2005, pages 146-155, 2005. Google Scholar
  26. Tomasz Kociumaka. Minimal suffix and rotation of a substring in optimal time. In 27th Annual Symposium on Combinatorial Pattern Matching, CPM 2016, pages 28:1-28:12, 2016. Google Scholar
  27. Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Efficient data structures for the factor periodicity problem. In 19th International Symposium on String Processing and Information Retrieval, SPIRE 2012, pages 284-294, 2012. Google Scholar
  28. Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. Internal pattern matching queries in a text and applications. In 26th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, pages 532-551, 2015. Google Scholar
  29. Gad M. Landau and Uzi Vishkin. Fast parallel and serial approximate string matching. Journal of Algorithms, 10(2):157-169, 1989. URL: https://doi.org/10.1016/0196-6774(89)90010-2.
  30. Yaowei Long and Seth Pettie. Planar distance oracles with better time-space tradeoffs. In 32nd Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, pages 2517-2536, 2021. Google Scholar
  31. Shay Mozes and Christian Wulff-Nilsen. Shortest paths in planar graphs with real lengths in O(n log² n / log log n) time. In 18th Annual European Symposium on Algorithms, ESA 2010, pages 206-217, 2010. Google Scholar
  32. Mihai Pǎtraşcu and Liam Roditty. Distance oracles beyond the Thorup-Zwick bound. SIAM Journal on Computing, 43(1):300-311, 2014. Google Scholar
  33. Mikhail Rubinchik and Arseny M. Shur. Counting palindromes in substrings. In 24th International Symposium on String Processing and Information Retrieval, SPIRE 2017, pages 290-303, 2017. Google Scholar
  34. Yoshifumi Sakai. A substring-substring LCS data structure. Theoretical Computer Science, 753:16-34, 2019. Google Scholar
  35. Jeanette P. Schmidt. All highest scoring paths in weighted grid graphs and their application to finding all approximate repeats in strings. SIAM Journal on Computing, 27(4):972-992, 1998. Google Scholar
  36. Alexander Tiskin. Semi-local string comparison: algorithmic techniques and applications, 2007. URL: http://arxiv.org/abs/0707.3619.
  37. Alexander Tiskin. Semi-local longest common subsequences in subquadratic time. Journal of Discrete Algorithms, 6(4):570-581, 2008. Google Scholar
  38. Peter van Emde Boas. Preserving order in a forest in less than logarithmic time and linear space. Information Processing Letters, 6(3):80-82, 1977. Google Scholar
  39. Dan E. Willard. Log-logarithmic worst-case range queries are possible in space Θ(N). Information Processing Letters, 17(2):81-84, 1983. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail