Edit Distance with Block Operations

Authors Michal Ganczorz, Pawel Gawrychowski, Artur Jez, Tomasz Kociumaka



PDF
Thumbnail PDF

File

LIPIcs.ESA.2018.33.pdf
  • Filesize: 415 kB
  • 14 pages

Document Identifiers

Author Details

Michal Ganczorz
  • Institute of Computer Science, University of Wrocław, Poland
Pawel Gawrychowski
  • Institute of Computer Science, University of Wrocław, Poland
Artur Jez
  • Institute of Computer Science, University of Wrocław, Poland
Tomasz Kociumaka
  • Institute of Informatics, University of Warsaw, Poland

Cite As Get BibTex

Michal Ganczorz, Pawel Gawrychowski, Artur Jez, and Tomasz Kociumaka. Edit Distance with Block Operations. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 33:1-33:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/LIPIcs.ESA.2018.33

Abstract

We consider the problem of edit distance in which block operations are allowed, i.e. we ask for the minimal number of (block) operations that are needed to transform a string s to t. We give O(log n) approximation algorithms, where n is the total length of the input strings, for the variants of the problem which allow the following sets of operations: block move; block move and block delete; block move and block copy; block move, block copy, and block uncopy. The results still hold if we additionally allow any of the following operations: character insert, character delete, block reversal, or block involution (involution is a generalisation of the reversal). Previously, algorithms only for the first and last variant were known, and they had approximation ratios O(log n log^*n) and O(log n (log^*n)^2), respectively. The edit distance with block moves is equivalent, up to a constant factor, to the common string partition problem, in which we are given two strings s, t and the goal is to partition s into minimal number of parts such that they can be permuted in order to obtain t. Thus we also obtain an O(log n) approximation for this problem (compared to the previous O(log n log^* n)).
The results use a simplification of the previously used technique of locally consistent parsing, which groups short substrings of a string into phrases so that similar substrings are guaranteed to be grouped in a similar way. Instead of a sophisticated parsing technique relying on a deterministic coin tossing, we use a simple one based on a partition of the alphabet into two subalphabets. In particular, this lowers the running time from O(n log^* n) to O(n). The new algorithms (for block copy or block delete) use a similar algorithm, but the analysis is based on a specially tuned combinatorial function on sets of numbers.

Subject Classification

ACM Subject Classification
  • Theory of computation → Approximation algorithms analysis
Keywords
  • Edit distance
  • Block operations
  • Common string partition

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Stephen Alstrup, Gerth Stølting Brodal, and Theis Rauhe. Pattern matching in dynamic texts. In David B. Shmoys, editor, 11th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2000, pages 819-828. ACM/SIAM, 2000. URL: http://dl.acm.org/citation.cfm?id=338219.338645.
  2. Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In Rocco A. Servedio and Ronitt Rubinfeld, editors, 47th Annual ACM on Symposium on Theory of Computing, STOC 2015, pages 51-58. ACM, 2015. URL: http://dx.doi.org/10.1145/2746539.2746612.
  3. Laurent Bulteau and Christian Komusiewicz. Minimum common string partition parameterized by partition size is fixed-parameter tractable. In Chandra Chekuri, editor, 25th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, pages 102-121. SIAM, 2014. URL: http://dx.doi.org/10.1137/1.9781611973402.8.
  4. Marek Chrobak, Petr Kolman, and Jirí Sgall. The greedy algorithm for the minimum common string partition problem. ACM Transactions on Algorithms, 1(2):350-366, 2005. URL: http://dx.doi.org/10.1145/1103963.1103971.
  5. Richard Cole and Uzi Vishkin. Deterministic coin tossing with applications to optimal parallel list ranking. Information and Control, 70(1):32-53, 1986. URL: http://dx.doi.org/10.1016/S0019-9958(86)80023-7.
  6. Graham Cormode and S. Muthukrishnan. The string edit distance matching problem with moves. ACM Transactions on Algorithms, 3(1):2:1-2:19, 2007. URL: http://dx.doi.org/10.1145/1219944.1219947.
  7. Graham Cormode, Mike Paterson, Süleyman Cenk Sahinalp, and Uzi Vishkin. Communication complexity of document exchange. In David B. Shmoys, editor, 11th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2000, pages 197-206. ACM/SIAM, 2000. URL: http://dl.acm.org/citation.cfm?id=338219.338252.
  8. Funda Ergün, S. Muthukrishnan, and Süleyman Cenk Sahinalp. Comparing sequences with segment rearrangements. In Paritosh K. Pandya and Jaikumar Radhakrishnan, editors, Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2003, volume 2914 of Lecture Notes in Computer Science, pages 183-194. Springer, 2003. URL: http://dx.doi.org/10.1007/978-3-540-24597-1_16.
  9. Paweł Gawrychowski, Adam Karczmarz, Tomasz Kociumaka, Jakub Łącki, and Piotr Sankowski. Optimal dynamic strings. In Artur Czumaj, editor, 29th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, pages 1509-1528. SIAM, 2018. URL: http://dx.doi.org/10.1137/1.9781611975031.99.
  10. Avraham Goldstein, Petr Kolman, and Jie Zheng. Minimum common string partition problem: Hardness and approximations. Electronic Journal of Combinatorics, 12, 2005. URL: http://www.combinatorics.org/Volume_12/Abstracts/v12i1r50.html.
  11. Artur Jeż. Approximation of grammar-based compression via recompression. Theoretical Computer Science, 592:115-134, 2015. URL: http://dx.doi.org/10.1016/j.tcs.2015.05.027.
  12. Artur Jeż. Faster fully compressed pattern matching by recompression. ACM Transactions on Algorithms, 11(3):20:1-20:43, 2015. URL: http://dx.doi.org/10.1145/2631920.
  13. Daniel P. Lopresti and Andrew Tomkins. Block edit models for approximate string matching. Theoretical Computer Science, 181(1):159-179, 1997. URL: http://dx.doi.org/10.1016/S0304-3975(96)00268-X.
  14. Kurt Mehlhorn, R. Sundar, and Christian Uhrig. Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica, 17(2):183-198, 1997. URL: http://dx.doi.org/10.1007/BF02522825.
  15. S. Muthukrishnan and Süleyman Cenk Sahinalp. Approximate nearest neighbors and sequence comparison with block operations. In F. Frances Yao and Eugene M. Luks, editors, 32nd Annual ACM Symposium on Theory of Computing, STOC 2000, pages 416-424. ACM, 2000. URL: http://dx.doi.org/10.1145/335305.335353.
  16. S. Muthukrishnan and Süleyman Cenk Sahinalp. Simple and practical sequence nearest neighbors with block operations. In Alberto Apostolico and Masayuki Takeda, editors, Combinatorial Pattern Matching, CPM 2002, volume 2373 of LNCS, pages 262-278. Springer, 2002. URL: http://dx.doi.org/10.1007/3-540-45452-7_22.
  17. Süleyman Cenk Sahinalp and Uzi Vishkin. On a parallel-algorithms method for string matching problems. In Maurizio A. Bonuccelli, Pierluigi Crescenzi, and Rossella Petreschi, editors, Algorithms and Complexity, CIAC 1994, volume 778 of LNCS, pages 22-32. Springer, 1994. URL: http://dx.doi.org/10.1007/3-540-57811-0_3.
  18. Süleyman Cenk Sahinalp and Uzi Vishkin. Symmetry breaking for suffix tree construction. In Frank Thomson Leighton and Michael T. Goodrich, editors, 26th Annual ACM Symposium on Theory of Computing, STOC 1994, pages 300-309. ACM, 1994. URL: http://dx.doi.org/10.1145/195058.195164.
  19. Dana Shapira and James A. Storer. Edit distance with move operations. Journal of Discrete Algorithms, 5(2):380-392, 2007. URL: http://dx.doi.org/10.1016/j.jda.2005.01.010.
  20. Dana Shapira and James A. Storer. Edit distance with block deletions. Algorithms, 4(1):40-60, 2011. URL: http://dx.doi.org/10.3390/a4010040.
  21. Esko Ukkonen. Algorithms for approximate string matching. Information and Control, 64(1-3):100-118, 1985. URL: http://dx.doi.org/10.1016/S0019-9958(85)80046-2.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail