Brubach, Brian ;
Ghurye, Jay
A Succinct Four Russians Speedup for Edit Distance Computation and Oneagainstmany Banded Alignment
Abstract
The classical Four Russians speedup for computing edit distance (a.k.a. Levenshtein distance), due to Masek and Paterson [Masek and Paterson, 1980], involves partitioning the dynamic programming table into kbyk square blocks and generating a lookup table in O(psi^{2k} k^2 Sigma^{2k}) time and O(psi^{2k} k Sigma^{2k}) space for block size k, where psi depends on the cost function (for unit costs psi = 3) and Sigma is the size of the alphabet. We show that the O(psi^{2k} k^2) and O(psi^{2k} k) factors can be improved to O(k^2 lg{k}) time and O(k^2) space. Thus, we improve the time and space complexity of that aspect compared to Masek and Paterson [Masek and Paterson, 1980] and remove the dependence on psi.
We further show that for certain problems the O(Sigma^{2k}) factor can also be reduced. Using this technique, we show a new algorithm for the fundamental problem of oneagainstmany banded alignment. In particular, comparing one string of length m to n other strings of length m with maximum distance d can be performed in O(n m + m d^2 lg{d} + n d^3) time. When d is reasonably small, this approaches or meets the current best theoretic result of O(nm + n d^2) achieved by using the best known pairwise algorithm running in O(m + d^2) time [Myers, 1986][Ukkonen, 1985] while potentially being more practical. It also improves on the standard practical approach which requires O(n m d) time to iteratively run an O(md) time pairwise banded alignment algorithm.
Regarding pairwise comparison, we extend the classic result of Masek and Paterson [Masek and Paterson, 1980] which computes the edit distance between two strings in O(m^2/log{m}) time to remove the dependence on psi even when edits have arbitrary costs from a penalty matrix. Crochemore, Landau, and ZivUkelson [Crochemore, 2003] achieved a similar result, also allowing for unrestricted scoring matrices, but with variablesized blocks. In practical applications of the Four Russians speedup wherein space efficiency is important and smaller block sizes k are used (notably k < Sigma), Kim, Na, Park, and Sim [Kim et al., 2016] showed how to remove the dependence on the alphabet size for the unit cost version, generating a lookup table in O(3^{2k} (2k)! k^2) time and O(3^{2k} (2k)! k) space. Combining their work with our result yields an improvement to O((2k)! k^2 lg{k}) time and O((2k)! k^2) space.
BibTeX  Entry
@InProceedings{brubach_et_al:LIPIcs:2018:8696,
author = {Brian Brubach and Jay Ghurye},
title = {{A Succinct Four Russians Speedup for Edit Distance Computation and Oneagainstmany Banded Alignment}},
booktitle = {Annual Symposium on Combinatorial Pattern Matching (CPM 2018)},
pages = {13:113:12},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959770743},
ISSN = {18688969},
year = {2018},
volume = {105},
editor = {Gonzalo Navarro and David Sankoff and Binhai Zhu},
publisher = {Schloss DagstuhlLeibnizZentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2018/8696},
URN = {urn:nbn:de:0030drops86965},
doi = {10.4230/LIPIcs.CPM.2018.13},
annote = {Keywords: edit distance, banded alignment, oneagainstmany alignment, genomics, method of the Four Russians}
}
18.05.2018
Keywords: 

edit distance, banded alignment, oneagainstmany alignment, genomics, method of the Four Russians 
Seminar: 

Annual Symposium on Combinatorial Pattern Matching (CPM 2018)

Issue date: 

2018 
Date of publication: 

18.05.2018 