Charalampopoulos, Panagiotis ;
Kociumaka, Tomasz ;
Pissis, Solon P. ;
Radoszewski, Jakub
Faster Algorithms for Longest Common Substring
Abstract
In the classic longest common substring (LCS) problem, we are given two strings S and T, each of length at most n, over an alphabet of size σ, and we are asked to find a longest string occurring as a fragment of both S and T. Weiner, in his seminal paper that introduced the suffix tree, presented an 𝒪(n log σ)-time algorithm for this problem [SWAT 1973]. For polynomially-bounded integer alphabets, the linear-time construction of suffix trees by Farach yielded an 𝒪(n)-time algorithm for the LCS problem [FOCS 1997]. However, for small alphabets, this is not necessarily optimal for the LCS problem in the word RAM model of computation, in which the strings can be stored in 𝒪(n log σ/log n) space and read in 𝒪(n log σ/log n) time. We show that, in this model, we can compute an LCS in time 𝒪(n log σ / √{log n}), which is sublinear in n if σ = 2^{o(√{log n})} (in particular, if σ = 𝒪(1)), using optimal space 𝒪(n log σ/log n).
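To make the problem statement concrete, the following is a minimal sketch of a textbook quadratic-time dynamic program for LCS. It is not the algorithm of the paper, nor the suffix-tree approach of Weiner or Farach; it only illustrates what "a longest string occurring as a fragment of both S and T" means. The function name `longest_common_substring` is an assumption made for this illustration.

```python
def longest_common_substring(s: str, t: str) -> str:
    """Return one longest string occurring as a fragment of both s and t.

    Baseline only: O(|s| * |t|) time and O(|t|) extra space.
    """
    best_len = 0
    best_end = 0  # exclusive end position in s of the best fragment found
    # prev[j] = length of the longest common suffix of s[:i-1] and t[:j]
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                # Extend the common suffix that ended one position earlier.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_end = i
        prev = cur
    return s[best_end - best_len:best_end]


if __name__ == "__main__":
    print(longest_common_substring("abcdxyz", "xyzabcd"))
```

When several fragments share the maximum length, this sketch returns the one that ends earliest in `s`.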
We then lift our ideas to the problem of computing a k-mismatch LCS, which has received considerable attention in recent years. In this problem, the aim is to compute a longest substring of S that occurs in T with at most k mismatches. Flouri et al. showed how to compute a 1-mismatch LCS in 𝒪(n log n) time [IPL 2015]. Thankachan et al. extended this result to computing a k-mismatch LCS in 𝒪(n log^k n) time for k = 𝒪(1) [J. Comput. Biol. 2016]. We show an 𝒪(n log^{k-1/2} n)-time algorithm, for any constant integer k > 0 and irrespective of the alphabet size, using 𝒪(n) space as the previous approaches. We thus notably break through the well-known n log^k n barrier, which stems from a recursive heavy-path decomposition technique that was first introduced in the seminal paper of Cole et al. [STOC 2004] for string indexing with k errors.
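The k-mismatch variant can likewise be illustrated with a simple quadratic-time baseline. The sketch below slides a window along every alignment (diagonal) of S against T and keeps at most k mismatching positions inside the window. It is offered only to pin down the definition and is not the 𝒪(n log^{k-1/2} n)-time algorithm described above or any of the cited methods; the name `k_mismatch_lcs` and the returned triple are assumptions made for this illustration.

```python
def k_mismatch_lcs(s: str, t: str, k: int) -> tuple[int, int, int]:
    """Return (length, start_in_s, start_in_t) of a longest fragment of s
    that matches an equal-length fragment of t with at most k mismatches.

    Baseline only: O(|s| * |t|) time, O(1) extra space.
    """
    n, m = len(s), len(t)
    best = (0, 0, 0)
    # Each diagonal d = i - j fixes how s and t are aligned.
    for d in range(-(m - 1), n):
        i0 = max(d, 0)       # first aligned position in s
        j0 = i0 - d          # first aligned position in t
        length = min(n - i0, m - j0)
        left = 0
        mismatches = 0
        for right in range(length):
            if s[i0 + right] != t[j0 + right]:
                mismatches += 1
            # Shrink the window from the left until it has at most k mismatches.
            while mismatches > k:
                if s[i0 + left] != t[j0 + left]:
                    mismatches -= 1
                left += 1
            window = right - left + 1
            if window > best[0]:
                best = (window, i0 + left, j0 + left)
    return best


if __name__ == "__main__":
    print(k_mismatch_lcs("abcde", "abxde", 1))
```

With k = 0 the sketch reduces to the exact LCS problem, which gives a quick sanity check against any exact LCS routine.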
BibTeX Entry
@InProceedings{charalampopoulos_et_al:LIPIcs.ESA.2021.30,
author = {Charalampopoulos, Panagiotis and Kociumaka, Tomasz and Pissis, Solon P. and Radoszewski, Jakub},
title = {{Faster Algorithms for Longest Common Substring}},
booktitle = {29th Annual European Symposium on Algorithms (ESA 2021)},
pages = {30:1--30:17},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-204-4},
ISSN = {1868-8969},
year = {2021},
volume = {204},
editor = {Mutzel, Petra and Pagh, Rasmus and Herman, Grzegorz},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2021/14611},
URN = {urn:nbn:de:0030-drops-146114},
doi = {10.4230/LIPIcs.ESA.2021.30},
annote = {Keywords: longest common substring, k mismatches, wavelet tree}
}
Keywords: longest common substring, k mismatches, wavelet tree
Seminar: 29th Annual European Symposium on Algorithms (ESA 2021)
Issue date: 2021
Date of publication: 31.08.2021