Document

**Published in:** LIPIcs, Volume 173, 28th Annual European Symposium on Algorithms (ESA 2020)

The Burrows-Wheeler Transform (BWT) has been an essential tool in text compression and indexing. First introduced in 1994, it went on to provide the backbone for the first encoding of the classic suffix tree data structure in space close to entropy-based lower bound. Within the last decade, it has seen its role further enhanced with the development of compact suffix trees in space proportional to "r", the number of runs in the BWT. While r would superficially appear to be only a measure of space complexity, it is actually appearing increasingly often in the time complexity of new algorithms as well. This makes having the smallest value of r of growing importance. Interestingly, unlike other popular measures of compression, the parameter r is sensitive to the lexicographic ordering given to the text’s alphabet. Despite several past attempts to exploit this fact, a provably efficient algorithm for finding, or approximating, an alphabet ordering which minimizes r has been open for years.
We help to explain this lack of progress by presenting the first set of results on the computational complexity of minimizing BWT-runs via alphabet reordering. We prove that the decision version of this problem is NP-complete and cannot be solved in time poly(n)⋅ 2^o(σ) unless the Exponential Time Hypothesis fails, where σ is the size of the alphabet and n is the length of the text. Moreover, we show that the optimization variant is APX-hard. In doing so, we relate two previously disparate topics: the optimal traveling salesperson path of a graph and the number of runs in the BWT of a text. In addition, by relating recent results in the field of dictionary compression, we illustrate that an arbitrary alphabet ordering provides an O(log² n)-approximation. Lastly, we provide an optimal linear-time algorithm for a more restricted problem of finding an optimal ordering on a subset of symbols (occurring only once) under ordering constraints.

Jason W. Bentley, Daniel Gibney, and Sharma V. Thankachan. On the Complexity of BWT-Runs Minimization via Alphabet Reordering. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 15:1-15:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{bentley_et_al:LIPIcs.ESA.2020.15, author = {Bentley, Jason W. and Gibney, Daniel and Thankachan, Sharma V.}, title = {{On the Complexity of BWT-Runs Minimization via Alphabet Reordering}}, booktitle = {28th Annual European Symposium on Algorithms (ESA 2020)}, pages = {15:1--15:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-162-7}, ISSN = {1868-8969}, year = {2020}, volume = {173}, editor = {Grandoni, Fabrizio and Herman, Grzegorz and Sanders, Peter}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2020.15}, URN = {urn:nbn:de:0030-drops-128819}, doi = {10.4230/LIPIcs.ESA.2020.15}, annote = {Keywords: BWT, NP-hardness, APX-hardness} }

Document

**Published in:** LIPIcs, Volume 173, 28th Annual European Symposium on Algorithms (ESA 2020)

We present the first fine-grained complexity results on two classic problems on strings. The first one is the k-Median-Edit-Distance problem, where the input is a collection of k strings, each of length at most n, and the task is to find a new string that minimizes the sum of the edit distances from itself to all other strings in the input. Arising frequently in computational biology, this problem provides an important generalization of edit distance to multiple strings and is similar to the multiple sequence alignment problem in bioinformatics. We demonstrate that for any ε > 0 and k ≥ 2, an O(n^{k-ε}) time solution for the k-Median-Edit-Distance problem over an alphabet of size O(k) refutes the Strong Exponential Time Hypothesis (SETH). This provides the first matching conditional lower bound for the O(n^k) time algorithm established in 1975 by Sankoff.
The second problem we study is the k-Center-Edit-Distance problem. Here also, the input is a collection of k strings, each of length at most n. The task is to find a new string that minimizes the maximum edit distance from itself to any other string in the input. We prove that the same conditional lower bound as before holds. Our results also imply new conditional lower bounds for the k-Tree-Alignment and the k-Bottleneck-Tree-Alignment problems studied in phylogenetics.

Gary Hoppenworth, Jason W. Bentley, Daniel Gibney, and Sharma V. Thankachan. The Fine-Grained Complexity of Median and Center String Problems Under Edit Distance. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 61:1-61:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{hoppenworth_et_al:LIPIcs.ESA.2020.61, author = {Hoppenworth, Gary and Bentley, Jason W. and Gibney, Daniel and Thankachan, Sharma V.}, title = {{The Fine-Grained Complexity of Median and Center String Problems Under Edit Distance}}, booktitle = {28th Annual European Symposium on Algorithms (ESA 2020)}, pages = {61:1--61:19}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-162-7}, ISSN = {1868-8969}, year = {2020}, volume = {173}, editor = {Grandoni, Fabrizio and Herman, Grzegorz and Sanders, Peter}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2020.61}, URN = {urn:nbn:de:0030-drops-129278}, doi = {10.4230/LIPIcs.ESA.2020.61}, annote = {Keywords: Edit Distance, Median String, Center String, SETH} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail