License: Creative Commons Attribution 3.0 Unported license (CC-BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.SEA.2018.17
URN: urn:nbn:de:0030-drops-89528
URL: https://drops.dagstuhl.de/opus/volltexte/2018/8952/
Go to the corresponding LIPIcs Volume Portal


Belazzougui, Djamal ; Cunial, Fabio ; Denas, Olgert

Fast matching statistics in small space

pdf-format:
LIPIcs-SEA-2018-17.pdf (0.5 MB)


Abstract

Computing the matching statistics of a string S with respect to a string T on an alphabet of size sigma is a fundamental primitive for a number of large-scale string analysis applications, including the comparison of entire genomes, for which space is a pressing issue. This paper takes from theory to practice an existing algorithm that uses just O(|T|log{sigma}) bits of space, and that computes a compact encoding of the matching statistics array in O(|S|log{sigma}) time. The techniques used to speed up the algorithm are of general interest, since they optimize queries on the existence of a Weiner link from a node of the suffix tree, and parent operations after unsuccessful Weiner links. Thus, they can be applied to other matching statistics algorithms, as well as to any suffix tree traversal that relies on such calls. Some of our optimizations yield a matching statistics implementation that is up to three times faster than a plain version of the algorithm, depending on the similarity between S and T. In genomic datasets of practical significance we achieve speedups of up to 1.8, but our fastest implementations take on average twice the time of an existing code based on the LCP array. The key advantage is that our implementations need between one half and one fifth of the competitor's memory, and they approach comparable running times when S and T are very similar.

BibTeX - Entry

@InProceedings{belazzougui_et_al:LIPIcs:2018:8952,
  author =	{Djamal Belazzougui and Fabio Cunial and Olgert Denas},
  title =	{{Fast matching statistics in small space}},
  booktitle =	{17th International Symposium on Experimental Algorithms  (SEA 2018)},
  pages =	{17:1--17:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-070-5},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{103},
  editor =	{Gianlorenzo D'Angelo},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2018/8952},
  URN =		{urn:nbn:de:0030-drops-89528},
  doi =		{10.4230/LIPIcs.SEA.2018.17},
  annote =	{Keywords: Matching statistics, maximal repeat, Burrows-Wheeler transform, wavelet tree, suffix tree topology}
}

Keywords: Matching statistics, maximal repeat, Burrows-Wheeler transform, wavelet tree, suffix tree topology
Collection: 17th International Symposium on Experimental Algorithms (SEA 2018)
Issue Date: 2018
Date of publication: 19.06.2018
Supplementary Material: Source code at https://github.com/odenas/indexed_ms.


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI