DROPS

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2024.20

Better Space-Time-Robustness Trade-Offs for Set Reconciliation

Authors: Djamal Belazzougui, Gregory Kucherov, and Stefan Walzer

Published in: LIPIcs, Volume 297, 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)

Abstract

We consider the problem of reconstructing the symmetric difference between similar sets from their representations (sketches) of size linear in the number of differences. Exact solutions to this problem are based on error-correcting coding techniques and suffer from a large decoding time. Existing probabilistic solutions based on Invertible Bloom Lookup Tables (IBLTs) are time-efficient but offer insufficient success guarantees for many applications. Here we propose a tunable trade-off between the two approaches combining the efficiency of IBLTs with exponentially decreasing failure probability. The proof relies on a refined analysis of IBLTs proposed in (Bæk Tejs Houen et al. SOSA 2023) which has an independent interest. We also propose a modification of our algorithm that enables telling apart the elements of each set in the symmetric difference.

Cite as

Djamal Belazzougui, Gregory Kucherov, and Stefan Walzer. Better Space-Time-Robustness Trade-Offs for Set Reconciliation. In 51st International Colloquium on Automata, Languages, and Programming (ICALP 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 297, pp. 20:1-20:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{belazzougui_et_al:LIPIcs.ICALP.2024.20,
  author =	{Belazzougui, Djamal and Kucherov, Gregory and Walzer, Stefan},
  title =	{{Better Space-Time-Robustness Trade-Offs for Set Reconciliation}},
  booktitle =	{51st International Colloquium on Automata, Languages, and Programming (ICALP 2024)},
  pages =	{20:1--20:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-322-5},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{297},
  editor =	{Bringmann, Karl and Grohe, Martin and Puppis, Gabriele and Svensson, Ola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2024.20},
  URN =		{urn:nbn:de:0030-drops-201639},
  doi =		{10.4230/LIPIcs.ICALP.2024.20},
  annote =	{Keywords: data structures, hashing, set reconciliation, invertible Bloom lookup tables, random hypergraphs, BCH codes}
}

Document

Complete Volume

DOI: 10.4230/LIPIcs.WABI.2023

LIPIcs, Volume 273, WABI 2023, Complete Volume

Authors: Djamal Belazzougui and Aïda Ouangraoua

Published in: LIPIcs, Volume 273, 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)

Abstract

LIPIcs, Volume 273, WABI 2023, Complete Volume

Cite as

23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 1-400, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@Proceedings{belazzougui_et_al:LIPIcs.WABI.2023,
  title =	{{LIPIcs, Volume 273, WABI 2023, Complete Volume}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{1--400},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2023},
  URN =		{urn:nbn:de:0030-drops-186250},
  doi =		{10.4230/LIPIcs.WABI.2023},
  annote =	{Keywords: LIPIcs, Volume 273, WABI 2023, Complete Volume}
}

Document

Front Matter

DOI: 10.4230/LIPIcs.WABI.2023.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Djamal Belazzougui and Aïda Ouangraoua

Published in: LIPIcs, Volume 273, 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 0:i-0:xiv, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{belazzougui_et_al:LIPIcs.WABI.2023.0,
  author =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{0:i--0:xiv},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2023.0},
  URN =		{urn:nbn:de:0030-drops-186267},
  doi =		{10.4230/LIPIcs.WABI.2023.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

DOI: 10.4230/LIPIcs.WABI.2022.14

Efficient Reconciliation of Genomic Datasets of High Similarity

Authors: Yoshihiro Shibuya, Djamal Belazzougui, and Gregory Kucherov

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)

Abstract

We apply Invertible Bloom Lookup Tables (IBLTs) to the comparison of k-mer sets originated from large DNA sequence datasets. We show that for similar datasets, IBLTs provide a more space-efficient and, at the same time, more accurate method for estimating Jaccard similarity of underlying k-mer sets, compared to MinHash which is a go-to sketching technique for efficient pairwise similarity estimation. This is achieved by combining IBLTs with k-mer sampling based on syncmers, which constitute a context-independent alternative to minimizers and provide an unbiased estimator of Jaccard similarity. A key property of our method is that involved data structures require space proportional to the difference of k-mer sets and are independent of the size of sets themselves. As another application, we show how our ideas can be applied in order to efficiently compute (an approximation of) k-mers that differ between two datasets, still using space only proportional to their number. We experimentally illustrate our results on both simulated and real data (SARS-CoV-2 and Streptococcus Pneumoniae genomes).

Cite as

Yoshihiro Shibuya, Djamal Belazzougui, and Gregory Kucherov. Efficient Reconciliation of Genomic Datasets of High Similarity. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 14:1-14:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{shibuya_et_al:LIPIcs.WABI.2022.14,
  author =	{Shibuya, Yoshihiro and Belazzougui, Djamal and Kucherov, Gregory},
  title =	{{Efficient Reconciliation of Genomic Datasets of High Similarity}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{14:1--14:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.14},
  URN =		{urn:nbn:de:0030-drops-170481},
  doi =		{10.4230/LIPIcs.WABI.2022.14},
  annote =	{Keywords: k-mers, sketching, Invertible Bloom Lookup Tables, IBLT, MinHash, syncmers, minimizers}
}

Document

DOI: 10.4230/LIPIcs.WABI.2021.8

Space-Efficient Representation of Genomic k-Mer Count Tables

Authors: Yoshihiro Shibuya, Djamal Belazzougui, and Gregory Kucherov

Published in: LIPIcs, Volume 201, 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)

Abstract

Motivation. k-mer counting is a common task in bioinformatic pipelines, with many dedicated tools available. Output formats could rely on quotienting to reduce the space of k-mers in hash tables, however counts are not usually stored in space-efficient formats. Overall, k-mer count tables for genomic data take a considerable space, easily reaching tens of GB. Furthermore, such tables do not support efficient random-access queries in general. Results. In this work, we design an efficient representation of k-mer count tables supporting fast random-access queries. We propose to apply Compressed Static Functions (CSFs), with space proportional to the empirical zero-order entropy of the counts. For very skewed distributions, like those of k-mer counts in whole genomes, the only currently available implementation of CSFs does not provide a compact enough representation. By adding a Bloom Filter to a CSF we obtain a Bloom-enhanced CSF (BCSF) effectively overcoming this limitation. Furthermore, by combining BCSFs with minimizer-based bucketing of k-mers, we build even smaller representations breaking the empirical entropy lower bound, for large enough k. We also extend these representations to the approximate case, gaining additional space. We experimentally validate these techniques on k-mer count tables of whole genomes (E.Coli and C.Elegans) as well as on k-mer document frequency tables for 29 E.Coli genomes. In the case of exact counts, our representation takes about a half of the space of the empirical entropy, for large enough k’s.

Cite as

Yoshihiro Shibuya, Djamal Belazzougui, and Gregory Kucherov. Space-Efficient Representation of Genomic k-Mer Count Tables. In 21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 8:1-8:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{shibuya_et_al:LIPIcs.WABI.2021.8,
  author =	{Shibuya, Yoshihiro and Belazzougui, Djamal and Kucherov, Gregory},
  title =	{{Space-Efficient Representation of Genomic k-Mer Count Tables}},
  booktitle =	{21st International Workshop on Algorithms in Bioinformatics (WABI 2021)},
  pages =	{8:1--8:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-200-6},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{201},
  editor =	{Carbone, Alessandra and El-Kebir, Mohammed},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2021.8},
  URN =		{urn:nbn:de:0030-drops-143619},
  doi =		{10.4230/LIPIcs.WABI.2021.8},
  annote =	{Keywords: k-mer counting, data structures, compression, minimizers, compressed static function, Bloom filter, empirical entropy}
}

Document

DOI: 10.4230/LIPIcs.CPM.2021.8

Weighted Ancestors in Suffix Trees Revisited

Authors: Djamal Belazzougui, Dmitry Kosolobov, Simon J. Puglisi, and Rajeev Raman

Published in: LIPIcs, Volume 191, 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)

Abstract

The weighted ancestor problem is a well-known generalization of the predecessor problem to trees. It is known to require Ω(log log n) time for queries provided 𝒪(n polylog n) space is available and weights are from [0..n], where n is the number of tree nodes. However, when applied to suffix trees, the problem, surprisingly, admits an 𝒪(n)-space solution with constant query time, as was shown by Gawrychowski, Lewenstein, and Nicholson (Proc. ESA 2014). This variant of the problem can be reformulated as follows: given the suffix tree of a string s, we need a data structure that can locate in the tree any substring s[p..q] of s in 𝒪(1) time (as if one descended from the root reading s[p..q] along the way). Unfortunately, the data structure of Gawrychowski et al. has no efficient construction algorithm, limiting its wider usage as an algorithmic tool. In this paper we resolve this issue, describing a data structure for weighted ancestors in suffix trees with constant query time and a linear construction algorithm. Our solution is based on a novel approach using so-called irreducible LCP values.

Cite as

Djamal Belazzougui, Dmitry Kosolobov, Simon J. Puglisi, and Rajeev Raman. Weighted Ancestors in Suffix Trees Revisited. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 8:1-8:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{belazzougui_et_al:LIPIcs.CPM.2021.8,
  author =	{Belazzougui, Djamal and Kosolobov, Dmitry and Puglisi, Simon J. and Raman, Rajeev},
  title =	{{Weighted Ancestors in Suffix Trees Revisited}},
  booktitle =	{32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
  pages =	{8:1--8:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-186-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{191},
  editor =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2021.8},
  URN =		{urn:nbn:de:0030-drops-139594},
  doi =		{10.4230/LIPIcs.CPM.2021.8},
  annote =	{Keywords: suffix tree, weighted ancestors, irreducible LCP, deterministic substring hashing}
}

Document

DOI: 10.4230/LIPIcs.CPM.2020.4

Efficient Tree-Structured Categorical Retrieval

Authors: Djamal Belazzougui and Gregory Kucherov

Published in: LIPIcs, Volume 161, 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)

Abstract

We study a document retrieval problem in the new framework where D text documents are organized in a category tree with a pre-defined number h of categories. This situation occurs e.g. with taxomonic trees in biology or subject classification systems for scientific literature. Given a string pattern p and a category (level in the category tree), we wish to efficiently retrieve the t categorical units containing this pattern and belonging to the category. We propose several efficient solutions for this problem. One of them uses n(logσ(1+o(1))+log D+O(h)) + O(Δ) bits of space and O(|p|+t) query time, where n is the total length of the documents, σ the size of the alphabet used in the documents and Δ is the total number of nodes in the category tree. Another solution uses n(logσ(1+o(1))+O(log D))+O(Δ)+O(Dlog n) bits of space and O(|p|+tlog D) query time. We finally propose other solutions which are more space-efficient at the expense of a slight increase in query time.

Cite as

Djamal Belazzougui and Gregory Kucherov. Efficient Tree-Structured Categorical Retrieval. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 4:1-4:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{belazzougui_et_al:LIPIcs.CPM.2020.4,
  author =	{Belazzougui, Djamal and Kucherov, Gregory},
  title =	{{Efficient Tree-Structured Categorical Retrieval}},
  booktitle =	{31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)},
  pages =	{4:1--4:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-149-8},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{161},
  editor =	{G{\o}rtz, Inge Li and Weimann, Oren},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2020.4},
  URN =		{urn:nbn:de:0030-drops-121299},
  doi =		{10.4230/LIPIcs.CPM.2020.4},
  annote =	{Keywords: pattern matching, document retrieval, category tree, space-efficient data structures}
}

Document

DOI: 10.4230/LIPIcs.CPM.2019.10

Fully-Functional Bidirectional Burrows-Wheeler Indexes and Infinite-Order De Bruijn Graphs

Authors: Djamal Belazzougui and Fabio Cunial

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

Abstract

Given a string T on an alphabet of size sigma, we describe a bidirectional Burrows-Wheeler index that takes O(|T| log sigma) bits of space, and that supports the addition and removal of one character, on the left or right side of any substring of T, in constant time. Previously known data structures that used the same space allowed constant-time addition to any substring of T, but they could support removal only from specific substrings of T. We also describe an index that supports bidirectional addition and removal in O(log log |T|) time, and that takes a number of words proportional to the number of left and right extensions of the maximal repeats of T. We use such fully-functional indexes to implement bidirectional, frequency-aware, variable-order de Bruijn graphs with no upper bound on their order, and supporting natural criteria for increasing and decreasing the order during traversal.

Cite as

Djamal Belazzougui and Fabio Cunial. Fully-Functional Bidirectional Burrows-Wheeler Indexes and Infinite-Order De Bruijn Graphs. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 10:1-10:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{belazzougui_et_al:LIPIcs.CPM.2019.10,
  author =	{Belazzougui, Djamal and Cunial, Fabio},
  title =	{{Fully-Functional Bidirectional Burrows-Wheeler Indexes and Infinite-Order De Bruijn Graphs}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{10:1--10:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.10},
  URN =		{urn:nbn:de:0030-drops-104811},
  doi =		{10.4230/LIPIcs.CPM.2019.10},
  annote =	{Keywords: BWT, suffix tree, CDAWG, de Bruijn graph, maximal repeat, string depth, contraction, bidirectional index}
}

Document

DOI: 10.4230/LIPIcs.CPM.2019.32

Computing the Antiperiod(s) of a String

Authors: Hayam Alamro, Golnaz Badkobeh, Djamal Belazzougui, Costas S. Iliopoulos, and Simon J. Puglisi

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)

Abstract

A string S[1,n] is a power (or repetition or tandem repeat) of order k and period n/k, if it can be decomposed into k consecutive identical blocks of length n/k. Powers and periods are fundamental structures in the study of strings and algorithms to compute them efficiently have been widely studied. Recently, Fici et al. (Proc. ICALP 2016) introduced an antipower of order k to be a string composed of k distinct blocks of the same length, n/k, called the antiperiod. An arbitrary string will have antiperiod t if it is prefix of an antipower with antiperiod t. In this paper, we describe efficient algorithm for computing the smallest antiperiod of a string S of length n in O(n) time. We also describe an algorithm to compute all the antiperiods of S that runs in O(n log n) time.

Cite as

Hayam Alamro, Golnaz Badkobeh, Djamal Belazzougui, Costas S. Iliopoulos, and Simon J. Puglisi. Computing the Antiperiod(s) of a String. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 32:1-32:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{alamro_et_al:LIPIcs.CPM.2019.32,
  author =	{Alamro, Hayam and Badkobeh, Golnaz and Belazzougui, Djamal and Iliopoulos, Costas S. and Puglisi, Simon J.},
  title =	{{Computing the Antiperiod(s) of a String}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{32:1--32:11},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.32},
  URN =		{urn:nbn:de:0030-drops-105035},
  doi =		{10.4230/LIPIcs.CPM.2019.32},
  annote =	{Keywords: antiperiod, antipower, power, period, repetition, run, string}
}

Document

DOI: 10.4230/LIPIcs.SEA.2018.17

Fast matching statistics in small space

Authors: Djamal Belazzougui, Fabio Cunial, and Olgert Denas

Published in: LIPIcs, Volume 103, 17th International Symposium on Experimental Algorithms (SEA 2018)

Abstract

Computing the matching statistics of a string S with respect to a string T on an alphabet of size sigma is a fundamental primitive for a number of large-scale string analysis applications, including the comparison of entire genomes, for which space is a pressing issue. This paper takes from theory to practice an existing algorithm that uses just O(|T|log{sigma}) bits of space, and that computes a compact encoding of the matching statistics array in O(|S|log{sigma}) time. The techniques used to speed up the algorithm are of general interest, since they optimize queries on the existence of a Weiner link from a node of the suffix tree, and parent operations after unsuccessful Weiner links. Thus, they can be applied to other matching statistics algorithms, as well as to any suffix tree traversal that relies on such calls. Some of our optimizations yield a matching statistics implementation that is up to three times faster than a plain version of the algorithm, depending on the similarity between S and T. In genomic datasets of practical significance we achieve speedups of up to 1.8, but our fastest implementations take on average twice the time of an existing code based on the LCP array. The key advantage is that our implementations need between one half and one fifth of the competitor's memory, and they approach comparable running times when S and T are very similar.

Cite as

Djamal Belazzougui, Fabio Cunial, and Olgert Denas. Fast matching statistics in small space. In 17th International Symposium on Experimental Algorithms (SEA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 103, pp. 17:1-17:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{belazzougui_et_al:LIPIcs.SEA.2018.17,
  author =	{Belazzougui, Djamal and Cunial, Fabio and Denas, Olgert},
  title =	{{Fast matching statistics in small space}},
  booktitle =	{17th International Symposium on Experimental Algorithms (SEA 2018)},
  pages =	{17:1--17:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-070-5},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{103},
  editor =	{D'Angelo, Gianlorenzo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2018.17},
  URN =		{urn:nbn:de:0030-drops-89528},
  doi =		{10.4230/LIPIcs.SEA.2018.17},
  annote =	{Keywords: Matching statistics, maximal repeat, Burrows-Wheeler transform, wavelet tree, suffix tree topology}
}

Document

DOI: 10.4230/LIPIcs.CPM.2017.7

Representing the Suffix Tree with the CDAWG

Authors: Djamal Belazzougui and Fabio Cunial

Published in: LIPIcs, Volume 78, 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)

Abstract

Given a string T, it is known that its suffix tree can be represented using the compact directed acyclic word graph (CDAWG) with e_T arcs, taking overall O(e_T+e_REV(T)) words of space, where REV(T) is the reverse of T, and supporting some key operations in time between O(1) and O(log(log(n))) in the worst case. This representation is especially appealing for highly repetitive strings, like collections of similar genomes or of version-controlled documents, in which e_T grows sublinearly in the length of T in practice. In this paper we augment such representation, supporting a number of additional queries in worst-case time between O(1) and O(log(n)) in the RAM model, without increasing space complexity asymptotically. Our technique, based on a heavy path decomposition of the suffix tree, enables also a representation of the suffix array, of the inverse suffix array, and of T itself, that takes O(e_T) words of space, and that supports random access in O(log(n)) time. Furthermore, we establish a connection between the reversed CDAWG of T and a context-free grammar that produces T and only T, which might have independent interest.

Cite as

Djamal Belazzougui and Fabio Cunial. Representing the Suffix Tree with the CDAWG. In 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 78, pp. 7:1-7:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{belazzougui_et_al:LIPIcs.CPM.2017.7,
  author =	{Belazzougui, Djamal and Cunial, Fabio},
  title =	{{Representing the Suffix Tree with the CDAWG}},
  booktitle =	{28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)},
  pages =	{7:1--7:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-039-2},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{78},
  editor =	{K\"{a}rkk\"{a}inen, Juha and Radoszewski, Jakub and Rytter, Wojciech},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2017.7},
  URN =		{urn:nbn:de:0030-drops-73402},
  doi =		{10.4230/LIPIcs.CPM.2017.7},
  annote =	{Keywords: CDAWG, suffix tree, heavy path decomposition, maximal repeat, context-free grammar}
}

Search Results

Documents authored by Belazzougui, Djamal

Better Space-Time-Robustness Trade-Offs for Set Reconciliation

Abstract

Cite as

LIPIcs, Volume 273, WABI 2023, Complete Volume

Abstract

Cite as

Front Matter, Table of Contents, Preface, Conference Organization

Abstract

Cite as

Efficient Reconciliation of Genomic Datasets of High Similarity

Abstract

Cite as

Space-Efficient Representation of Genomic k-Mer Count Tables

Abstract

Cite as

Weighted Ancestors in Suffix Trees Revisited

Abstract

Cite as

Efficient Tree-Structured Categorical Retrieval

Abstract

Cite as

Fully-Functional Bidirectional Burrows-Wheeler Indexes and Infinite-Order De Bruijn Graphs

Abstract

Cite as

Computing the Antiperiod(s) of a String

Abstract

Cite as

Fast matching statistics in small space

Abstract

Cite as

Representing the Suffix Tree with the CDAWG

Abstract

Cite as