8 Search Results for "Lipták, Zsuzsanna"


Volume

LIPIcs, Volume 259

34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)

CPM 2023, June 26-28, 2023, Marne-la-Vallée, France

Editors: Laurent Bulteau and Zsuzsanna Lipták

Document
Matching Statistics Speed up BWT Construction

Authors: Francesco Masillo

Published in: LIPIcs, Volume 274, 31st Annual European Symposium on Algorithms (ESA 2023)


Abstract
Due to the exponential growth of genomic data, constructing dedicated data structures has become the principal bottleneck in common bioinformatics applications. In particular, the Burrows-Wheeler Transform (BWT) is the basis of some of the most popular self-indexes for genomic data, due to its known favourable behaviour on repetitive data. Some tools that exploit the intrinsic repetitiveness of biological data have risen in popularity, due to their speed and low space consumption. We introduce a new algorithm for computing the BWT, which takes advantage of the redundancy of the data through a compressed version of matching statistics, the CMS of [Lipták et al., WABI 2022]. We show that it suffices to sort a small subset of suffixes, lowering both computation time and space. Our result is due to a new insight which links the so-called insert-heads of [Lipták et al., WABI 2022] to the well-known run boundaries of the BWT. We give two implementations of our algorithm, called CMS-BWT, both competitive in our experimental validation on highly repetitive real-life datasets. In most cases, they outperform other tools w.r.t. running time, trading off a higher memory footprint, which, however, is still considerably smaller than the total size of the input data.

Cite as

Francesco Masillo. Matching Statistics Speed up BWT Construction. In 31st Annual European Symposium on Algorithms (ESA 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 274, pp. 83:1-83:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{masillo:LIPIcs.ESA.2023.83,
  author =	{Masillo, Francesco},
  title =	{{Matching Statistics Speed up BWT Construction}},
  booktitle =	{31st Annual European Symposium on Algorithms (ESA 2023)},
  pages =	{83:1--83:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-295-2},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{274},
  editor =	{G{\o}rtz, Inge Li and Farach-Colton, Martin and Puglisi, Simon J. and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2023.83},
  URN =		{urn:nbn:de:0030-drops-187360},
  doi =		{10.4230/LIPIcs.ESA.2023.83},
  annote =	{Keywords: Burrows-Wheeler Transform, matching statistics, string collections, compressed representation, data structures, efficient algorithms}
}
Document
Complete Volume
LIPIcs, Volume 259, CPM 2023, Complete Volume

Authors: Laurent Bulteau and Zsuzsanna Lipták

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)


Abstract
LIPIcs, Volume 259, CPM 2023, Complete Volume

Cite as

34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 1-472, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@Proceedings{bulteau_et_al:LIPIcs.CPM.2023,
  title =	{{LIPIcs, Volume 259, CPM 2023, Complete Volume}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{1--472},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023},
  URN =		{urn:nbn:de:0030-drops-179536},
  doi =		{10.4230/LIPIcs.CPM.2023},
  annote =	{Keywords: LIPIcs, Volume 259, CPM 2023, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Laurent Bulteau and Zsuzsanna Lipták

Published in: LIPIcs, Volume 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 259, pp. 0:i-0:xvi, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{bulteau_et_al:LIPIcs.CPM.2023.0,
  author =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},
  pages =	{0:i--0:xvi},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-276-1},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{259},
  editor =	{Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2023.0},
  URN =		{urn:nbn:de:0030-drops-179542},
  doi =		{10.4230/LIPIcs.CPM.2023.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Suffix Sorting via Matching Statistics

Authors: Zsuzsanna Lipták, Francesco Masillo, and Simon J. Puglisi

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a reference string. We then use this data structure to distribute suffixes into a partial order, and subsequently to speed up suffix comparisons to complete the generalized suffix array. Our experimental evidence with a prototype implementation (a tool we call sacamats) shows that on string collections with highly similar strings we can construct the suffix array in time competitive with or faster than the fastest available methods. Along the way, we describe a heuristic for fast computation of the matching statistics of two strings, which may be of independent interest.

Cite as

Zsuzsanna Lipták, Francesco Masillo, and Simon J. Puglisi. Suffix Sorting via Matching Statistics. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 20:1-20:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{liptak_et_al:LIPIcs.WABI.2022.20,
  author =	{Lipt\'{a}k, Zsuzsanna and Masillo, Francesco and Puglisi, Simon J.},
  title =	{{Suffix Sorting via Matching Statistics}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{20:1--20:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.20},
  URN =		{urn:nbn:de:0030-drops-170545},
  doi =		{10.4230/LIPIcs.WABI.2022.20},
  annote =	{Keywords: Generalized suffix array, matching statistics, string collections, compressed representation, data structures, efficient algorithms}
}
Document
A Theoretical and Experimental Analysis of BWT Variants for String Collections

Authors: Davide Cenzato and Zsuzsanna Lipták

Published in: LIPIcs, Volume 223, 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)


Abstract
The extended Burrows-Wheeler-Transform (eBWT), introduced by Mantaci et al. [Theor. Comput. Sci., 2007], is a generalization of the Burrows-Wheeler-Transform (BWT) to multisets of strings. While the original BWT is based on the lexicographic order, the eBWT uses the omega-order, which differs from the lexicographic order in important ways. A number of tools are available that compute the BWT of string collections; however, the data structures they generate in most cases differ from the one originally defined, as well as from each other. In this paper, we review the differences between these BWT variants, both from a theoretical and from a practical point of view, comparing them on several real-life datasets with different characteristics. We find that the differences can be extensive, depending on the dataset characteristics, and are largest on collections of many highly similar short sequences. The widely-used parameter r, the number of runs of the BWT, also shows notable variation between the different BWT variants; on our datasets, it varied by a multiplicative factor of up to 4.2.

Cite as

Davide Cenzato and Zsuzsanna Lipták. A Theoretical and Experimental Analysis of BWT Variants for String Collections. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 223, pp. 25:1-25:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{cenzato_et_al:LIPIcs.CPM.2022.25,
  author =	{Cenzato, Davide and Lipt\'{a}k, Zsuzsanna},
  title =	{{A Theoretical and Experimental Analysis of BWT Variants for String Collections}},
  booktitle =	{33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022)},
  pages =	{25:1--25:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-234-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{223},
  editor =	{Bannai, Hideo and Holub, Jan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2022.25},
  URN =		{urn:nbn:de:0030-drops-161529},
  doi =		{10.4230/LIPIcs.CPM.2022.25},
  annote =	{Keywords: Burrows-Wheeler-Transform, extended BWT, string collections, repetitiveness measures, r, compression}
}
Document
Pattern Discovery in Colored Strings

Authors: Zsuzsanna Lipták, Simon J. Puglisi, and Massimiliano Rossi

Published in: LIPIcs, Volume 160, 18th International Symposium on Experimental Algorithms (SEA 2020)


Abstract
We consider the problem of identifying patterns of interest in colored strings. A colored string is a string in which each position is colored with one of a finite set of colors. Our task is to find substrings that always occur followed by the same color at the same distance. The problem is motivated by applications in embedded systems verification, in particular, assertion mining. The goal there is to automatically infer properties of the embedded system from the analysis of its simulation traces. We show that the number of interesting patterns is upper-bounded by 𝒪(n²) where n is the length of the string. We introduce a baseline algorithm with 𝒪(n²) running time which identifies all interesting patterns for all colors in the string satisfying certain minimality conditions. When one is interested in patterns related to only one color, we provide an algorithm that identifies patterns in 𝒪(n²log n) time, but is faster than the first algorithm in practice, both on simulated and on real-world patterns.

Cite as

Zsuzsanna Lipták, Simon J. Puglisi, and Massimiliano Rossi. Pattern Discovery in Colored Strings. In 18th International Symposium on Experimental Algorithms (SEA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 160, pp. 12:1-12:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{liptak_et_al:LIPIcs.SEA.2020.12,
  author =	{Lipt\'{a}k, Zsuzsanna and Puglisi, Simon J. and Rossi, Massimiliano},
  title =	{{Pattern Discovery in Colored Strings}},
  booktitle =	{18th International Symposium on Experimental Algorithms (SEA 2020)},
  pages =	{12:1--12:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-148-1},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{160},
  editor =	{Faro, Simone and Cantone, Domenico},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2020.12},
  URN =		{urn:nbn:de:0030-drops-120862},
  doi =		{10.4230/LIPIcs.SEA.2020.12},
  annote =	{Keywords: property testing, suffix tree, pattern mining}
}
Document
Reconstruction of Trees from Jumbled and Weighted Subtrees

Authors: Dénes Bartha, Peter Burcsi, and Zsuzsanna Lipták

Published in: LIPIcs, Volume 54, 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)


Abstract
Let T be an edge-labeled graph, where the labels are from a finite alphabet Sigma. For a subtree U of T the Parikh vector of U is a vector of length |Sigma| which specifies the multiplicity of each label in U. We ask when T can be reconstructed from the multiset of Parikh vectors of all its subtrees, or all of its paths, or all of its maximal paths. We consider the analogous problems for weighted trees. We show how several well-known reconstruction problems on labeled strings, weighted strings and point sets on a line can be included in this framework. We present reconstruction algorithms and non-reconstructibility results, and extend the polynomial method, previously applied to jumbled strings [Acharya et al., SIAM J. on Discr. Math, 2015] and weighted strings [Bansal et al., CPM 2004], to deal with general trees and special tree classes.

Cite as

Dénes Bartha, Peter Burcsi, and Zsuzsanna Lipták. Reconstruction of Trees from Jumbled and Weighted Subtrees. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 10:1-10:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{bartha_et_al:LIPIcs.CPM.2016.10,
  author =	{Bartha, D\'{e}nes and Burcsi, Peter and Lipt\'{a}k, Zsuzsanna},
  title =	{{Reconstruction of Trees from Jumbled and Weighted Subtrees}},
  booktitle =	{27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)},
  pages =	{10:1--10:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-012-5},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{54},
  editor =	{Grossi, Roberto and Lewenstein, Moshe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2016.10},
  URN =		{urn:nbn:de:0030-drops-60861},
  doi =		{10.4230/LIPIcs.CPM.2016.10},
  annote =	{Keywords: trees, paths, Parikh vectors, reconstruction problems, homometric sets, polynomial method, jumbled strings, weighted strings}
}
  • Refine by Author
  • 6 Lipták, Zsuzsanna
  • 2 Bulteau, Laurent
  • 2 Masillo, Francesco
  • 2 Puglisi, Simon J.
  • 1 Bartha, Dénes
  • Show More...

  • Refine by Classification
  • 5 Theory of computation → Design and analysis of algorithms
  • 3 Theory of computation → Data compression
  • 2 Applied computing → Computational biology
  • 2 Mathematics of computing → Combinatoric problems
  • 2 Mathematics of computing → Combinatorics on words
  • Show More...

  • Refine by Keyword
  • 3 string collections
  • 2 compressed representation
  • 2 data structures
  • 2 efficient algorithms
  • 2 matching statistics
  • Show More...

  • Refine by Type
  • 7 document
  • 1 volume

  • Refine by Publication Year
  • 4 2023
  • 2 2022
  • 1 2016
  • 1 2020

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail