DROPS

Document

DOI: 10.4230/LIPIcs.ESA.2025.47

Fast and Lightweight Distributed Suffix Array Construction

Authors: Manuel Haag, Florian Kurpicz, Peter Sanders, and Matthias Schimek

Published in: LIPIcs, Volume 351, 33rd Annual European Symposium on Algorithms (ESA 2025)

Abstract

The suffix array contains the lexicographical order of all suffixes of a text. It is one of the most well-studied text indices with applications in bioinformatics, compression, and pattern matching. The main bottleneck of distributed-memory suffix array construction algorithms is their memory requirements. Even careful implementations require 30×-60× the input size as working memory. We present a scalable and lightweight distributed-memory adaptation of the difference cover (DCX) suffix array construction algorithm. Our approach relies on novel bucketing and random chunk redistribution techniques which reduce our memory requirement to 20×-26× the input size for medium-sized inputs and to 14×-15× for large-sized inputs. Regarding running time, we achieve speedups of up to 5× over current state-of-the-art distributed suffix array construction algorithms.

Cite as

Manuel Haag, Florian Kurpicz, Peter Sanders, and Matthias Schimek. Fast and Lightweight Distributed Suffix Array Construction. In 33rd Annual European Symposium on Algorithms (ESA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 351, pp. 47:1-47:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{haag_et_al:LIPIcs.ESA.2025.47,
  author =	{Haag, Manuel and Kurpicz, Florian and Sanders, Peter and Schimek, Matthias},
  title =	{{Fast and Lightweight Distributed Suffix Array Construction}},
  booktitle =	{33rd Annual European Symposium on Algorithms (ESA 2025)},
  pages =	{47:1--47:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-395-9},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{351},
  editor =	{Benoit, Anne and Kaplan, Haim and Wild, Sebastian and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2025.47},
  URN =		{urn:nbn:de:0030-drops-245154},
  doi =		{10.4230/LIPIcs.ESA.2025.47},
  annote =	{Keywords: Distributed Computing, Suffix Array Construction}
}

Document

DOI: 10.4230/LIPIcs.WABI.2025.22

Design of Worst-Case-Optimal Spaced Seeds

Authors: Jens Zentgraf and Sven Rahmann

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)

Abstract

Read mapping (and alignment) is a fundamental problem in biological sequence analysis. For speed and computational efficiency, many popular read mappers tolerate only a few differences between the read and the corresponding part of the reference genome, which leads to reference bias: Reads with too many differences are not guaranteed to be mapped correctly or at all, because to even consider a genomic position, a sufficiently long exact match (seed) must exist. While pangenomes and their graph-based representations provide one way to avoid reference bias by enlarging the reference, we explore an orthogonal approach and consider stronger substitution-tolerant primitives, namely spaced seeds or gapped k-mers. Given two integers k ≤ w, one considers k selected positions, described by a mask, from each length-w window in a sequence. In the existing literature, masks with certain probabilistic guarantees have been designed for small values of k. Here, for the first time, we take a combinatorial approach from a worst-case perspective. For any mask, using integer linear programs, we find least favorable distributions of sequence changes in two different senses: (1) minimizing the number of unchanged windows; (2) minimizing the number of positions covered by unchanged windows. Then, among all masks or all symmetric masks of a given shape (k,w), we find the set of best masks that maximize these minima. As a result, we obtain robust masks, even for large numbers of changes. We illustrate the properties of these masks by constructing a challenging set of reads that contain many approximately equidistributed substitutions (but no indels) that many existing tools cannot map, even though they are in principle easily mappable (apart from the large number of changes) because they originate from selected non-repetitive regions of the human reference genome. We observe that the majority of these reads can be mapped with a simple alignment-free approach using chosen spaced masks, where seeding approaches based on contiguous k-mers fail.

Cite as

Jens Zentgraf and Sven Rahmann. Design of Worst-Case-Optimal Spaced Seeds. In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 22:1-22:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{zentgraf_et_al:LIPIcs.WABI.2025.22,
  author =	{Zentgraf, Jens and Rahmann, Sven},
  title =	{{Design of Worst-Case-Optimal Spaced Seeds}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{22:1--22:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.22},
  URN =		{urn:nbn:de:0030-drops-239488},
  doi =		{10.4230/LIPIcs.WABI.2025.22},
  annote =	{Keywords: Spaced seed, Gapped k-mer, Integer linear program (ILP), Worst-case design, Reference bias}
}

Document

DOI: 10.4230/LIPIcs.WABI.2025.11

Mutational Signature Refitting on Sparse Pan-Cancer Data

Authors: Gal Gilad, Teresa M. Przytycka, and Roded Sharan

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)

Abstract

Mutational processes shape cancer genomes, leaving characteristic marks that are termed signatures. The level of activity of each such process, or its signature exposure, provides important information on the disease, improving patient stratification and the prediction of drug response. Thus, there is growing interest in developing refitting methods that decipher those exposures. Previous work in this domain was unsupervised in nature, employing algebraic decomposition and probabilistic inference methods. Here we provide a supervised approach to the problem of signature refitting and show its superiority over current methods. Our method, SuRe, leverages a neural network model to capture correlations between signature exposures in real data. We show that SuRe outperforms previous methods on sparse mutation data from tumor type specific data sets, as well as pan-cancer data sets, with an increasing advantage as the data become sparser. We further demonstrate its utility in clinical settings.

Cite as

Gal Gilad, Teresa M. Przytycka, and Roded Sharan. Mutational Signature Refitting on Sparse Pan-Cancer Data. In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 11:1-11:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{gilad_et_al:LIPIcs.WABI.2025.11,
  author =	{Gilad, Gal and Przytycka, Teresa M. and Sharan, Roded},
  title =	{{Mutational Signature Refitting on Sparse Pan-Cancer Data}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{11:1--11:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.11},
  URN =		{urn:nbn:de:0030-drops-239374},
  doi =		{10.4230/LIPIcs.WABI.2025.11},
  annote =	{Keywords: mutational signatures, signature refitting, cancer genomics, genomic data analysis, somatic mutations}
}

Document

Research

DOI: 10.4230/OASIcs.Grossi.17

On Inverting the Burrows-Wheeler Transform

Authors: Nicola Cotumaccio

Published in: OASIcs, Volume 132, From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday (2025)

Abstract

We study the relationship between four fundamental problems: sorting, suffix sorting, element distinctness and BWT inversion. Our main contribution is an Ω(n log n) lower bound for BWT inversion in the comparison model. As a corollary, we obtain a new proof of the classical Ω(n log n) lower bound for sorting, which we believe to be of didactic interest for those who are not familiar with the Burrows-Wheeler transform.

Cite as

Nicola Cotumaccio. On Inverting the Burrows-Wheeler Transform. In From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 132, pp. 17:1-17:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{cotumaccio:OASIcs.Grossi.17,
  author =	{Cotumaccio, Nicola},
  title =	{{On Inverting the Burrows-Wheeler Transform}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{17:1--17:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.17},
  URN =		{urn:nbn:de:0030-drops-238169},
  doi =		{10.4230/OASIcs.Grossi.17},
  annote =	{Keywords: Burrows-Wheeler transform, sorting, suffix array, element distinctness}
}

Document

DOI: 10.4230/OASIcs.Manzini.2

A Survey of the Bijective Burrows-Wheeler Transform

Authors: Hideo Bannai, Dominik Köppl, and Zsuzsanna Lipták

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)

Abstract

The Bijective BWT (BBWT), conceived by Scott in 2007, later summarized in a preprint by Gil and Scott in 2009 (arXiv 2012), is a variant of the Burrows-Wheeler Transform which is bijective: every string is the BBWT of some string. Indeed, the BBWT of a string is the extended BWT [Mantaci et al., 2007] of the factors of its Lyndon factorization. The BBWT has been receiving increasing interest in recent years. In this paper, we survey existing research on the BBWT, starting with its history and motivation. We then present algorithmic topics including construction algorithms with various complexities and an index on top of the BBWT for pattern matching. We subsequently address some properties of the BBWT as a compressor, discussing robustness to operations such as reversal, edits, rotation, as well as compression power. We close with listing other bijective variants of the BWT and open problems concerning the BBWT.

Cite as

Hideo Bannai, Dominik Köppl, and Zsuzsanna Lipták. A Survey of the Bijective Burrows-Wheeler Transform. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 2:1-2:26, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{bannai_et_al:OASIcs.Manzini.2,
  author =	{Bannai, Hideo and K\"{o}ppl, Dominik and Lipt\'{a}k, Zsuzsanna},
  title =	{{A Survey of the Bijective Burrows-Wheeler Transform}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{2:1--2:26},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.2},
  URN =		{urn:nbn:de:0030-drops-239100},
  doi =		{10.4230/OASIcs.Manzini.2},
  annote =	{Keywords: Burrows-Wheeler Transform, compression, text indexing, repetitiveness measure, Lyndon words, index construction algorithms, bijective string transformation}
}

Document

DOI: 10.4230/OASIcs.Manzini.6

Optimizing the Performance of the FM-Index for Large-Scale Data

Authors: Eddie Ferro and Christina Boucher

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)

Abstract

The FM-index is a fundamental data structure used in bioinformatics to efficiently search for strings and index genomes. However, the FM-index can pose computational challenges, particularly in the context of large-scale genomic datasets, due to the complexity of its underlying components and data encodings. In this paper, we present a comprehensive review of efficient variants of the FM-index and the encoding strategies used to improve performance. We examine hardware-accelerated techniques, such as memory-efficient data layouts and cache-aware structures, as well as software-level innovations, including algorithmic refinements and compact representations. The reviewed work demonstrates substantial gains in both speed and scalability, making methods that use the FM-index more practical for high-throughput genomic applications. By analyzing the trade-offs and design choices of these variants, we highlight how combining hardware-aware and software-centric strategies enables more efficient FM-index construction and usage across a range of bioinformatics tasks.

Cite as

Eddie Ferro and Christina Boucher. Optimizing the Performance of the FM-Index for Large-Scale Data. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 6:1-6:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{ferro_et_al:OASIcs.Manzini.6,
  author =	{Ferro, Eddie and Boucher, Christina},
  title =	{{Optimizing the Performance of the FM-Index for Large-Scale Data}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{6:1--6:21},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.6},
  URN =		{urn:nbn:de:0030-drops-239140},
  doi =		{10.4230/OASIcs.Manzini.6},
  annote =	{Keywords: FM-Index Acceleration, Run-Length Encoding, Suffix Array Optimization, Burrows-Wheeler Transform, Efficient Backward Search}
}

Document

DOI: 10.4230/OASIcs.Manzini.7

Algorithms for Computing Very Large BWTs: a Short Survey

Authors: Diego Díaz-Domínguez, Lavinia Egidi, Veronica Guerrini, Felipe A. Louza, and Giovanna Rosone

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)

Abstract

The Burrows-Wheeler Transform (BWT) is a fundamental string transformation that, although initially introduced for data compression, has been extensively utilized across various domains, including text indexing and pattern matching within large datasets. Although the BWT construction is linear, the constants make the task impractical for large datasets, and as highlighted by Ferragina et al. [Paolo Ferragina et al., 2012], "to use it, one must first build it!". Thus, the construction of the BWT remains a significant challenge. For these reasons, during the past three decades there has been a succession of new algorithms for its construction using techniques that work in external memory or that use text compression. In this survey, we revise some of the most important advancements and tools presented in the past years for computing large BWTs exploiting external memory or text compression approaches without using additional information about the data.

Cite as

Diego Díaz-Domínguez, Lavinia Egidi, Veronica Guerrini, Felipe A. Louza, and Giovanna Rosone. Algorithms for Computing Very Large BWTs: a Short Survey. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 7:1-7:28, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{diazdominguez_et_al:OASIcs.Manzini.7,
  author =	{D{\'\i}az-Dom{\'\i}nguez, Diego and Egidi, Lavinia and Guerrini, Veronica and Louza, Felipe A. and Rosone, Giovanna},
  title =	{{Algorithms for Computing Very Large BWTs: a Short Survey}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{7:1--7:28},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.7},
  URN =		{urn:nbn:de:0030-drops-239151},
  doi =		{10.4230/OASIcs.Manzini.7},
  annote =	{Keywords: Burrows-Wheeler transform, Extended Burrows-Wheeler transform, external memory, text compression, longest common prefix}
}

@InProceedings{diazdominguez_et_al:OASIcs.Manzini.7,
  author =	{D{\'\i}az-Dom{\'\i}nguez, Diego and Egidi, Lavinia and Guerrini, Veronica and Louza, Felipe A. and Rosone, Giovanna},
  title =	{{Algorithms for Computing Very Large BWTs: a Short Survey}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{7:1--7:28},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.7},
  URN =		{urn:nbn:de:0030-drops-239151},
  doi =		{10.4230/OASIcs.Manzini.7},
  annote =	{Keywords: Burrows-Wheeler transform, Extended Burrows-Wheeler transform, external memory, text compression, longest common prefix}
}

Document

DOI: 10.4230/OASIcs.Manzini.9

Search Schemes for Approximate Pattern Matching: An Overview

Authors: Lore Depuydt, Jan Fostier, Simon Gottlieb, Gregory Kucherov, Knut Reinert, and Luca Renders

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)

Abstract

We provide a brief survey of results on solving the approximate pattern matching problem using search schemes, as introduced by Kucherov et al. (2016). We demonstrate that search schemes constitute a flexible and versatile tool that enable the specification of various search strategies, including several known filtering methods. We present approaches for designing efficient search schemes and for implementing them effectively. Finally, we conclude with experimental results comparing multiple search schemes on DNA sequencing data using the Columba software by Renders et al. (2021).

Cite as

Lore Depuydt, Jan Fostier, Simon Gottlieb, Gregory Kucherov, Knut Reinert, and Luca Renders. Search Schemes for Approximate Pattern Matching: An Overview. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 9:1-9:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{depuydt_et_al:OASIcs.Manzini.9,
  author =	{Depuydt, Lore and Fostier, Jan and Gottlieb, Simon and Kucherov, Gregory and Reinert, Knut and Renders, Luca},
  title =	{{Search Schemes for Approximate Pattern Matching: An Overview}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{9:1--9:16},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.9},
  URN =		{urn:nbn:de:0030-drops-239172},
  doi =		{10.4230/OASIcs.Manzini.9},
  annote =	{Keywords: FM-index, bidirectional index, approximate pattern matching, search scheme}
}

Document

DOI: 10.4230/OASIcs.Manzini.8

A Taxonomy of LCP-Array Construction Algorithms

Authors: Johannes Fischer and Enno Ohlebusch

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)

Abstract

The combination of the suffix array and the LCP-array can be used to solve many string processing problems efficiently. We review some of the most important sequential LCP-array construction algorithms in random access memory.

Cite as

Johannes Fischer and Enno Ohlebusch. A Taxonomy of LCP-Array Construction Algorithms. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 8:1-8:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{fischer_et_al:OASIcs.Manzini.8,
  author =	{Fischer, Johannes and Ohlebusch, Enno},
  title =	{{A Taxonomy of LCP-Array Construction Algorithms}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{8:1--8:17},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.8},
  URN =		{urn:nbn:de:0030-drops-239166},
  doi =		{10.4230/OASIcs.Manzini.8},
  annote =	{Keywords: longest common prefix array, suffix array, Burrows-Wheeler transform}
}

Document

DOI: 10.4230/LIPIcs.SEA.2025.4

U-Index: A Universal Indexing Framework for Matching Long Patterns

Authors: Lorraine A. K. Ayad, Gabriele Fici, Ragnar Groot Koerkamp, Grigorios Loukides, Rob Patro, Giulio Ermanno Pibiri, and Solon P. Pissis

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)

Abstract

Motivation. Text indexing is a fundamental and well-studied problem. Classic solutions to this problem either replace the original text with a compressed representation, e.g., the FM-index and its variants, or keep it uncompressed but attach some redundancy - an index - to accelerate matching, e.g., the suffix array. The former solutions thus retain excellent compressed space, but are practically slow to construct and query. The latter approaches, instead, sacrifice space efficiency but are typically faster; for example, the suffix array takes much more space than the text itself for commonly used alphabets, like ASCII or DNA, but it is very fast to construct and query. Methods. In this paper, we show that efficient text indexing can be achieved using just a small extra space on top of the original text, provided that the query patterns are sufficiently long. More specifically, we develop a new indexing paradigm in which a sketch of a query pattern is first matched against a sketch of the text. Once candidate matches are retrieved, they are verified using the original text. This paradigm is thus universal in the sense that it allows us to use any solution to index the sketched text, like a suffix array, FM-index, or r-index. Results. We explore both the theory and the practice of this universal framework. With an extensive experimental analysis, we show that, surprisingly, universal indexes can be constructed much faster than their unsketched counterparts and take a fraction of the space, as a direct consequence of (i) having a lower bound on the length of patterns and (ii) working in sketch space. Furthermore, these data structures have the potential of retaining or even improving query time, because matching against the sketched text is faster and verifying candidates can be theoretically done in constant time per occurrence (or, in practice, by short and cache-friendly scans of the text). Finally, we discuss some important applications of this novel indexing paradigm to computational biology. We hypothesize that such indexes will be particularly effective when the queries are sufficiently long, and so we demonstrate applications in long-read mapping.

Cite as

Lorraine A. K. Ayad, Gabriele Fici, Ragnar Groot Koerkamp, Grigorios Loukides, Rob Patro, Giulio Ermanno Pibiri, and Solon P. Pissis. U-Index: A Universal Indexing Framework for Matching Long Patterns. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 4:1-4:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{ayad_et_al:LIPIcs.SEA.2025.4,
  author =	{Ayad, Lorraine A. K. and Fici, Gabriele and Groot Koerkamp, Ragnar and Loukides, Grigorios and Patro, Rob and Pibiri, Giulio Ermanno and Pissis, Solon P.},
  title =	{{U-Index: A Universal Indexing Framework for Matching Long Patterns}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{4:1--4:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.4},
  URN =		{urn:nbn:de:0030-drops-232420},
  doi =		{10.4230/LIPIcs.SEA.2025.4},
  annote =	{Keywords: Text Indexing, Sketching, Minimizers, Hashing}
}

Document

DOI: 10.4230/LIPIcs.CPM.2025.6

String Problems in the Congested Clique Model

Authors: Shay Golan and Matan Kraus

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)

Abstract

In this paper we present algorithms for several string problems in the Congested Clique model. In the Congested Clique model, n nodes (computers) are used to solve some problem. The input to the problem is distributed among the nodes, and the communication between the nodes is conducted in rounds. In each round, every node is allowed to send an O(log n)-bit message to every other node in the network. We consider three fundamental string problems in the Congested Clique model. First, we present an O(1) rounds algorithm for string sorting that supports strings of arbitrary length. Second, we present an O(1) rounds combinatorial pattern matching algorithm. Finally, we present an O(log log n) rounds algorithm for the computation of the suffix array and the corresponding Longest Common Prefix array of a given string.

Cite as

Shay Golan and Matan Kraus. String Problems in the Congested Clique Model. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 6:1-6:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{golan_et_al:LIPIcs.CPM.2025.6,
  author =	{Golan, Shay and Kraus, Matan},
  title =	{{String Problems in the Congested Clique Model}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{6:1--6:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.6},
  URN =		{urn:nbn:de:0030-drops-231003},
  doi =		{10.4230/LIPIcs.CPM.2025.6},
  annote =	{Keywords: String Sorting, Pattern Matching, Suffix Array, Congested Clique, Sorting}
}

Document

Survey

DOI: 10.4230/TGDK.2.1.3

Semantic Web: Past, Present, and Future

Authors: Ansgar Scherp, Gerd Groener, Petr Škoda, Katja Hose, and Maria-Esther Vidal

Published in: TGDK, Volume 2, Issue 1 (2024): Special Issue on Trends in Graph Data and Knowledge - Part 2. Transactions on Graph Data and Knowledge, Volume 2, Issue 1

Abstract

Ever since the vision was formulated, the Semantic Web has inspired many generations of innovations. Semantic technologies have been used to share vast amounts of information on the Web, enhance them with semantics to give them meaning, and enable inference and reasoning on them. Throughout the years, semantic technologies, and in particular knowledge graphs, have been used in search engines, data integration, enterprise settings, and machine learning. In this paper, we recap the classical concepts and foundations of the Semantic Web as well as modern and recent concepts and applications, building upon these foundations. The classical topics we cover include knowledge representation, creating and validating knowledge on the Web, reasoning and linking, and distributed querying. We enhance this classical view of the so-called "Semantic Web Layer Cake" with an update of recent concepts that include provenance, security and trust, as well as a discussion of practical impacts from industry-led contributions. We conclude with an outlook on the future directions of the Semantic Web. This is a living document. If you like to contribute, please contact the first author and visit: https://github.com/ascherp/semantic-web-primer

Cite as

Ansgar Scherp, Gerd Groener, Petr Škoda, Katja Hose, and Maria-Esther Vidal. Semantic Web: Past, Present, and Future. In Special Issue on Trends in Graph Data and Knowledge - Part 2. Transactions on Graph Data and Knowledge (TGDK), Volume 2, Issue 1, pp. 3:1-3:37, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@Article{scherp_et_al:TGDK.2.1.3,
  author =	{Scherp, Ansgar and Groener, Gerd and \v{S}koda, Petr and Hose, Katja and Vidal, Maria-Esther},
  title =	{{Semantic Web: Past, Present, and Future}},
  journal =	{Transactions on Graph Data and Knowledge},
  pages =	{3:1--3:37},
  ISSN =	{2942-7517},
  year =	{2024},
  volume =	{2},
  number =	{1},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/TGDK.2.1.3},
  URN =		{urn:nbn:de:0030-drops-198607},
  doi =		{10.4230/TGDK.2.1.3},
  annote =	{Keywords: Linked Open Data, Semantic Web Graphs, Knowledge Graphs}
}

Document

DOI: 10.4230/DagSemProc.10021.5

Programming Service Oriented Agents

Authors: Benjamin Hirsch, Thomas Konnerth, Michael Burkhardt, and Sahin Albayrak

Published in: Dagstuhl Seminar Proceedings, Volume 10021, Service-Oriented Architecture and (Multi-)Agent Systems Technology (2010)

Abstract

This paper introduces a programming language for service-oriented agents. JADL++ combines the ease of use of scripting-languages with a state-of-the-art service oriented approach which allows the seamless integration of web-services. Furthermore, the language includes OWL-based ontologies for semantic descriptions of data and services, thus allowing agents to make intelligent decisions about service calls.

Cite as

Benjamin Hirsch, Thomas Konnerth, Michael Burkhardt, and Sahin Albayrak. Programming Service Oriented Agents. In Service-Oriented Architecture and (Multi-)Agent Systems Technology. Dagstuhl Seminar Proceedings, Volume 10021, pp. 1-16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)

Copy BibTex To Clipboard

@InProceedings{hirsch_et_al:DagSemProc.10021.5,
  author =	{Hirsch, Benjamin and Konnerth, Thomas and Burkhardt, Michael and Albayrak, Sahin},
  title =	{{Programming Service Oriented Agents}},
  booktitle =	{Service-Oriented Architecture and (Multi-)Agent Systems Technology},
  pages =	{1--16},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10021},
  editor =	{Monique Calisti and Frank P. Dignum and Ryszard Kowalczyk and Frank Leymann and Rainer Unland},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagSemProc.10021.5},
  URN =		{urn:nbn:de:0030-drops-28158},
  doi =		{10.4230/DagSemProc.10021.5},
  annote =	{Keywords: Service oriented architectures, agents, agent programming language, semantic services}
}

13 Search Results for "Burkhardt, Michael"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message