13 Search Results for "Paten, Benedict J."


Document
Show Me Your Best Side: Characteristics of User-Preferred Perspectives for 3D Graph Drawings

Authors: Lucas Joos, Gavin J. Mooney, Maximilian T. Fischer, Daniel A. Keim, Falk Schreiber, Helen C. Purchase, and Karsten Klein

Published in: LIPIcs, Volume 357, 33rd International Symposium on Graph Drawing and Network Visualization (GD 2025)


Abstract
The visual analysis of graphs in 3D has become increasingly popular, accelerated by the rise of immersive technology, such as augmented and virtual reality. Unlike 2D drawings, 3D graph layouts are highly viewpoint-dependent, making perspective selection critical for revealing structural and relational patterns. Despite its importance, there is limited empirical evidence guiding what constitutes an effective or preferred viewpoint from the user’s perspective. In this paper, we present a systematic investigation into user-preferred viewpoints in 3D graph visualisations. We conducted a controlled study with 23 participants in a virtual reality environment, where users selected their most and least preferred viewpoints for 36 different graphs varying in size and layout. From this data, enriched by qualitative feedback, we distil common strategies underlying viewpoint choice. We further analyse the alignment of user preferences with classical 2D aesthetic criteria (e.g., Crossings), 3D-specific measures (e.g., Node-Node Occlusion), and introduce a novel measure capturing the perceivability of a graph’s principal axes (Isometric Viewpoint Deviation). Our data-driven analysis indicates that Stress, Crossings, Gabriel Ratio, Edge-Node Overlap, and Isometric Viewpoint Deviation are key indicators of viewpoint preference. Beyond our findings, we contribute a publicly available dataset consisting of the graphs and computed aesthetic measures, supporting further research and the development of viewpoint evaluation measures for 3D graph drawing.

Cite as

Lucas Joos, Gavin J. Mooney, Maximilian T. Fischer, Daniel A. Keim, Falk Schreiber, Helen C. Purchase, and Karsten Klein. Show Me Your Best Side: Characteristics of User-Preferred Perspectives for 3D Graph Drawings. In 33rd International Symposium on Graph Drawing and Network Visualization (GD 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 357, pp. 37:1-37:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{joos_et_al:LIPIcs.GD.2025.37,
  author =	{Joos, Lucas and Mooney, Gavin J. and Fischer, Maximilian T. and Keim, Daniel A. and Schreiber, Falk and Purchase, Helen C. and Klein, Karsten},
  title =	{{Show Me Your Best Side: Characteristics of User-Preferred Perspectives for 3D Graph Drawings}},
  booktitle =	{33rd International Symposium on Graph Drawing and Network Visualization (GD 2025)},
  pages =	{37:1--37:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-403-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{357},
  editor =	{Dujmovi\'{c}, Vida and Montecchiani, Fabrizio},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.GD.2025.37},
  URN =		{urn:nbn:de:0030-drops-250236},
  doi =		{10.4230/LIPIcs.GD.2025.37},
  annote =	{Keywords: Graph Aesthetics, Immersive 3D, Node-Link Diagrams, Empirical Evaluation}
}
Document
An Efficient Data Structure and Algorithm for Long-Match Query in Run-Length Compressed BWT

Authors: Ahsan Sanaullah, Degui Zhi, and Shaojie Zhang

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)


Abstract
String matching problems in bioinformatics are typically for finding exact substring matches between a query and a reference text. Previous formulations often focus on maximum exact matches (MEMs). However, multiple occurrences of substrings of the query in the text that are long enough but not maximal may not be captured by MEMs. Such long matches can be informative, especially when the text is a collection of similar sequences such as genomes. In this paper, we describe a new type of match between a pattern and a text that aren't necessarily maximal in the query, but still contain useful matching information: locally maximal exact matches (LEMs). There are usually a large amount of LEMs, so we only consider those above some length threshold ℒ. These are referred to as long LEMs. The purpose of long LEMs is to capture substring matches between a query and a text that are not necessarily maximal in the pattern but still long enough to be important. Therefore efficient long LEMs finding algorithms are desired for these datasets. However, these datasets are too large to query on traditional string indexes. Fortunately, these datasets are very repetitive. Recently, compressed string indexes that take advantage of the redundancy in the data but retain efficient querying capability have been proposed as a solution. We therefore give an efficient algorithm for computing all the long LEMs of a query and a text in a BWT runs compressed string index. We describe an O(m+occ) expected time algorithm that relies on an O(r) words space string index for outputting all long LEMs of a pattern with respect to a text given the matching statistics of the pattern with respect to the text. Here m is the length of the query, occ is the number of long LEMs outputted, and r is the number of runs in the BWT of the text. The O(r) space string index we describe relies on an adaptation of the move data structure by Nishimoto and Tabei. We are able to support LCP[i] queries in constant time given SA[i]. In other words, we answer PLCP[i] queries in constant time. These PLCP queries enable the efficient long LEM query. Long LEMs may provide useful similarity information between a pattern and a text that MEMs may ignore. This information is particularly useful in pangenome and biobank scale haplotype panel contexts.

Cite as

Ahsan Sanaullah, Degui Zhi, and Shaojie Zhang. An Efficient Data Structure and Algorithm for Long-Match Query in Run-Length Compressed BWT. In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 17:1-17:25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{sanaullah_et_al:LIPIcs.WABI.2025.17,
  author =	{Sanaullah, Ahsan and Zhi, Degui and Zhang, Shaojie},
  title =	{{An Efficient Data Structure and Algorithm for Long-Match Query in Run-Length Compressed BWT}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{17:1--17:25},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.17},
  URN =		{urn:nbn:de:0030-drops-239433},
  doi =		{10.4230/LIPIcs.WABI.2025.17},
  annote =	{Keywords: BWT, LEM, Long LEM, MEM, Run Length Compressed BWT, Move Data Structure, Pangenome}
}
Document
DiVerG: Scalable Distance Index for Validation of Paired-End Alignments in Sequence Graphs

Authors: Ali Ghaffaari, Alexander Schönhuth, and Tobias Marschall

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)


Abstract
Determining the distance between two loci within a genomic region is a recurrent operation in various tasks in computational genomics. A notable example of this task arises in paired-end read mapping as a form of validation of distances between multiple alignments. While straightforward for a single genome, graph-based reference structures render the operation considerably more involved. Given the sheer number of such queries in a typical read mapping experiment, an efficient algorithm for answering distance queries is crucial. In this paper, we introduce DiVerG, a compact data structure as well as a fast and scalable algorithm, for constructing distance indexes for general sequence graphs on multi-core CPU and many-core GPU architectures. DiVerG is based on PairG [Jain et al., 2019], but overcomes the limitations of PairG by exploiting the extensive potential for improvements in terms of scalability and space efficiency. As a consequence, DiVerG can process substantially larger datasets, such as whole human genomes, which are unmanageable by PairG. DiVerG offers faster index construction time and consistently faster query time with gains proportional to the size of the underlying compact data structure. We demonstrate that our method performs favorably on multiple real datasets at various scales. DiVerG achieves superior performance over PairG; e.g. resulting to 2.5-4x speed-up in query time, 44-340x smaller index size, and 3-50x faster construction time for the genome graph of the MHC region, as a particularly variable region of the human genome. The implementation is available at: https://github.com/cartoonist/diverg

Cite as

Ali Ghaffaari, Alexander Schönhuth, and Tobias Marschall. DiVerG: Scalable Distance Index for Validation of Paired-End Alignments in Sequence Graphs. In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 10:1-10:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{ghaffaari_et_al:LIPIcs.WABI.2025.10,
  author =	{Ghaffaari, Ali and Sch\"{o}nhuth, Alexander and Marschall, Tobias},
  title =	{{DiVerG: Scalable Distance Index for Validation of Paired-End Alignments in Sequence Graphs}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{10:1--10:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.10},
  URN =		{urn:nbn:de:0030-drops-239369},
  doi =		{10.4230/LIPIcs.WABI.2025.10},
  annote =	{Keywords: Sequence graph, distance index, read mapping, sparse matrix}
}
Document
Human Readable Compression of GFA Paths Using Grammar-Based Code

Authors: Peter Heringer and Daniel Doerr

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)


Abstract
Pangenome graphs offer a compact and comprehensive representation of genomic diversity, improving tasks such as variant calling, genotyping, and other downstream analyses. Although the underlying graph structures scale sublinearly with the number of haplotypes, the widely used GFA file format suffers from rapidly growing file sizes due to the explicit and repetitive encoding of haplotype paths. In this work, we introduce an extension to the GFA format that enables efficient grammar-based compression of haplotype paths while retaining human readability. In addition, grammar-based encoding provides an efficient in-memory data structure that does not require decompression, but conversely improves the runtime of many computational tasks that involve haplotype comparisons. We present sqz, a method that makes use of the proposed format extension to encode haplotype paths using byte pair encoding, a grammar-based compression scheme. We evaluate sqz on recent human pangenome graphs from Heumos et al. and the Human Pangenome Reference Consortium (HPRC), comparing it to existing compressors bgzip, gbz, and sequitur. sqz scales sublinearly with the number of haplotypes in a pangenome graph and consistently achieves higher compression ratios than sequitur and up to 5 times better compression than bgzip in HPRC graphs and up to 10 times in the graph from Heumos et al.. When combined with bgzip, sqz matches or excels the compression ratio of gbz across all our datasets. These results demonstrate the potential of our proposed extension of the GFA format in reducing haplotype path redundancy and improving storage efficiency for pangenome graphs.

Cite as

Peter Heringer and Daniel Doerr. Human Readable Compression of GFA Paths Using Grammar-Based Code. In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 14:1-14:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{heringer_et_al:LIPIcs.WABI.2025.14,
  author =	{Heringer, Peter and Doerr, Daniel},
  title =	{{Human Readable Compression of GFA Paths Using Grammar-Based Code}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{14:1--14:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.14},
  URN =		{urn:nbn:de:0030-drops-239395},
  doi =		{10.4230/LIPIcs.WABI.2025.14},
  annote =	{Keywords: pangenomics, pangenome graphs, compression, grammar-based code, byte pair encoding}
}
Document
Research
Conditional Lower Bounds for String Matching in Labelled Graphs

Authors: Massimo Equi

Published in: OASIcs, Volume 132, From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday (2025)


Abstract
The problem of String Matching in Labelled Graphs (SMLG) is one possible generalization of the classic problem of finding a string inside another of greater length. In its most general form, SMLG asks to find a match for a string into a graph, which can be directed or undirected. As for string matching, many different variations are possible. For example, the match could be exact or approximate, and the match could lie on a path or a walk. Some of these variations easily fall into the NP-hard realm, while other variants are solvable in polynomial time. For the latter ones, fine-grained complexity has been a game changer in proving quadratic conditional lower bounds, allowing to finally close the gap with those upper bounds that remained unmatched for almost two decades. If the match is allowed to be approximate, SMLG enjoys the same conditional quadratic lower bounds shown for example for edit distance (Backurs and Indyk, STOC '15). The case that really requires ad hoc conditional lower bounds is the one of finding an exact match that lies on a walk. In this work, we focus on explaining various conditional lower bounds for this version of SMLG, with the goal of giving an overall perspective that could help understand which aspects of the problem make it quadratic. We will introduce the reader to the field of fine-grained complexity and show how it can successfully provide the exact type of lower bounds needed for polynomial problems such as SMLG.

Cite as

Massimo Equi. Conditional Lower Bounds for String Matching in Labelled Graphs. In From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 132, pp. 7:1-7:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{equi:OASIcs.Grossi.7,
  author =	{Equi, Massimo},
  title =	{{Conditional Lower Bounds for String Matching in Labelled Graphs}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{7:1--7:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.7},
  URN =		{urn:nbn:de:0030-drops-238063},
  doi =		{10.4230/OASIcs.Grossi.7},
  annote =	{Keywords: conditional lower bounds, strong exponential time hypothesis, fine-grained complexity, string matching, graphs}
}
Document
BWT for String Collections

Authors: Davide Cenzato, Zsuzsanna Lipták, Nadia Pisanti, Giovanna Rosone, and Marinella Sciortino

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)


Abstract
We survey the different methods used for extending the BWT to collections of strings, following largely [Cenzato and Lipták, CPM 2022, Bioinformatics 2024]. We analyze the specific aspects and combinatorial properties of the resulting BWT variants and give a categorization of publicly available tools for computing the BWT of string collections. We show how the specific method used impacts on the resulting transform, including the number of runs, and on the dynamicity of the transform with respect to adding or removing strings from the collection. We then focus on the number of runs of these BWT variants and present the optimal BWT introduced in [Cenzato et al., DCC 2023], which implements an algorithm originally proposed by [Bentley et al., ESA 2020] to minimize the number of BWT-runs. We also discuss several recent heuristics and study their impact on the compression of biological sequences. We conclude with an overview of the applications and the impact of the BWT of string collections in bioinformatics.

Cite as

Davide Cenzato, Zsuzsanna Lipták, Nadia Pisanti, Giovanna Rosone, and Marinella Sciortino. BWT for String Collections. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 3:1-3:29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{cenzato_et_al:OASIcs.Manzini.3,
  author =	{Cenzato, Davide and Lipt\'{a}k, Zsuzsanna and Pisanti, Nadia and Rosone, Giovanna and Sciortino, Marinella},
  title =	{{BWT for String Collections}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{3:1--3:29},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.3},
  URN =		{urn:nbn:de:0030-drops-239113},
  doi =		{10.4230/OASIcs.Manzini.3},
  annote =	{Keywords: Burrows-Wheeler transform, Extended Burrows-Wheeler transform, compressed text indexes, text compression, string collections, bioinformatics}
}
Document
Algorithms for Computing Very Large BWTs: a Short Survey

Authors: Diego Díaz-Domínguez, Lavinia Egidi, Veronica Guerrini, Felipe A. Louza, and Giovanna Rosone

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)


Abstract
The Burrows-Wheeler Transform (BWT) is a fundamental string transformation that, although initially introduced for data compression, has been extensively utilized across various domains, including text indexing and pattern matching within large datasets. Although the BWT construction is linear, the constants make the task impractical for large datasets, and as highlighted by Ferragina et al. [Paolo Ferragina et al., 2012], "to use it, one must first build it!". Thus, the construction of the BWT remains a significant challenge. For these reasons, during the past three decades there has been a succession of new algorithms for its construction using techniques that work in external memory or that use text compression. In this survey, we revise some of the most important advancements and tools presented in the past years for computing large BWTs exploiting external memory or text compression approaches without using additional information about the data.

Cite as

Diego Díaz-Domínguez, Lavinia Egidi, Veronica Guerrini, Felipe A. Louza, and Giovanna Rosone. Algorithms for Computing Very Large BWTs: a Short Survey. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 7:1-7:28, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{diazdominguez_et_al:OASIcs.Manzini.7,
  author =	{D{\'\i}az-Dom{\'\i}nguez, Diego and Egidi, Lavinia and Guerrini, Veronica and Louza, Felipe A. and Rosone, Giovanna},
  title =	{{Algorithms for Computing Very Large BWTs: a Short Survey}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{7:1--7:28},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.7},
  URN =		{urn:nbn:de:0030-drops-239151},
  doi =		{10.4230/OASIcs.Manzini.7},
  annote =	{Keywords: Burrows-Wheeler transform, Extended Burrows-Wheeler transform, external memory, text compression, longest common prefix}
}
Document
Graph Indexing Beyond Wheeler Graphs

Authors: Jarno N. Alanko, Elena Biagi, Massimo Equi, Veli Mäkinen, Simon J. Puglisi, Nicola Rizzo, Kunihiko Sadakane, and Jouni Sirén

Published in: OASIcs, Volume 131, The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday (2025)


Abstract
After the discovery of the FM index, which linked the Burrows-Wheeler transform (BWT) to pattern matching on strings, several contemporaneous strands of research began on indexing more complex structures with the BWT, such as tries, finite languages, de Bruijn graphs, and aligned sequences. These directions can now be viewed as culminating in the theory of Wheeler Graphs, but sometimes they go beyond. This chapter reviews the significant body of "proto Wheeler Graph" indexes, many of which exploit characteristics of their specific case to outperform Wheeler graphs, especially in practice.

Cite as

Jarno N. Alanko, Elena Biagi, Massimo Equi, Veli Mäkinen, Simon J. Puglisi, Nicola Rizzo, Kunihiko Sadakane, and Jouni Sirén. Graph Indexing Beyond Wheeler Graphs. In The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 131, pp. 13:1-13:29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{alanko_et_al:OASIcs.Manzini.13,
  author =	{Alanko, Jarno N. and Biagi, Elena and Equi, Massimo and M\"{a}kinen, Veli and Puglisi, Simon J. and Rizzo, Nicola and Sadakane, Kunihiko and Sir\'{e}n, Jouni},
  title =	{{Graph Indexing Beyond Wheeler Graphs}},
  booktitle =	{The Expanding World of Compressed Data: A Festschrift for Giovanni Manzini's 60th Birthday},
  pages =	{13:1--13:29},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-390-4},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{131},
  editor =	{Ferragina, Paolo and Gagie, Travis and Navarro, Gonzalo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Manzini.13},
  URN =		{urn:nbn:de:0030-drops-239215},
  doi =		{10.4230/OASIcs.Manzini.13},
  annote =	{Keywords: indexing, compression, compressed data structures, string algorithms, pattern matching}
}
Document
Pangenome Graph Indexing via the Multidollar-BWT

Authors: Davide Cozzi, Brian Riccardi, Luca Denti, Simone Ciccolella, Kunihiko Sadakane, and Paola Bonizzoni

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)


Abstract
Indexing pangenome graphs is a major algorithmic challenge in computational pangenomics, a recent and active research field that seeks to use graphs as representations of multiple genomes. Since these graphs are constructed from whole genome sequences of a species population, they can become very large, making indexing one of the most challenging problems. In this paper, we propose gindex, a novel indexing approach to solve the Graph Pattern Matching Problem based on the multidollar-BWT. Specifically, gindex aims to find all occurrences of a pattern in a sequence-labeled graph by overcoming two main limitations of GCSA2, one of the most widely used graph indexes: handling queries of arbitrary length and scaling to large graphs without pruning any complex regions. Moreover, we show how a smart preprocessing step can optimize the use of multidollar-BWT to skip small redundant sub-patterns and enhance gindex’s querying capabilities. We demonstrate the effectiveness of our approach by comparing it to GCSA2 in terms of index construction and query time, using different preprocessing modes on three pangenome graphs: one built from Drosophila genomes and two produced by the Human Pangenome Reference Consortium. The results show that gindex can scale on human pangenome graphs - which GCSA2 cannot index using large amounts of RAM - with acceptable memory and time requirements. Moreover, gindex achieves fast query times, although not as fast as GCSA2, which may produce false positives.

Cite as

Davide Cozzi, Brian Riccardi, Luca Denti, Simone Ciccolella, Kunihiko Sadakane, and Paola Bonizzoni. Pangenome Graph Indexing via the Multidollar-BWT. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 13:1-13:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{cozzi_et_al:LIPIcs.SEA.2025.13,
  author =	{Cozzi, Davide and Riccardi, Brian and Denti, Luca and Ciccolella, Simone and Sadakane, Kunihiko and Bonizzoni, Paola},
  title =	{{Pangenome Graph Indexing via the Multidollar-BWT}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{13:1--13:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.13},
  URN =		{urn:nbn:de:0030-drops-232515},
  doi =		{10.4230/LIPIcs.SEA.2025.13},
  annote =	{Keywords: Multidollar-BWT, Graph Index, Graph Pattern Matching, Pangenome Graph}
}
Document
SimdMinimizers: Computing Random Minimizers, fast

Authors: Ragnar Groot Koerkamp and Igor Martayan

Published in: LIPIcs, Volume 338, 23rd International Symposium on Experimental Algorithms (SEA 2025)


Abstract
Motivation. Because of the rapidly-growing amount of sequencing data, computing sketches of large textual datasets has become an essential preprocessing task. These sketches are typically much smaller than the input sequences, but preserve sufficient information for downstream analysis. Minimizers are an especially popular sketching technique and used in a wide variety of applications. They sample at least one out of every w consecutive k-mers. As DNA sequencers are getting more accurate, some applications can afford to use a larger w and hence sparser and smaller sketches. And as sketches get smaller, their analysis becomes faster, so the time spent sketching the full-sized input becomes more of a bottleneck. Methods. Our library simd-minimizers implements a random minimizer algorithm using SIMD instructions. It supports both AVX2 and NEON architectures. Its main novelty is two-fold. First, it splits the input into 8 chunks that are streamed over in parallel through all steps of the algorithm. This is enabled by using the completely deterministic two-stacks sliding window minimum algorithm, which seems not to have been used before for finding minimizers. Results. Our library is up to 6.8× faster than a scalar implementation of the rescan method when w = 5 is small, and 3.4× faster for larger w = 19. Computing canonical minimizers is less than 50% slower than computing forward minimizers, and over 15× faster than the existing implementation in the minimizer-iter crate. Our library finds all (canonical) minimizers of a 3.2 Gbp human genome in 5.2 (resp. 6.7) seconds.

Cite as

Ragnar Groot Koerkamp and Igor Martayan. SimdMinimizers: Computing Random Minimizers, fast. In 23rd International Symposium on Experimental Algorithms (SEA 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 338, pp. 20:1-20:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{grootkoerkamp_et_al:LIPIcs.SEA.2025.20,
  author =	{Groot Koerkamp, Ragnar and Martayan, Igor},
  title =	{{SimdMinimizers: Computing Random Minimizers, fast}},
  booktitle =	{23rd International Symposium on Experimental Algorithms (SEA 2025)},
  pages =	{20:1--20:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-375-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{338},
  editor =	{Mutzel, Petra and Prezza, Nicola},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2025.20},
  URN =		{urn:nbn:de:0030-drops-232581},
  doi =		{10.4230/LIPIcs.SEA.2025.20},
  annote =	{Keywords: Minimizers, Randomized algorithms, Sketching, Hashing}
}
Document
Representing Paths in Digraphs

Authors: Riccardo Dondi and Alexandru Popa

Published in: LIPIcs, Volume 331, 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)


Abstract
In this contribution we consider two combinatorial problems related to graph string matching, motivated by recent approaches in computational genomics. Given a DAG where each node is labeled by a symbol, the problems aim to find a path in the DAG whose nodes contain all (or the maximum number of) symbols of the alphabet. We introduce a decision problem, Σ-Representing Path, that asks whether there exists a path that contains all the symbols of the alphabet, and an optimization problem, called Maximum Representing Path, that asks for a path that contains the maximum number of symbols. We analyze the complexity of the problems, showing the NP-completeness of {Σ-Representing Path} when each symbol labels at most three nodes in the DAG, and showing the APX-hardness of Maximum Representing Path when each symbol labels at most two nodes in the DAG. We complement the first result by giving a polynomial-time algorithm for Σ-Representing Path when each symbol labels at most two nodes in the DAG. Then we investigate the parameterized complexity of the two problems when the DAG has a limited distance from a set of disjoint paths and we show that both problems are W[1]-hard for this parameter. We consider the approximation of Maximum Representing Path, giving an approximation algorithm of factor √OPT, where OPT is the value of an optimal solution of the problem. We also show that Maximum Representing Path cannot be approximated within factor e/(e-1) - α, for any constant α > 0, unless NP ⊆ DTIME(|V|^{O(log log |V|)}) (V is the set of nodes of the DAG).

Cite as

Riccardo Dondi and Alexandru Popa. Representing Paths in Digraphs. In 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 1:1-1:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{dondi_et_al:LIPIcs.CPM.2025.1,
  author =	{Dondi, Riccardo and Popa, Alexandru},
  title =	{{Representing Paths in Digraphs}},
  booktitle =	{36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025)},
  pages =	{1:1--1:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-369-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{331},
  editor =	{Bonizzoni, Paola and M\"{a}kinen, Veli},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2025.1},
  URN =		{urn:nbn:de:0030-drops-230954},
  doi =		{10.4230/LIPIcs.CPM.2025.1},
  annote =	{Keywords: Graph String Matching, Computational Complexity, Parameterized Complexity, Algorithms}
}
Document
Haplotype-aware graph indexes

Authors: Jouni Sirén, Erik Garrison, Adam M. Novak, Benedict J. Paten, and Richard Durbin

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)


Abstract
The variation graph toolkit (VG) represents genetic variation as a graph. Each path in the graph is a potential haplotype, though most paths are unlikely recombinations of true haplotypes. We augment the VG model with haplotype information to identify which paths are more likely to be correct. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows-Wheeler transform. We demonstrate the scalability of the new implementation by indexing the 1000 Genomes Project haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes.

Cite as

Jouni Sirén, Erik Garrison, Adam M. Novak, Benedict J. Paten, and Richard Durbin. Haplotype-aware graph indexes. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 4:1-4:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{siren_et_al:LIPIcs.WABI.2018.4,
  author =	{Sir\'{e}n, Jouni and Garrison, Erik and Novak, Adam M. and Paten, Benedict J. and Durbin, Richard},
  title =	{{Haplotype-aware graph indexes}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{4:1--4:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.4},
  URN =		{urn:nbn:de:0030-drops-93060},
  doi =		{10.4230/LIPIcs.WABI.2018.4},
  annote =	{Keywords: FM-indexes, variation graphs, haplotypes}
}
Document
An Average-Case Sublinear Exact Li and Stephens Forward Algorithm

Authors: Yohei M. Rosen and Benedict J. Paten

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)


Abstract
Hidden Markov models of haplotype inheritance such as the Li and Stephens model allow for computationally tractable probability calculations using the forward algorithms as long as the representative reference panel used in the model is sufficiently small. Specifically, the monoploid Li and Stephens model and its variants are linear in reference panel size unless heuristic approximations are used. However, sequencing projects numbering in the thousands to hundreds of thousands of individuals are underway, and others numbering in the millions are anticipated. To make the Li and Stephens forward algorithm for these datasets computationally tractable, we have created a numerically exact version of the algorithm with observed average case O(nk^{0.35}) runtime in number of genetic sites n and reference panel size k. This avoids any tradeoff between runtime and model complexity. We demonstrate that our approach also provides a succinct data structure for general purpose haplotype data storage. We discuss generalizations of our algorithmic techniques to other hidden Markov models.

Cite as

Yohei M. Rosen and Benedict J. Paten. An Average-Case Sublinear Exact Li and Stephens Forward Algorithm. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 9:1-9:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{rosen_et_al:LIPIcs.WABI.2018.9,
  author =	{Rosen, Yohei M. and Paten, Benedict J.},
  title =	{{An Average-Case Sublinear Exact Li and Stephens Forward Algorithm}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{9:1--9:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.9},
  URN =		{urn:nbn:de:0030-drops-93116},
  doi =		{10.4230/LIPIcs.WABI.2018.9},
  annote =	{Keywords: Haplotype, Hidden Markov Model, Forward Algorithm, Lazy Evaluation}
}
  • Refine by Type
  • 13 Document/PDF
  • 11 Document/HTML

  • Refine by Publication Year
  • 11 2025
  • 2 2018

  • Refine by Author
  • 2 Equi, Massimo
  • 2 Paten, Benedict J.
  • 2 Rosone, Giovanna
  • 2 Sadakane, Kunihiko
  • 2 Sirén, Jouni
  • Show More...

  • Refine by Series/Journal
  • 9 LIPIcs
  • 4 OASIcs

  • Refine by Classification
  • 4 Theory of computation → Pattern matching
  • 3 Applied computing → Bioinformatics
  • 3 Theory of computation → Data compression
  • 3 Theory of computation → Data structures design and analysis
  • 2 Applied computing → Computational genomics
  • Show More...

  • Refine by Keyword
  • 2 Burrows-Wheeler transform
  • 2 Extended Burrows-Wheeler transform
  • 2 compression
  • 2 text compression
  • 1 Algorithms
  • Show More...

Any Issues?
X

Feedback on the Current Page

CAPTCHA

Thanks for your feedback!

Feedback submitted to Dagstuhl Publishing

Could not send message

Please try again later or send an E-mail