25 Search Results for "Köhler, Sven"


Document
Invited Talk
Efficient Solutions to Biological Problems Using de Bruijn Graphs (Invited Talk)

Authors: Leena Salmela

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
The de Bruijn graph has become a standard method in the analysis of sequencing reads in computational biology due to its ability to represent the information contained in large read sets in small space. A de Bruijn graph represents a set of sequencing reads by its k-mers, i.e., the set of substrings of length k that occur in the reads. In the classical definition, the k-mers are the edges of the graph and the nodes are the (k-1)-bases-long prefixes and suffixes of the k-mers. Usually only k-mers occurring several times in the read set are kept, to filter out noise in the data. De Bruijn graphs have been used to solve many problems in computational biology, including genome assembly [Ramana M. Idury and Michael S. Waterman, 1995; Pavel A. Pevzner et al., 2001; Anton Bankevich et al., 2012; Yu Peng et al., 2010], sequencing error correction [Leena Salmela and Eric Rivals, 2014; Giles Miclotte et al., 2016; Leena Salmela et al., 2017; Limasset et al., 2019], reference-free variant calling [Raluca Uricaru et al., 2015], and indexing of read sets [Camille Marchet et al., 2021], among others. Next I will discuss two of these problems in more depth.

The de Bruijn graph first emerged in computational biology in the context of genome assembly [Ramana M. Idury and Michael S. Waterman, 1995; Pavel A. Pevzner et al., 2001], where the task is to reconstruct a genome based on sequencing reads. As the de Bruijn graph can represent large read sets compactly, it became the standard approach for assembling short reads [Anton Bankevich et al., 2012; Yu Peng et al., 2010]. In the theoretical framework of de Bruijn graph based genome assembly, the genome is modeled as an Eulerian path in the de Bruijn graph built on the sequencing reads. In practice, the Eulerian path is not unique and thus not useful in the biological context. Therefore, practical implementations report subpaths that are guaranteed to be part of any Eulerian path and thus part of the actual genome. Such models include unitigs, which are non-branching paths of the de Bruijn graph, and more involved definitions such as omnitigs [Alexandru I. Tomescu and Paul Medvedev, 2017]. In genome assembly the choice of k is crucial: a small k can result in a tangled graph, whereas too large a k will fragment the graph. Furthermore, a different value of k may be optimal for different parts of the genome. Variable-order de Bruijn graphs [Christina Boucher et al., 2015; Djamal Belazzougui et al., 2016], which represent de Bruijn graphs of all orders k in a single data structure, have been proposed as a solution, but no rigorous definition corresponding to unitigs has been presented for them. We give the first definition of assembled sequences, i.e. contigs, on such graphs and an algorithm for enumerating them.

Another problem that can be solved with de Bruijn graphs is the correction of sequencing errors [Leena Salmela and Eric Rivals, 2014; Giles Miclotte et al., 2016; Leena Salmela et al., 2017; Limasset et al., 2019]. Because each position of a genome is sequenced several times, it is possible to correct sequencing errors in reads if we can identify data originating from the same genomic region. A de Bruijn graph can be used to compactly represent the reliable information, and the individual reads can be corrected by aligning them to the graph.
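
To make the unitig notion concrete, the following is a minimal Python sketch (not the speaker's implementation) of building an edge-centric de Bruijn graph from reads and enumerating its unitigs; the function names and toy reads are illustrative, and abundance-based k-mer filtering is omitted.

from collections import defaultdict

def build_dbg(reads, k):
    # Edge-centric de Bruijn graph: k-mers are edges, (k-1)-mers are nodes.
    out_e, in_e = defaultdict(set), defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            out_e[kmer[:-1]].add(kmer[1:])
            in_e[kmer[1:]].add(kmer[:-1])
    return out_e, in_e

def unitigs(out_e, in_e):
    # Maximal non-branching paths; each is a substring of every Eulerian
    # path of the graph. Isolated non-branching cycles are not reported.
    def branching(v):
        return len(in_e[v]) != 1 or len(out_e[v]) != 1
    seen, tigs = set(), []
    for v in list(out_e):
        if not branching(v):
            continue
        for w in list(out_e[v]):
            if (v, w) in seen:
                continue
            seen.add((v, w))
            tig, cur = v, w
            while True:
                tig += cur[-1]
                if branching(cur):
                    break
                nxt = next(iter(out_e[cur]))
                if (cur, nxt) in seen:
                    break
                seen.add((cur, nxt))
                cur = nxt
            tigs.append(tig)
    return tigs

out_e, in_e = build_dbg(["ACGTACGTT", "GTACGTTAC"], k=4)
print(unitigs(out_e, in_e))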

Cite as

Leena Salmela. Efficient Solutions to Biological Problems Using de Bruijn Graphs (Invited Talk). In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 1:1-1:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{salmela:LIPIcs.WABI.2022.1,
  author =	{Salmela, Leena},
  title =	{{Efficient Solutions to Biological Problems Using de Bruijn Graphs}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{1:1--1:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.1},
  URN =		{urn:nbn:de:0030-drops-170357},
  doi =		{10.4230/LIPIcs.WABI.2022.1},
  annote =	{Keywords: de Bruijn graph, variable order de Bruijn graph, genome assembly, sequencing error correction, k-mers}
}
Document
Eulertigs: Minimum Plain Text Representation of k-mer Sets Without Repetitions in Linear Time

Authors: Sebastian Schmidt and Jarno N. Alanko

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
A fundamental operation in computational genomics is to reduce the input sequences to their constituent k-mers. For maximum performance of downstream applications, it is important to store the k-mers in small space, while keeping the representation easy and efficient to use (i.e. without k-mer repetitions and in plain text). Recently, heuristics were presented to compute a near-minimum such representation. We present an algorithm to compute a minimum representation in optimal (linear) time and use it to evaluate the existing heuristics. For that, we present a formalisation of arc-centric bidirected de Bruijn graphs and carefully prove that it accurately models the k-mer spectrum of the input. Our algorithm first constructs the de Bruijn graph in time linear in the length of the input strings (for a fixed-size alphabet). Then it uses an Eulerian-cycle-based algorithm to compute the minimum representation, in time linear in the size of the output.
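
The Eulerian step itself is classical. Below is a minimal sketch of Hierholzer's algorithm on a directed multigraph, together with spelling the resulting circuit as a string containing each k-mer exactly once; it deliberately ignores the bidirected (reverse-complement-aware) setting and the decomposition into paths that the paper works out, and the toy graph is illustrative.

def eulerian_circuit(adj, start):
    # Hierholzer's algorithm; adj maps node -> list of successors.
    # Runs in O(|E|), consuming a local copy of the edge lists.
    adj = {u: list(vs) for u, vs in adj.items()}
    stack, circuit = [start], []
    while stack:
        u = stack[-1]
        if adj.get(u):
            stack.append(adj[u].pop())
        else:
            circuit.append(stack.pop())
    return circuit[::-1]

# Toy de Bruijn graph on (k-1)-mers (k = 3); the circuit spells one
# string containing every k-mer (edge) exactly once.
adj = {"AC": ["CG"], "CG": ["GA"], "GA": ["AC"]}
nodes = eulerian_circuit(adj, "AC")
print(nodes[0] + "".join(v[-1] for v in nodes[1:]))  # ACGAC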

Cite as

Sebastian Schmidt and Jarno N. Alanko. Eulertigs: Minimum Plain Text Representation of k-mer Sets Without Repetitions in Linear Time. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 2:1-2:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{schmidt_et_al:LIPIcs.WABI.2022.2,
  author =	{Schmidt, Sebastian and Alanko, Jarno N.},
  title =	{{Eulertigs: Minimum Plain Text Representation of k-mer Sets Without Repetitions in Linear Time}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{2:1--2:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.2},
  URN =		{urn:nbn:de:0030-drops-170361},
  doi =		{10.4230/LIPIcs.WABI.2022.2},
  annote =	{Keywords: Spectrum preserving string sets, Eulerian cycle, Suffix tree, Bidirected arc-centric de Bruijn graph, k-mer based methods}
}
Document
On Weighted k-mer Dictionaries

Authors: Giulio Ermanno Pibiri

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
We consider the problem of representing a set of k-mers and their abundance counts, or weights, in compressed space so that assessing membership and retrieving the weight of a k-mer is efficient. The representation is called a weighted dictionary of k-mers and finds application in numerous tasks in bioinformatics that usually count k-mers as a pre-processing step. In fact, k-mer counting tools produce very large outputs that may result in a severe bottleneck for subsequent processing. In this work we extend the recently introduced SSHash dictionary (Pibiri, Bioinformatics 2022) to also store the weights of the k-mers compactly. From a technical perspective, we exploit the order of the k-mers represented in SSHash to encode runs of weights, hence allowing (several times) better compression than the empirical entropy of the weights. We also study the problem of reducing the number of runs in the weights to improve compression even further, and illustrate a lower bound for this problem. We propose an efficient greedy algorithm to reduce the number of runs and show empirically that it performs well, i.e., very similarly to the lower bound. Lastly, we corroborate our findings with experiments on real-world datasets and comparisons with competitive alternatives. To date, SSHash is the only k-mer dictionary that is exact, weighted, associative, fast, and small.
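
The run encoding can be illustrated with a two-array representation: one array of run values and one of prefix sums of run lengths, so that the weight of the k-mer at rank i is found by a single binary search. This is a simplified sketch, not SSHash's actual layout.

from bisect import bisect_right
from itertools import groupby

def rle_encode(weights):
    # weights listed in the dictionary's k-mer order.
    values, ends, pos = [], [], 0
    for val, grp in groupby(weights):
        pos += sum(1 for _ in grp)
        values.append(val)
        ends.append(pos)  # ends[j] = rank one past the end of run j
    return values, ends

def rle_access(values, ends, i):
    # Weight of the k-mer with rank i.
    return values[bisect_right(ends, i)]

weights = [1, 1, 1, 7, 7, 2, 2, 2, 2]
values, ends = rle_encode(weights)
assert [rle_access(values, ends, i) for i in range(len(weights))] == weights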

Cite as

Giulio Ermanno Pibiri. On Weighted k-mer Dictionaries. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 9:1-9:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{pibiri:LIPIcs.WABI.2022.9,
  author =	{Pibiri, Giulio Ermanno},
  title =	{{On Weighted k-mer Dictionaries}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{9:1--9:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.9},
  URN =		{urn:nbn:de:0030-drops-170439},
  doi =		{10.4230/LIPIcs.WABI.2022.9},
  annote =	{Keywords: K-Mers, Weights, Compression, Hashing}
}
Document
Accurate k-mer Classification Using Read Profiles

Authors: Yoshihiko Suzuki and Gene Myers

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
Contiguous strings of length k, called k-mers, are a fundamental element in many bioinformatics tasks. The number of occurrences of a k-mer in a given set of DNA sequencing reads, its k-mer count, has often been used to roughly estimate the copy number of the k-mer in the genome from which the reads were sampled. The problem of estimating copy numbers, called here the k-mer classification problem, has so far been approached by simply analyzing the histogram of counts of all the k-mers in a dataset, thus ignoring the positional context and the dependency between multiple k-mers that appear nearby in the underlying genome. Here we present an efficient and significantly more accurate method for classifying k-mers by analyzing the sequence of k-mer counts along each sequencing read, called a read profile. By analyzing read profiles, we explicitly incorporate into the model the dependencies between positionally adjacent k-mers and the sequence-context-dependent error rates estimated from the given dataset. For long sequencing reads produced with the accurate high-fidelity (HiFi) sequencing technology, an implementation of our method, ClassPro, outperforms the conventional, histogram-based method on every simulation dataset of fruit fly and human with various realistic values of sequencing coverage and heterozygosity. Within only a few minutes, ClassPro achieves an average accuracy of > 99.99% across reads without repetitive k-mers and > 99.5% across all reads, in a typical fruit fly simulation dataset with 40× coverage. The resulting, more accurate k-mer classifications by ClassPro are in principle expected to improve any k-mer-based downstream analyses for sequenced reads, such as read mapping and overlap, spectral alignment and error correction, haplotype phasing, and trio binning, to name but a few. ClassPro is available at https://github.com/yoshihikosuzuki/ClassPro.
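
The central object, a read profile, is simply the sequence of counts of the read's k-mers. The sketch below computes profiles from a plain count table and applies a naive per-position threshold rule; ClassPro's actual classifier additionally models the dependencies between adjacent counts and the context-dependent error rates, so this only illustrates the input representation, and the thresholds and names are hypothetical.

from collections import Counter

def kmer_counts(reads, k):
    counts = Counter()
    for r in reads:
        counts.update(r[i:i + k] for i in range(len(r) - k + 1))
    return counts

def read_profile(read, counts, k):
    # The read profile: one k-mer count per position along the read.
    return [counts[read[i:i + k]] for i in range(len(read) - k + 1)]

def classify_naive(profile, t_error, t_haplo):
    # Histogram-style thresholding, position by position; shown only as
    # the baseline that profile-aware classification improves upon.
    labels = []
    for c in profile:
        if c < t_error:
            labels.append("error")
        elif c < t_haplo:
            labels.append("haploid")
        else:
            labels.append("diploid/repeat")
    return labels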

Cite as

Yoshihiko Suzuki and Gene Myers. Accurate k-mer Classification Using Read Profiles. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 10:1-10:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{suzuki_et_al:LIPIcs.WABI.2022.10,
  author =	{Suzuki, Yoshihiko and Myers, Gene},
  title =	{{Accurate k-mer Classification Using Read Profiles}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{10:1--10:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.10},
  URN =		{urn:nbn:de:0030-drops-170446},
  doi =		{10.4230/LIPIcs.WABI.2022.10},
  annote =	{Keywords: K-mer, K-mer count, K-mer classification, HiFi sequencing}
}
Document
Fast Gapped k-mer Counting with Subdivided Multi-Way Bucketed Cuckoo Hash Tables

Authors: Jens Zentgraf and Sven Rahmann

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
Motivation. In biological sequence analysis, alignment-free (also known as k-mer-based) methods are increasingly replacing mapping- and alignment-based methods for various applications. A basic step of such methods consists of building a table of all k-mers of a given set of sequences (a reference genome or a dataset of sequenced reads) and their counts. Over the past years, efficient methods and tools for k-mer counting have been developed. In a different line of work, the use of gapped k-mers has been shown to offer advantages over the use of the standard contiguous k-mers. However, no tool seems to be available that is able to count gapped k-mers with the same efficiency as contiguous k-mers. One reason is that the most efficient k-mer counters use minimizers (of length m < k) to group k-mers into buckets, such that many consecutive k-mers are classified into the same bucket. This approach leads to cache-friendly (and hence extremely fast) algorithms, but it does not transfer easily to gapped k-mers. Consequently, the existing efficient k-mer counters cannot be trivially modified to count gapped k-mers with the same efficiency.

Results. We present a different approach that is equally applicable to contiguous k-mers and gapped k-mers. We use multi-way bucketed Cuckoo hash tables to efficiently store (gapped) k-mers and their counts. We also describe a method to parallelize counting over multiple threads without using locks: we subdivide the hash table into independent subtables and use a producer-consumer model, such that each thread serves one subtable. This requires designing Cuckoo hash functions with the property that all alternative locations for each k-mer are located in the same subtable. Compared to some of the fastest contiguous k-mer counters, our approach is of comparable speed, or even faster, on large datasets, and it is the only one that supports gapped k-mers.
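
As a rough illustration of the two ingredients, the sketch below extracts gapped k-mers via a shape mask and derives hash locations whose subtable index depends only on the key, so that all Cuckoo alternatives of a key fall into the same subtable; Python's built-in hash stands in for the engineered hash functions of the paper, and all names are illustrative.

def gapped_kmers(seq, mask):
    # mask is a shape such as "##_##": '#' = used position, '_' = gap.
    span = len(mask)
    keep = [i for i, c in enumerate(mask) if c == "#"]
    for i in range(len(seq) - span + 1):
        window = seq[i:i + span]
        yield "".join(window[j] for j in keep)

def locations(key, n_subtables, n_buckets, n_choices=3):
    # The subtable is chosen from the key alone; the alternative bucket
    # locations then all live inside that one subtable, which lets one
    # thread own each subtable without locking.
    sub = hash(("sub", key)) % n_subtables
    return sub, [hash((c, key)) % n_buckets for c in range(n_choices)]

for gk in gapped_kmers("ACGTACGT", "##_##"):
    print(gk, locations(gk, n_subtables=8, n_buckets=1024))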

Cite as

Jens Zentgraf and Sven Rahmann. Fast Gapped k-mer Counting with Subdivided Multi-Way Bucketed Cuckoo Hash Tables. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 12:1-12:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{zentgraf_et_al:LIPIcs.WABI.2022.12,
  author =	{Zentgraf, Jens and Rahmann, Sven},
  title =	{{Fast Gapped k-mer Counting with Subdivided Multi-Way Bucketed Cuckoo Hash Tables}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{12:1--12:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.12},
  URN =		{urn:nbn:de:0030-drops-170467},
  doi =		{10.4230/LIPIcs.WABI.2022.12},
  annote =	{Keywords: gapped k-mer, k-mer, counting, Cuckoo hashing, parallelization}
}
Document
Efficient Reconciliation of Genomic Datasets of High Similarity

Authors: Yoshihiro Shibuya, Djamal Belazzougui, and Gregory Kucherov

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
We apply Invertible Bloom Lookup Tables (IBLTs) to the comparison of k-mer sets originating from large DNA sequence datasets. We show that for similar datasets, IBLTs provide a more space-efficient and, at the same time, more accurate method for estimating the Jaccard similarity of the underlying k-mer sets, compared to MinHash, which is the go-to sketching technique for efficient pairwise similarity estimation. This is achieved by combining IBLTs with k-mer sampling based on syncmers, which constitute a context-independent alternative to minimizers and provide an unbiased estimator of the Jaccard similarity. A key property of our method is that the data structures involved require space proportional to the difference of the k-mer sets, independent of the size of the sets themselves. As another application, we show how our ideas can be applied to efficiently compute (an approximation of) the k-mers that differ between two datasets, still using space only proportional to their number. We experimentally illustrate our results on both simulated and real data (SARS-CoV-2 and Streptococcus pneumoniae genomes).
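
Syncmer sampling is context independent: whether a k-mer is selected depends only on the k-mer itself, not on its neighbours. A minimal sketch of closed-syncmer selection, using lexicographic order as a stand-in for the hash-based order usually applied in practice:

def closed_syncmers(seq, k, s):
    # A k-mer is a closed syncmer if its minimal s-mer sits at the
    # first or the last of the k - s + 1 possible offsets.
    chosen = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        smers = [kmer[j:j + s] for j in range(k - s + 1)]
        m = min(smers)
        if smers[0] == m or smers[-1] == m:
            chosen.append((i, kmer))
    return chosen

print(closed_syncmers("ACGGTAGCATCG", k=6, s=3))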

Cite as

Yoshihiro Shibuya, Djamal Belazzougui, and Gregory Kucherov. Efficient Reconciliation of Genomic Datasets of High Similarity. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 14:1-14:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{shibuya_et_al:LIPIcs.WABI.2022.14,
  author =	{Shibuya, Yoshihiro and Belazzougui, Djamal and Kucherov, Gregory},
  title =	{{Efficient Reconciliation of Genomic Datasets of High Similarity}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{14:1--14:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.14},
  URN =		{urn:nbn:de:0030-drops-170481},
  doi =		{10.4230/LIPIcs.WABI.2022.14},
  annote =	{Keywords: k-mers, sketching, Invertible Bloom Lookup Tables, IBLT, MinHash, syncmers, minimizers}
}
Document
Suffix Sorting via Matching Statistics

Authors: Zsuzsanna Lipták, Francesco Masillo, and Simon J. Puglisi

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a reference string. We then use this data structure to distribute suffixes into a partial order, and subsequently to speed up suffix comparisons to complete the generalized suffix array. Our experimental evidence with a prototype implementation (a tool we call sacamats) shows that on string collections with highly similar strings we can construct the suffix array in time competitive with or faster than the fastest available methods. Along the way, we describe a heuristic for fast computation of the matching statistics of two strings, which may be of independent interest.
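
For orientation, matching statistics have a very short specification: MS[i] is the length of the longest prefix of s[i:] that occurs anywhere in the reference t. The deliberately naive sketch below materialises all substrings of t and is usable only for small inputs; sacamats computes the same values with an index over t.

def matching_statistics(s, t):
    # Specification-level implementation: quadratic space in |t|.
    substrings = {t[i:j] for i in range(len(t))
                  for j in range(i + 1, len(t) + 1)}
    ms = []
    for i in range(len(s)):
        l = 0
        while i + l < len(s) and s[i:i + l + 1] in substrings:
            l += 1
        ms.append(l)
    return ms

print(matching_statistics("banana", "bandana"))  # [3, 3, 2, 3, 2, 1]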

Cite as

Zsuzsanna Lipták, Francesco Masillo, and Simon J. Puglisi. Suffix Sorting via Matching Statistics. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 20:1-20:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{liptak_et_al:LIPIcs.WABI.2022.20,
  author =	{Lipt\'{a}k, Zsuzsanna and Masillo, Francesco and Puglisi, Simon J.},
  title =	{{Suffix Sorting via Matching Statistics}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{20:1--20:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.20},
  URN =		{urn:nbn:de:0030-drops-170545},
  doi =		{10.4230/LIPIcs.WABI.2022.20},
  annote =	{Keywords: Generalized suffix array, matching statistics, string collections, compressed representation, data structures, efficient algorithms}
}
Document
Locality-Sensitive Bucketing Functions for the Edit Distance

Authors: Ke Chen and Mingfu Shao

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
Many bioinformatics applications involve bucketing a set of sequences where each sequence is allowed to be assigned into multiple buckets. To achieve both high sensitivity and precision, bucketing methods are desired to assign similar sequences into the same bucket while assigning dissimilar sequences into distinct buckets. Existing k-mer-based bucketing methods have been efficient in processing sequencing data with low error rate, but encounter much reduced sensitivity on data with high error rate. Locality-sensitive hashing (LSH) schemes are able to mitigate this issue through tolerating the edits in similar sequences, but state-of-the-art methods still have large gaps. Here we generalize the LSH function by allowing it to hash one sequence into multiple buckets. Formally, a bucketing function, which maps a sequence (of fixed length) into a subset of buckets, is defined to be (d₁, d₂)-sensitive if any two sequences within an edit distance of d₁ are mapped into at least one shared bucket, and any two sequences with distance at least d₂ are mapped into disjoint subsets of buckets. We construct locality-sensitive bucketing (LSB) functions with a variety of values of (d₁,d₂) and analyze their efficiency with respect to the total number of buckets needed as well as the number of buckets that a specific sequence is mapped to. We also prove lower bounds of these two parameters in different settings and show that some of our constructed LSB functions are optimal. These results provide theoretical foundations for their practical use in analyzing sequences with high error rate while also providing insights for the hardness of designing ungapped LSH functions.
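
The "one sequence, many buckets" idea behind LSB functions can be illustrated with single-character deletions: two length-n sequences at edit distance 1 (for equal lengths, necessarily a substitution) always share a deletion bucket. Note that this toy construction provides only the d₁ half of the guarantee (distant sequences may still collide) and is not one of the paper's optimal LSB functions.

def deletion_buckets(seq):
    # One bucket per single-character deletion; a length-n sequence is
    # mapped into n buckets, each keyed by a length-(n-1) string.
    return {seq[:i] + seq[i + 1:] for i in range(len(seq))}

a, b = "ACGTT", "ACCTT"  # one substitution, edit distance 1
assert deletion_buckets(a) & deletion_buckets(b)  # share >= 1 bucket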

Cite as

Ke Chen and Mingfu Shao. Locality-Sensitive Bucketing Functions for the Edit Distance. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 22:1-22:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{chen_et_al:LIPIcs.WABI.2022.22,
  author =	{Chen, Ke and Shao, Mingfu},
  title =	{{Locality-Sensitive Bucketing Functions for the Edit Distance}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{22:1--22:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.22},
  URN =		{urn:nbn:de:0030-drops-170563},
  doi =		{10.4230/LIPIcs.WABI.2022.22},
  annote =	{Keywords: Locality-sensitive hashing, locality-sensitive bucketing, long reads, embedding}
}
Document
phyBWT: Alignment-Free Phylogeny via eBWT Positional Clustering

Authors: Veronica Guerrini, Alessio Conte, Roberto Grossi, Gianni Liti, Giovanna Rosone, and Lorenzo Tattini

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
Molecular phylogenetics is a fundamental branch of biology. It studies the evolutionary relationships among the individuals of a population through their biological sequences, and may provide insights about the origin and the evolution of viral diseases, or highlight complex evolutionary trajectories. In this paper we develop a method called phyBWT, describing how to use the extended Burrows-Wheeler Transform (eBWT) of a collection of DNA sequences to directly reconstruct phylogeny, bypassing alignment against a reference genome and de novo assembly. Our phyBWT hinges on the combinatorial properties of the eBWT positional clustering framework. We employ the eBWT to detect relevant blocks of the longest shared substrings of varying length (unlike k-mer-based approaches, which need to fix the length k a priori), and build a suitable decomposition leading to a phylogenetic tree, step by step. As a result, phyBWT is a new alignment-, assembly-, and reference-free method that builds a partition tree without relying on the pairwise comparison of sequences, thus avoiding the use of a distance matrix to infer phylogeny. Preliminary experimental results on sequencing data show that our method can handle datasets of different types (short reads, contigs, or entire genomes), producing trees of quality comparable to that of the benchmark phylogeny.
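
As background for readers unfamiliar with BWTs of collections: a simple variant concatenates the strings with distinct separators and sorts suffixes, recording which input sequence each row comes from. The eBWT used by phyBWT is instead defined via the omega-order and needs no separators, so the naive sketch below is only a simplified stand-in showing the kind of structure that positional clustering inspects.

def multistring_bwt(strings):
    # Separator characters chr(1), chr(2), ... are assumed to be
    # distinct from and smaller than the DNA alphabet.
    seq_id, text = [], ""
    for i, s in enumerate(strings):
        text += s + chr(i + 1)
        seq_id += [i] * (len(s) + 1)
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive sort
    bwt = "".join(text[i - 1] for i in sa)  # i = 0 wraps to the end
    rows = [seq_id[i] for i in sa]          # origin sequence per row
    return bwt, rows

bwt, rows = multistring_bwt(["ACGT", "ACGA"])
# Positional clusters are blocks of consecutive rows whose suffixes share
# long common prefixes; phyBWT examines how 'rows' interleaves in them.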

Cite as

Veronica Guerrini, Alessio Conte, Roberto Grossi, Gianni Liti, Giovanna Rosone, and Lorenzo Tattini. phyBWT: Alignment-Free Phylogeny via eBWT Positional Clustering. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 23:1-23:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{guerrini_et_al:LIPIcs.WABI.2022.23,
  author =	{Guerrini, Veronica and Conte, Alessio and Grossi, Roberto and Liti, Gianni and Rosone, Giovanna and Tattini, Lorenzo},
  title =	{{phyBWT: Alignment-Free Phylogeny via eBWT Positional Clustering}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{23:1--23:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.23},
  URN =		{urn:nbn:de:0030-drops-170577},
  doi =		{10.4230/LIPIcs.WABI.2022.23},
  annote =	{Keywords: Phylogeny, partition tree, BWT, positional cluster, alignment-free, reference-free, assembly-free}
}
Document
Fast Lightweight Accurate Xenograft Sorting

Authors: Jens Zentgraf and Sven Rahmann

Published in: LIPIcs, Volume 172, 20th International Workshop on Algorithms in Bioinformatics (WABI 2020)


Abstract
Motivation: With an increasing number of patient-derived xenograft (PDX) models being created and subsequently sequenced to study tumor heterogeneity and to guide therapy decisions, there is a similarly increasing need for methods to separate reads originating from the graft (human) tumor and reads originating from the host species' (mouse) surrounding tissue. Two kinds of methods are in use: on the one hand, alignment-based tools require that reads are first mapped and aligned (by an external mapper/aligner) to the host and graft genomes separately; the tool itself then processes the resulting alignments and quality metrics (typically BAM files) to assign each read or read pair. On the other hand, alignment-free tools work directly on the raw read data (typically FASTQ files). Recent studies compare different approaches and tools, with varying results.

Results: We show that alignment-free methods for xenograft sorting are superior in CPU time usage and equivalent in accuracy. We improve upon the state of the art by presenting a fast lightweight approach based on three-way bucketed quotiented Cuckoo hashing. Our hash table requires memory comparable to an FM index typically used for read alignment, and less than other alignment-free approaches. It allows extremely fast lookups and uses less CPU time than other alignment-free and alignment-based methods at similar accuracy.
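
At sorting time, the table is queried once per k-mer of each read. Below is a minimal sketch of the assignment logic, where a plain dict mapping k-mers to 'host', 'graft', or 'both' stands in for the three-way bucketed quotiented Cuckoo table, and a simple vote stands in for the tool's actual decision rule; all names are illustrative.

def assign_read(read, kmer_origin, k):
    # kmer_origin is built offline from the host and graft genomes.
    votes = {"host": 0, "graft": 0, "both": 0}
    for i in range(len(read) - k + 1):
        origin = kmer_origin.get(read[i:i + k])
        if origin is not None:
            votes[origin] += 1
    if votes["host"] > votes["graft"]:
        return "host"
    if votes["graft"] > votes["host"]:
        return "graft"
    return "ambiguous"

table = {"ACGTA": "graft", "CGTAC": "graft", "GTACG": "both"}
print(assign_read("ACGTACG", table, k=5))  # graft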

Cite as

Jens Zentgraf and Sven Rahmann. Fast Lightweight Accurate Xenograft Sorting. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 4:1-4:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


BibTeX

@InProceedings{zentgraf_et_al:LIPIcs.WABI.2020.4,
  author =	{Zentgraf, Jens and Rahmann, Sven},
  title =	{{Fast Lightweight Accurate Xenograft Sorting}},
  booktitle =	{20th International Workshop on Algorithms in Bioinformatics (WABI 2020)},
  pages =	{4:1--4:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-161-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{172},
  editor =	{Kingsford, Carl and Pisanti, Nadia},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2020.4},
  URN =		{urn:nbn:de:0030-drops-127933},
  doi =		{10.4230/LIPIcs.WABI.2020.4},
  annote =	{Keywords: xenograft sorting, alignment-free method, Cuckoo hashing, k-mer}
}
Document
Approximate Pricing in Networks: How to Boost the Betweenness and Revenue of a Node

Authors: Ruben Brokkelkamp, Sven Polak, Guido Schäfer, and Yllka Velaj

Published in: LIPIcs, Volume 149, 30th International Symposium on Algorithms and Computation (ISAAC 2019)


Abstract
We introduce and study two new pricing problems in networks: suppose we are given a directed graph G = (V, E) with non-negative edge costs (c_e)_{e in E}, k commodities (s_i, t_i, w_i)_{i in [k]}, and a designated node u in V. Each commodity i in [k] is represented by a source-target pair (s_i, t_i) in V x V and a demand w_i > 0, specifying that w_i units of flow are sent from s_i to t_i along shortest s_i-t_i paths (with respect to (c_e)_{e in E}). The demand of each commodity is split evenly over all shortest paths. Assume we can change the edge costs of some of the outgoing edges of u, while the costs of all other edges remain fixed; we also say that we price (or tax) the edges of u. We study the problem of pricing the edges of u with respect to the following two natural objectives: (i) max-flow: maximize the total flow passing through u, and (ii) max-revenue: maximize the total revenue (flow times tax) through u. Both variants have various applications in practice. For example, the max-flow objective is equivalent to maximizing the betweenness centrality of u, which is one of the most popular measures of the influence of a node in a (social) network. We prove that (except for some special cases) both problems are NP-hard and inapproximable in general, and we therefore resort to approximation algorithms. We derive approximation algorithms for both variants and show that the derived approximation guarantees are best possible.
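
For fixed edge costs, the max-flow objective is exactly a weighted betweenness value of u under even splitting over shortest paths, and it can be evaluated by counting shortest paths. The sketch below evaluates this objective only; it does not perform the pricing optimization studied in the paper, and all names are illustrative. Integer edge costs keep the equality comparisons exact.

import heapq
from collections import defaultdict

def dijkstra(adj, src):
    # Shortest distances from src and the number of shortest paths.
    dist = defaultdict(lambda: float("inf"))
    npaths = defaultdict(int)
    dist[src], npaths[src] = 0, 1
    pq = [(0, src)]
    while pq:
        d, v = heapq.heappop(pq)
        if d > dist[v]:
            continue
        for w, c in adj.get(v, ()):
            if d + c < dist[w]:
                dist[w], npaths[w] = d + c, npaths[v]
                heapq.heappush(pq, (dist[w], w))
            elif d + c == dist[w]:
                npaths[w] += npaths[v]
    return dist, npaths

def flow_through(adj, radj, commodities, u):
    # radj is the reversed graph; commodities are (s_i, t_i, w_i).
    total = 0.0
    for s, t, w in commodities:
        ds, ns = dijkstra(adj, s)   # forwards from s
        dt, nt = dijkstra(radj, t)  # towards t, via the reverse graph
        if ns[t] == 0:
            continue                # t unreachable from s
        if ds[u] + dt[u] == ds[t]:  # u lies on some shortest s-t path
            total += w * (ns[u] * nt[u]) / ns[t]
    return total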

Cite as

Ruben Brokkelkamp, Sven Polak, Guido Schäfer, and Yllka Velaj. Approximate Pricing in Networks: How to Boost the Betweenness and Revenue of a Node. In 30th International Symposium on Algorithms and Computation (ISAAC 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 149, pp. 13:1-13:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


BibTeX

@InProceedings{brokkelkamp_et_al:LIPIcs.ISAAC.2019.13,
  author =	{Brokkelkamp, Ruben and Polak, Sven and Sch\"{a}fer, Guido and Velaj, Yllka},
  title =	{{Approximate Pricing in Networks: How to Boost the Betweenness and Revenue of a Node}},
  booktitle =	{30th International Symposium on Algorithms and Computation (ISAAC 2019)},
  pages =	{13:1--13:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-130-6},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{149},
  editor =	{Lu, Pinyan and Zhang, Guochuan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2019.13},
  URN =		{urn:nbn:de:0030-drops-115091},
  doi =		{10.4230/LIPIcs.ISAAC.2019.13},
  annote =	{Keywords: Network pricing, Stackelberg network pricing, betweenness centrality, revenue maximization}
}
Document
Ergodic Mean-Payoff Games for the Analysis of Attacks in Crypto-Currencies

Authors: Krishnendu Chatterjee, Amir Kafshdar Goharshady, Rasmus Ibsen-Jensen, and Yaron Velner

Published in: LIPIcs, Volume 118, 29th International Conference on Concurrency Theory (CONCUR 2018)


Abstract
Crypto-currencies are digital assets designed to work as a medium of exchange, e.g., Bitcoin, but they are susceptible to attacks (dishonest behavior of participants). A framework for the analysis of attacks in crypto-currencies requires (a) modeling of game-theoretic aspects to analyze incentives for deviation from honest behavior; (b) concurrent interactions between participants; and (c) analysis of long-term monetary gains. Traditional game-theoretic approaches for the analysis of security protocols consider either qualitative temporal properties such as safety and termination, or the very special class of one-shot (stateless) games. However, to analyze general attacks on protocols for crypto-currencies, both stateful analysis and quantitative objectives are necessary. In this work our main contributions are as follows: (a) we show how a class of concurrent mean-payoff games, namely ergodic games, can model various attacks that arise naturally in crypto-currencies; (b) we present the first practical implementation of algorithms for ergodic games that scales to model realistic problems for crypto-currencies; and (c) we present experimental results showing that our framework can handle games with thousands of states and millions of transitions.

Cite as

Krishnendu Chatterjee, Amir Kafshdar Goharshady, Rasmus Ibsen-Jensen, and Yaron Velner. Ergodic Mean-Payoff Games for the Analysis of Attacks in Crypto-Currencies. In 29th International Conference on Concurrency Theory (CONCUR 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 118, pp. 11:1-11:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{chatterjee_et_al:LIPIcs.CONCUR.2018.11,
  author =	{Chatterjee, Krishnendu and Kafshdar Goharshady, Amir and Ibsen-Jensen, Rasmus and Velner, Yaron},
  title =	{{Ergodic Mean-Payoff Games for the Analysis of Attacks in Crypto-Currencies}},
  booktitle =	{29th International Conference on Concurrency Theory (CONCUR 2018)},
  pages =	{11:1--11:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-087-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{118},
  editor =	{Schewe, Sven and Zhang, Lijun},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CONCUR.2018.11},
  URN =		{urn:nbn:de:0030-drops-95497},
  doi =		{10.4230/LIPIcs.CONCUR.2018.11},
  annote =	{Keywords: Crypto-currency, Quantitative Verification, Mean-payoff Games}
}
Document
Verifying Quantitative Temporal Properties of Procedural Programs

Authors: Mohamed Faouzi Atig, Ahmed Bouajjani, K. Narayan Kumar, and Prakash Saivasan

Published in: LIPIcs, Volume 118, 29th International Conference on Concurrency Theory (CONCUR 2018)


Abstract
We address the problem of specifying and verifying quantitative properties of procedural programs. These properties typically involve constraints on the relative cumulated costs of executing various tasks (by invoking, for instance, some particular procedures) within the scope of the execution of some particular procedure. An example of such a property is "within the execution of each invocation of procedure P, the time spent in executing invocations of procedure Q is less than 20% of the total execution time". We introduce specification formalisms, both automata-based and logic-based, for expressing such properties, and we study the links between these formalisms and their application in model checking. On one side, we define Constrained Pushdown Systems (CPDS), an extension of pushdown systems with constraints, expressed in Presburger arithmetic, on the numbers of occurrences of each symbol in the alphabet within invocation intervals (subcomputations between matching pushes and pops); on the other side, we introduce a higher-level specification language that is a quantitative extension of CaRet (the Call-Return temporal logic) called QCaRet, where nested quantitative constraints over procedure invocation intervals are expressible using Presburger arithmetic. Then, we investigate (1) the decidability of the reachability and repeated reachability problems for CPDS, and (2) the effective reduction of the model-checking problem of procedural programs (modeled as visibly pushdown systems) against QCaRet formulas to these problems on CPDS.

Cite as

Mohamed Faouzi Atig, Ahmed Bouajjani, K. Narayan Kumar, and Prakash Saivasan. Verifying Quantitative Temporal Properties of Procedural Programs. In 29th International Conference on Concurrency Theory (CONCUR 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 118, pp. 15:1-15:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{atig_et_al:LIPIcs.CONCUR.2018.15,
  author =	{Atig, Mohamed Faouzi and Bouajjani, Ahmed and Narayan Kumar, K. and Saivasan, Prakash},
  title =	{{Verifying Quantitative Temporal Properties of Procedural Programs}},
  booktitle =	{29th International Conference on Concurrency Theory (CONCUR 2018)},
  pages =	{15:1--15:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-087-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{118},
  editor =	{Schewe, Sven and Zhang, Lijun},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CONCUR.2018.15},
  URN =		{urn:nbn:de:0030-drops-95531},
  doi =		{10.4230/LIPIcs.CONCUR.2018.15},
  annote =	{Keywords: Verification, Formal Methods, Pushdown systems, Visibly pushdown, Quantitative Temporal Properties}
}
Document
Up-To Techniques for Behavioural Metrics via Fibrations

Authors: Filippo Bonchi, Barbara König, and Daniela Petrisan

Published in: LIPIcs, Volume 118, 29th International Conference on Concurrency Theory (CONCUR 2018)


Abstract
Up-to techniques are a well-known method for enhancing coinductive proofs of behavioural equivalences. We introduce up-to techniques for behavioural metrics between systems modelled as coalgebras and we provide abstract results to prove their soundness in a compositional way. In order to obtain a general framework, we need a systematic way to lift functors: we show that the Wasserstein lifting of a functor, introduced in a previous work, corresponds to a change of base in a fibrational sense. This observation enables us to reuse existing results about soundness of up-to techniques in a fibrational setting. We focus on the fibrations of predicates and relations valued in a quantale, for which pseudo-metric spaces are an example. To illustrate our approach we provide an example on distances between regular languages.

Cite as

Filippo Bonchi, Barbara König, and Daniela Petrisan. Up-To Techniques for Behavioural Metrics via Fibrations. In 29th International Conference on Concurrency Theory (CONCUR 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 118, pp. 17:1-17:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{bonchi_et_al:LIPIcs.CONCUR.2018.17,
  author =	{Bonchi, Filippo and K\"{o}nig, Barbara and Petrisan, Daniela},
  title =	{{Up-To Techniques for Behavioural Metrics via Fibrations}},
  booktitle =	{29th International Conference on Concurrency Theory (CONCUR 2018)},
  pages =	{17:1--17:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-087-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{118},
  editor =	{Schewe, Sven and Zhang, Lijun},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CONCUR.2018.17},
  URN =		{urn:nbn:de:0030-drops-95552},
  doi =		{10.4230/LIPIcs.CONCUR.2018.17},
  annote =	{Keywords: behavioural metrics, bisimilarity, up-to techniques, coalgebras, fibrations}
}
Document
Relating Syntactic and Semantic Perturbations of Hybrid Automata

Authors: Nima Roohi, Pavithra Prabhakar, and Mahesh Viswanathan

Published in: LIPIcs, Volume 118, 29th International Conference on Concurrency Theory (CONCUR 2018)


Abstract
We investigate how the semantics of a hybrid automaton deviates with respect to syntactic perturbations of the hybrid automaton. We consider syntactic perturbations of a hybrid automaton, wherein the syntactic representations of its elements, namely, initial sets, invariants, guards, and flows, in some logic are perturbed. Our main result establishes a continuity-like property stating that small perturbations in the syntax lead to small perturbations in the semantics. More precisely, we show that for every real number epsilon > 0 and natural number k, there is a real number delta > 0 such that H^delta, the delta syntactic perturbation of a hybrid automaton H, is epsilon-simulation equivalent to H up to k transition steps. As a byproduct, we obtain a proof that a bounded safety verification tool such as dReach will eventually prove the safety of a safe hybrid automaton design (when only non-strict inequalities are used in all constraints) if dReach iteratively reduces the syntactic parameter delta that is used in checking approximate satisfiability. This has an immediate application to counter-example validation in a CEGAR framework: namely, when a counter-example is spurious, we have a complete procedure for deducing that this is the case.

Cite as

Nima Roohi, Pavithra Prabhakar, and Mahesh Viswanathan. Relating Syntactic and Semantic Perturbations of Hybrid Automata. In 29th International Conference on Concurrency Theory (CONCUR 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 118, pp. 26:1-26:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{roohi_et_al:LIPIcs.CONCUR.2018.26,
  author =	{Roohi, Nima and Prabhakar, Pavithra and Viswanathan, Mahesh},
  title =	{{Relating Syntactic and Semantic Perturbations of Hybrid Automata}},
  booktitle =	{29th International Conference on Concurrency Theory (CONCUR 2018)},
  pages =	{26:1--26:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-087-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{118},
  editor =	{Schewe, Sven and Zhang, Lijun},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CONCUR.2018.26},
  URN =		{urn:nbn:de:0030-drops-95644},
  doi =		{10.4230/LIPIcs.CONCUR.2018.26},
  annote =	{Keywords: Model Checking, Hybrid Automata, Approximation, Perturbation}
}
