Search Results

Documents authored by Rahmann, Sven


Document
Swiftly Identifying Strongly Unique k-Mers

Authors: Jens Zentgraf and Sven Rahmann

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)


Abstract
Motivation. Short DNA sequences of length k that appear in a single location (e.g., at a single genomic position, in a single species from a larger set of species, etc.) are called unique k-mers. They are useful for placing sequenced DNA fragments at the correct location without computing alignments and without ambiguity. However, they are not necessarily robust: A single basepair change may turn a unique k-mer into a different one that may in fact be present at one or more different locations, which may give confusing or contradictory information when attempting to place a read by its k-mer content. A more robust concept is that of strongly unique k-mers, i.e., unique k-mers for which no Hamming-distance-1 neighbor with conflicting information exists in any of the considered sequences. Given a set of k-mers, it is therefore of interest to have an efficient method that can distinguish k-mers with a Hamming-distance-1 neighbor in the collection from those without one.

Results. We present engineered algorithms to identify and mark, within a set K of (canonical) k-mers, all elements that have a Hamming-distance-1 neighbor in the same set. One algorithm is based on recursively running a 4-way comparison on sub-intervals of the sorted set. The other algorithm is based on bucketing and running a pairwise bit-parallel Hamming distance test on small buckets of the sorted set. Both methods consider canonical k-mers (i.e., taking reverse complements into account) and allow for efficient parallelization. The methods have been implemented and applied in practice to sets consisting of several billion k-mers. An optimized combined approach, running with 16 threads on a 16-core workstation, yields wall-clock running times below 20 seconds on the 2.5 billion distinct 31-mers of the human telomere-to-telomere reference genome.

Availability. An implementation can be found at https://gitlab.com/rahmannlab/strong-k-mers.
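
The bucketed variant reduces the problem to many small all-pairs checks, for which a bit-parallel Hamming-distance test on 2-bit-encoded k-mers is the natural primitive. The following Python sketch is our illustration, not the paper's code, and it ignores the canonical (reverse-complement) handling described above:

def encode(kmer: str) -> int:
    # 2 bits per base: A=00, C=01, G=10, T=11.
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    x = 0
    for c in kmer:
        x = (x << 2) | code[c]
    return x

def hamming(x: int, y: int, k: int) -> int:
    # XOR is nonzero exactly in the 2-bit fields of mismatching bases;
    # fold each field down to one bit, then count set bits.
    z = x ^ y
    low_bits = int("01" * k, 2)  # 0b0101...01, one low bit per base
    return bin((z | (z >> 1)) & low_bits).count("1")

def mark_weak(bucket: list, k: int) -> set:
    # All-pairs test on one small bucket of encoded k-mers: mark every
    # k-mer that has a Hamming-distance-1 neighbor in the bucket.
    weak = set()
    for i in range(len(bucket)):
        for j in range(i + 1, len(bucket)):
            if hamming(bucket[i], bucket[j], k) == 1:
                weak.add(bucket[i])
                weak.add(bucket[j])
    return weak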

Cite as

Jens Zentgraf and Sven Rahmann. Swiftly Identifying Strongly Unique k-Mers. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 15:1-15:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


BibTeX

@InProceedings{zentgraf_et_al:LIPIcs.WABI.2024.15,
  author =	{Zentgraf, Jens and Rahmann, Sven},
  title =	{{Swiftly Identifying Strongly Unique k-Mers}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{15:1--15:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.15},
  URN =		{urn:nbn:de:0030-drops-206593},
  doi =		{10.4230/LIPIcs.WABI.2024.15},
  annote =	{Keywords: k-mer, Hamming distance, strong uniqueness, parallelization, algorithm engineering}
}
Document
Complete Volume
LIPIcs, Volume 242, WABI 2022, Complete Volume

Authors: Christina Boucher and Sven Rahmann

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
LIPIcs, Volume 242, WABI 2022, Complete Volume

Cite as

22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 1-474, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@Proceedings{boucher_et_al:LIPIcs.WABI.2022,
  title =	{{LIPIcs, Volume 242, WABI 2022, Complete Volume}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{1--474},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022},
  URN =		{urn:nbn:de:0030-drops-170338},
  doi =		{10.4230/LIPIcs.WABI.2022},
  annote =	{Keywords: LIPIcs, Volume 242, WABI 2022, Complete Volume}
}
Document
Front Matter
Front Matter, Table of Contents, Preface, Conference Organization

Authors: Christina Boucher and Sven Rahmann

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
Front Matter, Table of Contents, Preface, Conference Organization

Cite as

22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 0:i-0:xii, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{boucher_et_al:LIPIcs.WABI.2022.0,
  author =	{Boucher, Christina and Rahmann, Sven},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{0:i--0:xii},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.0},
  URN =		{urn:nbn:de:0030-drops-170347},
  doi =		{10.4230/LIPIcs.WABI.2022.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}
Document
Fast Gapped k-mer Counting with Subdivided Multi-Way Bucketed Cuckoo Hash Tables

Authors: Jens Zentgraf and Sven Rahmann

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
Motivation. In biological sequence analysis, alignment-free (also known as k-mer-based) methods are increasingly replacing mapping- and alignment-based methods for various applications. A basic step of such methods consists of building a table of all k-mers of a given set of sequences (a reference genome or a dataset of sequenced reads) and their counts. Over the past years, efficient methods and tools for k-mer counting have been developed. In a different line of work, the use of gapped k-mers has been shown to offer advantages over the use of the standard contiguous k-mers. However, no tool seems to be available that is able to count gapped k-mers with the same efficiency as contiguous k-mers. One reason is that the most efficient k-mer counters use minimizers (of length m < k) to group k-mers into buckets, such that many consecutive k-mers are classified into the same bucket. This approach leads to cache-friendly (and hence extremely fast) algorithms, but it does not transfer easily to gapped k-mers. Consequently, the existing efficient k-mer counters cannot be trivially modified to count gapped k-mers with the same efficiency.

Results. We present a different approach that is equally applicable to contiguous and gapped k-mers. We use multi-way bucketed Cuckoo hash tables to efficiently store (gapped) k-mers and their counts. We also describe a method to parallelize counting over multiple threads without using locks: We subdivide the hash table into independent subtables and use a producer-consumer model, such that each thread serves one subtable. This requires designing Cuckoo hash functions with the property that all alternative locations for each k-mer are located in the same subtable. Compared to some of the fastest contiguous k-mer counters, our approach is of comparable speed, or even faster, on large datasets, and it is the only one that supports gapped k-mers.
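
The lock-free parallelization hinges on hash functions whose alternative locations for a key all fall into the same subtable, so that each consumer thread can own one subtable exclusively. A minimal Python sketch of such an addressing scheme (the subtable count, bucket count, and hash construction are our illustrative choices, not the paper's):

import hashlib

N_SUBTABLES = 9
BUCKETS_PER_SUBTABLE = 1 << 20

def _h(key: int, salt: bytes) -> int:
    h = hashlib.blake2b(key.to_bytes(8, "little"), key=salt, digest_size=8)
    return int.from_bytes(h.digest(), "little")

def locations(kmer_code: int):
    # One hash selects the subtable; all alternative bucket choices
    # (here three-way Cuckoo hashing) stay inside that subtable, so a
    # producer can route each k-mer to the one thread owning it.
    sub = _h(kmer_code, b"subtable") % N_SUBTABLES
    buckets = [_h(kmer_code, s) % BUCKETS_PER_SUBTABLE
               for s in (b"h1", b"h2", b"h3")]
    return sub, buckets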

Cite as

Jens Zentgraf and Sven Rahmann. Fast Gapped k-mer Counting with Subdivided Multi-Way Bucketed Cuckoo Hash Tables. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 12:1-12:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


BibTeX

@InProceedings{zentgraf_et_al:LIPIcs.WABI.2022.12,
  author =	{Zentgraf, Jens and Rahmann, Sven},
  title =	{{Fast Gapped k-mer Counting with Subdivided Multi-Way Bucketed Cuckoo Hash Tables}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{12:1--12:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.12},
  URN =		{urn:nbn:de:0030-drops-170467},
  doi =		{10.4230/LIPIcs.WABI.2022.12},
  annote =	{Keywords: gapped k-mer, k-mer, counting, Cuckoo hashing, parallelization}
}
Document
Fast Lightweight Accurate Xenograft Sorting

Authors: Jens Zentgraf and Sven Rahmann

Published in: LIPIcs, Volume 172, 20th International Workshop on Algorithms in Bioinformatics (WABI 2020)


Abstract
Motivation: With an increasing number of patient-derived xenograft (PDX) models being created and subsequently sequenced to study tumor heterogeneity and to guide therapy decisions, there is a similarly increasing need for methods to separate reads originating from the graft (human) tumor from reads originating from the host species' (mouse) surrounding tissue. Two kinds of methods are in use: On the one hand, alignment-based tools require that reads are first mapped and aligned (by an external mapper/aligner) to the host and graft genomes separately; the tool itself then processes the resulting alignments and quality metrics (typically BAM files) to assign each read or read pair. On the other hand, alignment-free tools work directly on the raw read data (typically FASTQ files). Recent studies compare different approaches and tools, with varying results.

Results: We show that alignment-free methods for xenograft sorting are superior in CPU time usage and equivalent in accuracy. We improve upon the state of the art by presenting a fast lightweight approach based on three-way bucketed quotiented Cuckoo hashing. Our hash table requires memory comparable to that of an FM index typically used for read alignment, and less than other alignment-free approaches. It allows extremely fast lookups and uses less CPU time than other alignment-free and alignment-based methods at similar accuracy.
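
Once the k-mer table is built, the sorting step is essentially a vote over k-mer lookups per read. A rough Python sketch of that idea (a plain dict stands in for the three-way bucketed quotiented Cuckoo hash table, and the decision rule is deliberately simplified compared to the tool's actual categories):

_COMP = str.maketrans("ACGT", "TGCA")

def canonical(kmer: str) -> str:
    # A k-mer and its reverse complement are treated as the same key.
    rc = kmer.translate(_COMP)[::-1]
    return min(kmer, rc)

def classify_read(read: str, table: dict, k: int = 25) -> str:
    # table maps canonical k-mers to "host", "graft", or "both".
    host = graft = 0
    for i in range(len(read) - k + 1):
        label = table.get(canonical(read[i:i + k]))
        if label == "host":
            host += 1
        elif label == "graft":
            graft += 1
    if host and not graft:
        return "host"
    if graft and not host:
        return "graft"
    return "ambiguous" if (host and graft) else "neither"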

Cite as

Jens Zentgraf and Sven Rahmann. Fast Lightweight Accurate Xenograft Sorting. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 4:1-4:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


BibTeX

@InProceedings{zentgraf_et_al:LIPIcs.WABI.2020.4,
  author =	{Zentgraf, Jens and Rahmann, Sven},
  title =	{{Fast Lightweight Accurate Xenograft Sorting}},
  booktitle =	{20th International Workshop on Algorithms in Bioinformatics (WABI 2020)},
  pages =	{4:1--4:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-161-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{172},
  editor =	{Kingsford, Carl and Pisanti, Nadia},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2020.4},
  URN =		{urn:nbn:de:0030-drops-127933},
  doi =		{10.4230/LIPIcs.WABI.2020.4},
  annote =	{Keywords: xenograft sorting, alignment-free method, Cuckoo hashing, k-mer}
}
Document
Engineering Fused Lasso Solvers on Trees

Authors: Elias Kuthe and Sven Rahmann

Published in: LIPIcs, Volume 160, 18th International Symposium on Experimental Algorithms (SEA 2020)


Abstract
The graph fused lasso optimization problem seeks, for a given input signal y=(y_i) on the nodes i ∈ V of a graph G=(V,E), a reconstructed signal x=(x_i) that is both element-wise close to y in quadratic error and also has bounded total variation (sum of absolute differences across edges), thereby favoring regionally constant solutions. An important application is the denoising of spatially correlated data, especially of medical images. Currently, fused lasso solvers for general graph input reduce the problem to an iteration over a series of "one-dimensional" problems (on paths or line graphs), which can be solved in linear time. Recently, a direct fused lasso algorithm for tree graphs has been presented, but no implementation of it appears to be available. Here we present a simplified exact algorithm and, additionally, a fast approximation scheme for trees, together with engineered implementations of both. We empirically evaluate their performance on different kinds of trees with distinct degree distributions (simulated trees; spanning trees of road networks, grid graphs of images, and social networks). The exact algorithm is very efficient on trees with low node degrees, which covers many naturally arising graphs, while the approximation scheme can perform better on trees with several higher-degree nodes when limiting the desired accuracy to values that are useful in practice.
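
For reference, the objective described above can be written compactly as follows (the factor 1/2 and the penalty weight λ follow one common convention; the paper's scaling may differ):

\min_{x \in \mathbb{R}^{V}} \ \frac{1}{2} \sum_{i \in V} (x_i - y_i)^2 \ + \ \lambda \sum_{\{u,v\} \in E} |x_u - x_v|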

Cite as

Elias Kuthe and Sven Rahmann. Engineering Fused Lasso Solvers on Trees. In 18th International Symposium on Experimental Algorithms (SEA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 160, pp. 23:1-23:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


BibTeX

@InProceedings{kuthe_et_al:LIPIcs.SEA.2020.23,
  author =	{Kuthe, Elias and Rahmann, Sven},
  title =	{{Engineering Fused Lasso Solvers on Trees}},
  booktitle =	{18th International Symposium on Experimental Algorithms (SEA 2020)},
  pages =	{23:1--23:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-148-1},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{160},
  editor =	{Faro, Simone and Cantone, Domenico},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2020.23},
  URN =		{urn:nbn:de:0030-drops-120977},
  doi =		{10.4230/LIPIcs.SEA.2020.23},
  annote =	{Keywords: fused lasso, regularization, tree traversal, cache effects}
}
Document
Spalter: A Meta Machine Learning Approach to Distinguish True DNA Variants from Sequencing Artefacts

Authors: Till Hartmann and Sven Rahmann

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)


Abstract
Being able to distinguish between true DNA variants and technical sequencing artefacts is a fundamental task in whole genome, exome or targeted gene analysis. Variant calling tools provide diagnostic parameters, such as strand bias or an aggregated overall quality for each called variant, to help users make an informed choice about which variants to accept or discard. Having several such quality indicators poses a problem for the users of variant callers because they need to set or adjust thresholds for each such indicator. Alternatively, machine learning methods can be used to train a classifier based on these indicators. This approach needs large sets of labeled training data, which are not easily available. The new approach presented here relies on the idea that a true DNA variant exists independently of technical features of the read in which it appears (e.g. base quality, strand, position in the read). Therefore, the nucleotide separability classification problem (predicting the nucleotide state of each read in a given pileup based on technical features only) should be nearly impossible to solve for true variants. Nucleotide separability, i.e. achievable classification accuracy, can either be used to distinguish between true variants and technical artefacts directly, using a thresholding approach, or it can be used as a meta-feature to train a separability-based classifier. This article explores both possibilities with promising results, showing accuracies around 90%.
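
The separability idea translates almost directly into code: fit any classifier on the technical features of the reads in a pileup and use its cross-validated accuracy at predicting the shown nucleotide as the decision quantity. A sketch using scikit-learn (the classifier choice, fold count, and threshold are our illustrative assumptions, not the paper's settings):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def separability(technical_features, nucleotides) -> float:
    # technical_features: one row per read covering the candidate site
    # (e.g., base quality, strand, position in the read);
    # nucleotides: the base each read shows at that site.
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, technical_features, nucleotides, cv=5).mean()

def looks_like_artefact(technical_features, nucleotides,
                        threshold=0.7) -> bool:
    # High separability means technical features predict the base well,
    # which should be nearly impossible for a true variant.
    return separability(technical_features, nucleotides) > threshold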

Cite as

Till Hartmann and Sven Rahmann. Spalter: A Meta Machine Learning Approach to Distinguish True DNA Variants from Sequencing Artefacts. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 13:1-13:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


BibTeX

@InProceedings{hartmann_et_al:LIPIcs.WABI.2018.13,
  author =	{Hartmann, Till and Rahmann, Sven},
  title =	{{Spalter: A Meta Machine Learning Approach to Distinguish True DNA Variants from Sequencing Artefacts}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{13:1--13:8},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.13},
  URN =		{urn:nbn:de:0030-drops-93158},
  doi =		{10.4230/LIPIcs.WABI.2018.13},
  annote =	{Keywords: variant calling, sequencing error, technical artefact, meta machine learning, classification}
}
Document
Analysis of Min-Hashing for Variant Tolerant DNA Read Mapping

Authors: Jens Quedenfeld and Sven Rahmann

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)


Abstract
DNA read mapping has become a ubiquitous task in bioinformatics. New technologies provide ever longer DNA reads (several thousand basepairs), although at comparatively high error rates (up to 15%), and the reference genome is increasingly considered not as a simple string over ACGT, but as a complex object containing known genetic variants in the population. Conventional indexes based on exact seed matches, in particular the suffix-array-based FM index, struggle with these changing conditions, so other methods are being considered; one such alternative is locality sensitive hashing. Here we examine whether including single nucleotide polymorphisms (SNPs) in a min-hashing index is beneficial. The answer depends on the population frequency of the SNP, and we analyze several models (from simple to complex) that provide precise answers to this question under various assumptions. Our results also provide sensitivity and specificity values for min-hashing-based read mappers and may be used to understand dependencies between the parameters of such methods. We hope that this article will provide a theoretical foundation for a new generation of read mappers.
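
As background for the analysis, a min-hashing index reduces each sequence (a read or a reference window) to a short signature of per-hash minima over its k-mer set; sequences from the same location then agree on many signature components even in the presence of errors. A minimal Python sketch (the parameters and the salted hashing are illustrative):

import hashlib

def minhash_signature(seq: str, k: int = 16, num_hashes: int = 64) -> list:
    # One minimum per salted hash function over the k-mer set of seq.
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    sig = []
    for j in range(num_hashes):
        salt = j.to_bytes(4, "little")
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(km.encode(), key=salt,
                                digest_size=8).digest(),
                "little")
            for km in kmers))
    return sig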

Cite as

Jens Quedenfeld and Sven Rahmann. Analysis of Min-Hashing for Variant Tolerant DNA Read Mapping. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 21:1-21:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


BibTeX

@InProceedings{quedenfeld_et_al:LIPIcs.WABI.2017.21,
  author =	{Quedenfeld, Jens and Rahmann, Sven},
  title =	{{Analysis of Min-Hashing for Variant Tolerant DNA Read Mapping}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{21:1--21:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.21},
  URN =		{urn:nbn:de:0030-drops-76598},
  doi =		{10.4230/LIPIcs.WABI.2017.21},
  annote =	{Keywords: read mapping, min-Hashing, variant, SNP, analysis of algorithms}
}
Document
PanCake: A Data Structure for Pangenomes

Authors: Corinna Ernst and Sven Rahmann

Published in: OASIcs, Volume 34, German Conference on Bioinformatics 2013


Abstract
We present a pangenome data structure ("PanCake") for sets of related genomes, based on bundling similar sequence regions into shared features, which are derived from genome-wide pairwise sequence alignments. We discuss the design of the data structure, basic operations on it, and methods to predict core genomes and singleton regions. In contrast to many other pangenome analysis tools, like EDGAR or PGAT, PanCake is independent of gene annotations. Nevertheless, comparison of identified core and singleton regions shows good agreement. The PanCake data structure requires significantly less space than the sum of the individual sequence files.
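
To make the data model concrete, one can think of a shared feature as a bundle of occurrences of a similar region across the genome set; core and singleton regions then fall out of counting the distinct genomes per feature. A hypothetical Python sketch (the field and function names are ours; PanCake's actual structure is more involved):

from dataclasses import dataclass

@dataclass
class SharedFeature:
    consensus: str     # representative sequence of the bundled region
    occurrences: list  # (genome_id, start, end, is_reverse) tuples

def genomes_of(feature: SharedFeature) -> set:
    return {occ[0] for occ in feature.occurrences}

def is_core(feature: SharedFeature, all_genomes: set) -> bool:
    # Core region: present in every genome of the set.
    return genomes_of(feature) == all_genomes

def is_singleton(feature: SharedFeature) -> bool:
    # Singleton region: present in exactly one genome.
    return len(genomes_of(feature)) == 1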

Cite as

Corinna Ernst and Sven Rahmann. PanCake: A Data Structure for Pangenomes. In German Conference on Bioinformatics 2013. Open Access Series in Informatics (OASIcs), Volume 34, pp. 35-45, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


BibTeX

@InProceedings{ernst_et_al:OASIcs.GCB.2013.35,
  author =	{Ernst, Corinna and Rahmann, Sven},
  title =	{{PanCake: A Data Structure for Pangenomes}},
  booktitle =	{German Conference on Bioinformatics 2013},
  pages =	{35--45},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-59-0},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{34},
  editor =	{Bei{\ss}barth, Tim and Kollmar, Martin and Leha, Andreas and Morgenstern, Burkhard and Schultz, Anne-Kathrin and Waack, Stephan and Wingender, Edgar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2013.35},
  URN =		{urn:nbn:de:0030-drops-42314},
  doi =		{10.4230/OASIcs.GCB.2013.35},
  annote =	{Keywords: pangenome, data structure, core genome, comparative genomics}
}
Document
Aligning Flowgrams to DNA Sequences

Authors: Marcel Martin and Sven Rahmann

Published in: OASIcs, Volume 34, German Conference on Bioinformatics 2013


Abstract
A read from a 454 or Ion Torrent sequencer is natively represented as a flowgram, which is a sequence of pairs of a nucleotide and its (fractional) intensity. Recent work has focused on improving the accuracy of base calling (the conversion of flowgrams to DNA sequences) in order to facilitate read mapping and the downstream analysis of sequence variants. However, base calling always incurs a loss of information by discarding fractional intensity information. We argue that base calling can be avoided entirely by directly aligning the flowgrams to DNA sequences. We introduce an algorithm for flowgram-string alignment based on dynamic programming that covers more cases than standard local or global sequence alignment. We also propose a scoring scheme that separately takes into account sequence variations (from substitutions, insertions, deletions) and sequencing errors (flow intensities contradicting the homopolymer length). This allows fractional intensities, ambiguous homopolymer lengths and editing events to be resolved at alignment time by choosing the most likely read sequence given both the nucleotide intensities and the reference sequence. We provide a proof-of-concept implementation and demonstrate the advantages of flowgram-string alignment compared to base-called alignments.
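
The core dynamic program can be sketched as follows: each flow accounts for a (possibly empty) homopolymer run of its base in the reference, scored by how well the run length matches the flow intensity. This Python sketch is a much-simplified global variant; unlike the paper's algorithm, it ignores substitutions and insertions/deletions in the reference, and its scoring function is a placeholder:

def align_flowgram(flows, ref, max_hp=12):
    # flows: list of (base, intensity) pairs; ref: reference string.
    # D[i][j]: best score of explaining the first i flows by ref[:j].
    NEG = float("-inf")
    n, m = len(flows), len(ref)
    D = [[NEG] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0

    def score(intensity, length):
        # Penalize disagreement between flow intensity and run length.
        return -abs(intensity - length)

    for i in range(1, n + 1):
        base, intensity = flows[i - 1]
        for j in range(m + 1):
            best = D[i - 1][j] + score(intensity, 0)  # empty run
            length = 1
            # Extend a homopolymer run of this flow's base ending at j.
            while (length <= max_hp and j - length >= 0
                   and ref[j - length] == base):
                cand = D[i - 1][j - length] + score(intensity, length)
                best = max(best, cand)
                length += 1
            D[i][j] = best
    return D[n][m]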

Cite as

Marcel Martin and Sven Rahmann. Aligning Flowgrams to DNA Sequences. In German Conference on Bioinformatics 2013. Open Access Series in Informatics (OASIcs), Volume 34, pp. 125-135, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


BibTeX

@InProceedings{martin_et_al:OASIcs.GCB.2013.125,
  author =	{Martin, Marcel and Rahmann, Sven},
  title =	{{Aligning Flowgrams to DNA Sequences}},
  booktitle =	{German Conference on Bioinformatics 2013},
  pages =	{125--135},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-59-0},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{34},
  editor =	{Bei{\ss}barth, Tim and Kollmar, Martin and Leha, Andreas and Morgenstern, Burkhard and Schultz, Anne-Kathrin and Waack, Stephan and Wingender, Edgar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2013.125},
  URN =		{urn:nbn:de:0030-drops-42379},
  doi =		{10.4230/OASIcs.GCB.2013.125},
  annote =	{Keywords: flowgram, sequencing, alignment algorithm, scoring scheme}
}
Document
Building and Documenting Workflows with Python-Based Snakemake

Authors: Johannes Köster and Sven Rahmann

Published in: OASIcs, Volume 26, German Conference on Bioinformatics 2012


Abstract
Snakemake is a novel workflow engine with a simple Python-derived workflow definition language and an optimizing execution environment. It is the first system that supports multiple named wildcards (or variables) in the input and output filenames of each rule definition. It also allows writing human-readable workflows that document themselves. We have found Snakemake especially useful for building high-throughput sequencing data analysis pipelines and present examples from this area. Snakemake exemplifies a generic way to implement a domain-specific language in Python, without writing a full parser or introducing syntactical overhead by overloading language features.
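
As a taste of the workflow language, the sketch below shows a single rule with two named wildcards in its filenames; the mapper command and the file layout are made up for illustration:

rule map_reads:
    input:
        reads="reads/{sample}_{lane}.fastq",
        index="ref/genome.idx"
    output:
        "mapped/{sample}_{lane}.bam"
    shell:
        "mapper --index {input.index} --reads {input.reads} --out {output}"

Requesting the file mapped/patient1_L001.bam binds sample=patient1 and lane=L001, and Snakemake runs the rule once its inputs are available.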

Cite as

Johannes Köster and Sven Rahmann. Building and Documenting Workflows with Python-Based Snakemake. In German Conference on Bioinformatics 2012. Open Access Series in Informatics (OASIcs), Volume 26, pp. 49-56, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2012)


BibTeX

@InProceedings{koster_et_al:OASIcs.GCB.2012.49,
  author =	{K\"{o}ster, Johannes and Rahmann, Sven},
  title =	{{Building and Documenting Workflows with Python-Based Snakemake}},
  booktitle =	{German Conference on Bioinformatics 2012},
  pages =	{49--56},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-44-6},
  ISSN =	{2190-6807},
  year =	{2012},
  volume =	{26},
  editor =	{B\"{o}cker, Sebastian and Hufsky, Franziska and Scheubert, Kerstin and Schleicher, Jana and Schuster, Stefan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2012.49},
  URN =		{urn:nbn:de:0030-drops-37179},
  doi =		{10.4230/OASIcs.GCB.2012.49},
  annote =	{Keywords: workflow engine, dependency graph, knapsack problem, Python, high-throughput sequencing, next-generation sequencing}
}
Document
Designing q-Unique DNA Sequences with Integer Linear Programs and Euler Tours in De Bruijn Graphs

Authors: Marianna D'Addario, Nils Kriege, and Sven Rahmann

Published in: OASIcs, Volume 26, German Conference on Bioinformatics 2012


Abstract
DNA nanoarchitectures require carefully designed oligonucleotides with certain non-hybridization guarantees, which can be formalized as the q-uniqueness property on the sequence level. We study the optimization problem of finding a longest q-unique DNA sequence. We first present a convenient formulation as an integer linear program on the underlying De Bruijn graph that allows a variety of constraints to be incorporated flexibly; solution times for practically relevant values of q are short. We then provide additional insights into the problem structure using the quotient graph of the De Bruijn graph with respect to the equivalence relation induced by reverse complementarity. Specifically, for odd q the quotient graph is Eulerian, so finding a longest q-unique sequence is equivalent to finding an Euler tour and can be solved in linear time with respect to the output string length. For even q, self-complementary edges complicate the problem, and the graph has to be Eulerized by deleting a minimum number of edges. Two sub-cases arise, for one of which we present a complete solution, while the other remains open.
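
To pin down the property being optimized: under one natural reading, a sequence is q-unique if every length-q window occurs only once, where a window and its reverse complement count as the same q-gram. A small Python checker under that reading (the paper's precise definition may treat edge cases such as self-complementary q-grams for even q differently):

def is_q_unique(seq: str, q: int) -> bool:
    # Every length-q window must be new, up to reverse complementation.
    comp = str.maketrans("ACGT", "TGCA")
    seen = set()
    for i in range(len(seq) - q + 1):
        window = seq[i:i + q]
        canon = min(window, window.translate(comp)[::-1])
        if canon in seen:
            return False
        seen.add(canon)
    return True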

Cite as

Marianna D'Addario, Nils Kriege, and Sven Rahmann. Designing q-Unique DNA Sequences with Integer Linear Programs and Euler Tours in De Bruijn Graphs. In German Conference on Bioinformatics 2012. Open Access Series in Informatics (OASIcs), Volume 26, pp. 82-92, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2012)


BibTeX

@InProceedings{daddario_et_al:OASIcs.GCB.2012.82,
  author =	{D'Addario, Marianna and Kriege, Nils and Rahmann, Sven},
  title =	{{Designing q-Unique DNA Sequences with Integer Linear Programs and Euler Tours in De Bruijn Graphs}},
  booktitle =	{German Conference on Bioinformatics 2012},
  pages =	{82--92},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-44-6},
  ISSN =	{2190-6807},
  year =	{2012},
  volume =	{26},
  editor =	{B\"{o}cker, Sebastian and Hufsky, Franziska and Scheubert, Kerstin and Schleicher, Jana and Schuster, Stefan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2012.82},
  URN =		{urn:nbn:de:0030-drops-37200},
  doi =		{10.4230/OASIcs.GCB.2012.82},
  annote =	{Keywords: DNA sequence design, De Bruijn graph, quotient graph, reverse complement, Euler graph, Euler tour}
}