OASIcs, Volume 34

German Conference on Bioinformatics 2013



Event

GCB 2013, September 10-13, 2013, Göttingen, Germany

Editors

Tim Beißbarth
Martin Kollmar
Andreas Leha
Burkhard Morgenstern
Anne-Kathrin Schultz
Stephan Waack
Edgar Wingender

Publication Details

  • published at: 2013-09-09
  • Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
  • ISBN: 978-3-939897-59-0
  • DBLP: db/conf/gcb/gcb2013

Documents

Document
Complete Volume
OASIcs, Volume 34, GCB'13, Complete Volume

Authors: Tim Beißbarth, Martin Kollmar, Andreas Leha, Burkhard Morgenstern, Anne-Kathrin Schultz, Stephan Waack, and Edgar Wingender


Abstract
OASIcs, Volume 34, GCB'13, Complete Volume

Cite as

German Conference on Bioinformatics 2013. Open Access Series in Informatics (OASIcs), Volume 34, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


@Proceedings{beibarth_et_al:OASIcs.GCB.2013,
  title =	{{OASIcs, Volume 34, GCB'13, Complete Volume}},
  booktitle =	{German Conference on Bioinformatics 2013},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-59-0},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{34},
  editor =	{Bei{\ss}barth, Tim and Kollmar, Martin and Leha, Andreas and Morgenstern, Burkhard and Schultz, Anne-Kathrin and Waack, Stephan and Wingender, Edgar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2013},
  URN =		{urn:nbn:de:0030-drops-42563},
  doi =		{10.4230/OASIcs.GCB.2013},
  annote =	{Keywords: Life and Medical Sciences}
}
Document
Front Matter
Frontmatter, Table of Contents, Preface, Conference Organization

Authors: Tim Beißbarth, Martin Kollmar, Andreas Leha, Burkhard Morgenstern, Anne-Kathrin Schultz, Stephan Waack, and Edgar Wingender


Abstract
Frontmatter, Table of Contents, Preface, Conference Organization

Cite as

German Conference on Bioinformatics 2013. Open Access Series in Informatics (OASIcs), Volume 34, pp. i-xiii, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


@InProceedings{beibarth_et_al:OASIcs.GCB.2013.i,
  author =	{Bei{\ss}barth, Tim and Kollmar, Martin and Leha, Andreas and Morgenstern, Burkhard and Schultz, Anne-Kathrin and Waack, Stephan and Wingender, Edgar},
  title =	{{Frontmatter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{German Conference on Bioinformatics 2013},
  pages =	{i--xiii},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-59-0},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{34},
  editor =	{Bei{\ss}barth, Tim and Kollmar, Martin and Leha, Andreas and Morgenstern, Burkhard and Schultz, Anne-Kathrin and Waack, Stephan and Wingender, Edgar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2013.i},
  URN =		{urn:nbn:de:0030-drops-42265},
  doi =		{10.4230/OASIcs.GCB.2013.i},
  annote =	{Keywords: Frontmatter, Table of Contents, Preface, Conference Organization}
}
Document
On the estimation of metabolic profiles in metagenomics

Authors: Kathrin Petra Aßhauer and Peter Meinicke


Abstract
Metagenomics enables the characterization of the specific metabolic potential of a microbial community. The common approach towards a quantitative representation of this potential is to count the number of metagenomic sequence fragments that can be assigned to metabolic pathways by means of predicted gene functions. The resulting pathway abundances make up the metabolic profile of the metagenome and several different schemes for computing these profiles have been used. So far, none of the existing approaches actually estimates the proportion of sequences that can be assigned to a particular pathway. In most publications of metagenomic studies, the utilized abundance scores lack a clear statistical meaning and usually cannot be compared across different studies. Here, we introduce a mixture model-based approach to the estimation of pathway abundances that provides a basis for statistical interpretation and fast computation of metabolic profiles. Using the KEGG database our results on a large-scale analysis of data from the Human Microbiome Project show a good representation of metabolic differences between different body sites. Further, the results indicate that our mixture model even provides a better representation than the dedicated HUMAnN tool which has been developed for metabolic analysis of human microbiome data.

Cite as

Kathrin Petra Aßhauer and Peter Meinicke. On the estimation of metabolic profiles in metagenomics. In German Conference on Bioinformatics 2013. Open Access Series in Informatics (OASIcs), Volume 34, pp. 1-13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


@InProceedings{ahauer_et_al:OASIcs.GCB.2013.1,
  author =	{A{\ss}hauer, Kathrin Petra and Meinicke, Peter},
  title =	{{On the estimation of metabolic profiles in metagenomics}},
  booktitle =	{German Conference on Bioinformatics 2013},
  pages =	{1--13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-59-0},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{34},
  editor =	{Bei{\ss}barth, Tim and Kollmar, Martin and Leha, Andreas and Morgenstern, Burkhard and Schultz, Anne-Kathrin and Waack, Stephan and Wingender, Edgar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2013.1},
  URN =		{urn:nbn:de:0030-drops-42380},
  doi =		{10.4230/OASIcs.GCB.2013.1},
  annote =	{Keywords: metagenomics, metabolic profiling, taxonomic profiling, abundance estimation, mixture modeling}
}
Document
On Weighting Schemes for Gene Order Analysis

Authors: Matthias Bernt, Nicolas Wieseke, and Martin Middendorf


Abstract
Gene order analysis aims at extracting phylogenetic information from the comparison of the order and orientation of the genes on the genomes of different species. This can be achieved by computing parsimonious rearrangement scenarios, i.e., by determining a sequence of rearrangement events that transforms one given gene order into another such that the sum of weights of the included rearrangement events is minimal. In this sequence only certain types of rearrangements, given by the rearrangement model, are admissible and weights are assigned with respect to the rearrangement type. The choice of a suitable rearrangement model and corresponding weights for the included rearrangement types is important for a meaningful reconstruction. So far the analysis of weighting schemes for gene order analysis has not been considered sufficiently. In this paper weighting schemes for gene order analysis are considered for two rearrangement models: 1) inversions, transpositions, and inverse transpositions; 2) inversions, block interchanges, and inverse transpositions. For both rearrangement models we determined properties of the weighting functions that exclude certain types of rearrangements from parsimonious rearrangement scenarios.

Cite as

Matthias Bernt, Nicolas Wieseke, and Martin Middendorf. On Weighting Schemes for Gene Order Analysis. In German Conference on Bioinformatics 2013. Open Access Series in Informatics (OASIcs), Volume 34, pp. 14-23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


@InProceedings{bernt_et_al:OASIcs.GCB.2013.14,
  author =	{Bernt, Matthias and Wieseke, Nicolas and Middendorf, Martin},
  title =	{{On Weighting Schemes for Gene Order Analysis}},
  booktitle =	{German Conference on Bioinformatics 2013},
  pages =	{14--23},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-59-0},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{34},
  editor =	{Bei{\ss}barth, Tim and Kollmar, Martin and Leha, Andreas and Morgenstern, Burkhard and Schultz, Anne-Kathrin and Waack, Stephan and Wingender, Edgar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2013.14},
  URN =		{urn:nbn:de:0030-drops-42354},
  doi =		{10.4230/OASIcs.GCB.2013.14},
  annote =	{Keywords: Gene order analysis, maximum parsimony, weighting}
}
Document
Alignment-free sequence comparison with spaced k-mers

Authors: Marcus Boden, Martin Schöneich, Sebastian Horwege, Sebastian Lindner, Chris Leimeister, and Burkhard Morgenstern


Abstract
Alignment-free methods are increasingly used for genome analysis and phylogeny reconstruction since they circumvent various difficulties of traditional approaches that rely on multiple sequence alignments. In particular, they are much faster than alignment-based methods. Most alignment-free approaches work by analyzing the k-mer composition of sequences. In this paper, we propose to use 'spaced k-mers', i.e. patterns of deterministic and 'don't care' positions instead of contiguous k-mers. Using simulated and real-world sequence data, we demonstrate that this approach produces better phylogenetic trees than alignment-free methods that rely on contiguous k-mers. In addition, distances calculated with spaced k-mers appear to be statistically more stable than distances based on contiguous k-mers.
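
The idea of a spaced k-mer can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `spaced_words`, the `'1'`/`'0'` pattern encoding, and the toy sequence are assumptions made for illustration only. A pattern of match (`1`) and don't-care (`0`) positions is slid along the sequence, and only the characters at match positions are kept.

```python
from collections import Counter


def spaced_words(seq: str, pattern: str) -> Counter:
    """Count the spaced words of `seq` under `pattern`.

    `pattern` is a string of '1' (match position) and '0' (don't-care
    position); this encoding is an assumption of the sketch.
    """
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be a non-empty string of '0' and '1'")
    care = [i for i, c in enumerate(pattern) if c == "1"]
    counts = Counter()
    for start in range(len(seq) - len(pattern) + 1):
        window = seq[start:start + len(pattern)]
        # Keep only the characters at match positions.
        counts["".join(window[i] for i in care)] += 1
    return counts


# An all-'1' pattern gives ordinary contiguous k-mers.
contiguous = spaced_words("ACGTACGA", "111")
# Three match positions spread over a window of length four.
spaced = spaced_words("ACGTACGA", "1101")
```

Both calls use three match positions, so the resulting word-count vectors have the same dimensionality and could be fed into the same distance measure.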

Cite as

Marcus Boden, Martin Schöneich, Sebastian Horwege, Sebastian Lindner, Chris Leimeister, and Burkhard Morgenstern. Alignment-free sequence comparison with spaced k-mers. In German Conference on Bioinformatics 2013. Open Access Series in Informatics (OASIcs), Volume 34, pp. 24-34, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


@InProceedings{boden_et_al:OASIcs.GCB.2013.24,
  author =	{Boden, Marcus and Sch{\"o}neich, Martin and Horwege, Sebastian and Lindner, Sebastian and Leimeister, Chris and Morgenstern, Burkhard},
  title =	{{Alignment-free sequence comparison with spaced k-mers}},
  booktitle =	{German Conference on Bioinformatics 2013},
  pages =	{24--34},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-59-0},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{34},
  editor =	{Bei{\ss}barth, Tim and Kollmar, Martin and Leha, Andreas and Morgenstern, Burkhard and Schultz, Anne-Kathrin and Waack, Stephan and Wingender, Edgar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2013.24},
  URN =		{urn:nbn:de:0030-drops-42334},
  doi =		{10.4230/OASIcs.GCB.2013.24},
  annote =	{Keywords: Alignment-free sequence comparison, phylogeny reconstruction}
}
Document
PanCake: A Data Structure for Pangenomes

Authors: Corinna Ernst and Sven Rahmann


Abstract
We present a pangenome data structure ("PanCake") for sets of related genomes, based on bundling similar sequence regions into shared features, which are derived from genome-wide pairwise sequence alignments. We discuss the design of the data structure, basic operations on it and methods to predict core genomes and singleton regions. In contrast to many other pangenome analysis tools, like EDGAR or PGAT, PanCake is independent of gene annotations. Nevertheless, comparison of identified core and singleton regions shows good agreements. The PanCake data structure requires significantly less space than the sum of individual sequence files.

Cite as

Corinna Ernst and Sven Rahmann. PanCake: A Data Structure for Pangenomes. In German Conference on Bioinformatics 2013. Open Access Series in Informatics (OASIcs), Volume 34, pp. 35-45, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


@InProceedings{ernst_et_al:OASIcs.GCB.2013.35,
  author =	{Ernst, Corinna and Rahmann, Sven},
  title =	{{PanCake: A Data Structure for Pangenomes}},
  booktitle =	{German Conference on Bioinformatics 2013},
  pages =	{35--45},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-59-0},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{34},
  editor =	{Bei{\ss}barth, Tim and Kollmar, Martin and Leha, Andreas and Morgenstern, Burkhard and Schultz, Anne-Kathrin and Waack, Stephan and Wingender, Edgar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2013.35},
  URN =		{urn:nbn:de:0030-drops-42314},
  doi =		{10.4230/OASIcs.GCB.2013.35},
  annote =	{Keywords: pangenome, data structure, core genome, comparative genomics}
}
Document
Reconstructing Consensus Bayesian Network Structures with Application to Learning Molecular Interaction Networks

Authors: Holger Fröhlich and Gunnar W. Klau


Abstract
Bayesian Networks are an established computational approach for data driven network inference. However, experimental data is limited in its availability and corrupted by noise. This leads to an unavoidable uncertainty about the correct network structure. Thus sampling or bootstrap based strategies are applied to obtain edge frequencies. In a more general sense edge frequencies can also result from integrating networks learned on different datasets or via different inference algorithms. Subsequently one typically wants to derive a biological interpretation from the results in terms of a consensus network. We here propose a log odds based edge score on the basis of the expected false positive rate and thus avoid the selection of a subjective edge frequency cutoff. Computing a score optimal consensus network in our new model amounts to solving the maximum weight acyclic subdigraph problem. We use a branch-and-cut algorithm based on integer linear programming for this task. Our empirical studies on simulated and real data demonstrate a consistently improved network reconstruction accuracy compared to two threshold based strategies.
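
To make the maximum weight acyclic subdigraph problem concrete, here is a small Python sketch. It deliberately swaps the branch-and-cut algorithm named in the abstract for a brute-force search over node orderings, which is only feasible for a handful of nodes. The function name and the edge scores are hypothetical; the scores merely stand in for log odds values, which may be negative.

```python
from itertools import permutations


def max_weight_acyclic_subdigraph(nodes, weights):
    """Brute-force search over node orderings (tiny graphs only).

    `weights` maps directed edges (u, v) to scores, which may be negative.
    Every acyclic edge set is consistent with some ordering, so taking the
    best ordering yields an optimal acyclic subdigraph.
    """
    best_score, best_edges = 0.0, set()
    for order in permutations(nodes):
        rank = {v: i for i, v in enumerate(order)}
        # Keep edges that point forward in this ordering and score positively.
        edges = {
            (u, v)
            for (u, v), w in weights.items()
            if w > 0 and rank[u] < rank[v]
        }
        score = sum(weights[e] for e in edges)
        if score > best_score:
            best_score, best_edges = score, edges
    return best_score, best_edges


# Hypothetical edge scores: the three positive edges form a cycle A -> B -> C -> A.
scores = {("A", "B"): 2.0, ("B", "C"): 1.5, ("C", "A"): 0.5, ("A", "C"): -1.0}
total, consensus = max_weight_acyclic_subdigraph(["A", "B", "C"], scores)
```

In this toy instance the cycle is broken by dropping its weakest edge, and the negatively scored edge is never selected.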

Cite as

Holger Fröhlich and Gunnar W. Klau. Reconstructing Consensus Bayesian Network Structures with Application to Learning Molecular Interaction Networks. In German Conference on Bioinformatics 2013. Open Access Series in Informatics (OASIcs), Volume 34, pp. 46-55, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


@InProceedings{frohlich_et_al:OASIcs.GCB.2013.46,
  author =	{Fr{\"o}hlich, Holger and Klau, Gunnar W.},
  title =	{{Reconstructing Consensus Bayesian Network Structures with Application to Learning Molecular Interaction Networks}},
  booktitle =	{German Conference on Bioinformatics 2013},
  pages =	{46--55},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-59-0},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{34},
  editor =	{Bei{\ss}barth, Tim and Kollmar, Martin and Leha, Andreas and Morgenstern, Burkhard and Schultz, Anne-Kathrin and Waack, Stephan and Wingender, Edgar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2013.46},
  URN =		{urn:nbn:de:0030-drops-42273},
  doi =		{10.4230/OASIcs.GCB.2013.46},
  annote =	{Keywords: Bayesian Networks, Network Reverse Engineering, Minimum Feedback Arc Set, Maximum Acyclic Subgraph, Molecular Interaction Networks}
}
Document
Efficient Interpretation of Tandem Mass Tags in Top-Down Proteomics

Authors: Anna Katharina Hildebrandt, Ernst Althaus, Hans-Peter Lenhof, Chien-Wen Hung, Andreas Tholey, and Andreas Hildebrandt


Abstract
Mass spectrometry is the major analytical tool for the identification and quantification of proteins in biological samples. In so-called top-down proteomics, separation and mass spectrometric analysis is performed at the level of intact proteins, without preparatory digestion steps. It has been shown that the tandem mass tag (TMT) labeling technology, which is often used for quantification based on digested proteins (bottom-up studies), can be applied in top-down proteomics as well. This, however, leads to a complex interpretation problem, where we need to annotate measured peaks with their respective generating protein, the number of charges, and the a priori unknown number of TMT-groups attached to this protein. In this work, we give an algorithm for the efficient enumeration of all valid annotations that fulfill available experimental constraints. Applying the algorithm to real-world data, we show that the annotation problem can indeed be efficiently solved. However, our experiments also demonstrate that reliable annotation in complex mixtures requires at least partial sequence information and high mass accuracy and resolution to go beyond the proof-of-concept stage.

Cite as

Anna Katharina Hildebrandt, Ernst Althaus, Hans-Peter Lenhof, Chien-Wen Hung, Andreas Tholey, and Andreas Hildebrandt. Efficient Interpretation of Tandem Mass Tags in Top-Down Proteomics. In German Conference on Bioinformatics 2013. Open Access Series in Informatics (OASIcs), Volume 34, pp. 56-67, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


@InProceedings{hildebrandt_et_al:OASIcs.GCB.2013.56,
  author =	{Hildebrandt, Anna Katharina and Althaus, Ernst and Lenhof, Hans-Peter and Hung, Chien-Wen and Tholey, Andreas and Hildebrandt, Andreas},
  title =	{{Efficient Interpretation of Tandem Mass Tags in Top-Down Proteomics}},
  booktitle =	{German Conference on Bioinformatics 2013},
  pages =	{56--67},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-59-0},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{34},
  editor =	{Bei{\ss}barth, Tim and Kollmar, Martin and Leha, Andreas and Morgenstern, Burkhard and Schultz, Anne-Kathrin and Waack, Stephan and Wingender, Edgar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2013.56},
  URN =		{urn:nbn:de:0030-drops-42304},
  doi =		{10.4230/OASIcs.GCB.2013.56},
  annote =	{Keywords: Mass spectrometry, TMT labeling, Top-down Proteomics}
}
Document
GEDEVO: An Evolutionary Graph Edit Distance Algorithm for Biological Network Alignment

Authors: Rashid Ibragimov, Maximilian Malek, Jiong Guo, and Jan Baumbach


Abstract
Introduction: With the so-called OMICS technology the scientific community has generated huge amounts of data that allow us to reconstruct the interplay of all kinds of biological entities. The emerging interaction networks are usually modeled as graphs with thousands of nodes and tens of thousands of edges between them. In addition to sequence alignment, the comparison of biological networks has shown great potential to infer the biological function of proteins and genes. However, the corresponding network alignment problem is computationally hard and theoretically intractable for real world instances. Results: We therefore developed GEDEVO, a novel tool for efficient graph comparison dedicated to real-world size biological networks. Underlying our approach is the so-called Graph Edit Distance (GED) model, where one graph is to be transformed into another one, with a minimal number of (or more generally: minimal costs for) edge insertions and deletions. We present a novel evolutionary algorithm aiming to minimize the GED, and we compare our implementation against state of the art tools: SPINAL, GHOST, C-GRAAL, and MI-GRAAL. On a set of protein-protein interaction networks from different organisms we demonstrate that GEDEVO outperforms the current methods. It thus refines the previously suggested alignments based on topological information only. Conclusion: With GEDEVO, we account for the constantly exploding number and size of available biological networks. The software as well as all used data sets are publicly available at http://gedevo.mpi-inf.mpg.de.
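
The graph edit distance objective can be sketched in a few lines of Python. This is not GEDEVO itself: it only evaluates the cost of one fixed node mapping, whereas an evolutionary algorithm would search over many mappings. The function names, the unit edge costs, and the toy graphs are assumptions for illustration.

```python
def undirected(u, v):
    """Represent an undirected edge independently of endpoint order."""
    return frozenset((u, v))


def edit_distance_for_mapping(edges_a, edges_b, mapping):
    """Count the edge deletions and insertions needed to turn graph A into
    graph B under a fixed one-to-one node `mapping` from A's to B's nodes."""
    mapped_a = {undirected(mapping[u], mapping[v]) for u, v in edges_a}
    target_b = {undirected(u, v) for u, v in edges_b}
    deletions = len(mapped_a - target_b)   # edges of A with no counterpart in B
    insertions = len(target_b - mapped_a)  # edges of B not covered by A
    return deletions + insertions


# Toy graphs: a path on four nodes and a cycle on four nodes.
path = [(1, 2), (2, 3), (3, 4)]
cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]

good = edit_distance_for_mapping(path, cycle, {1: "a", 2: "b", 3: "c", 4: "d"})
poor = edit_distance_for_mapping(path, cycle, {1: "a", 2: "c", 3: "b", 4: "d"})
```

The first mapping needs a single edge insertion, while the second needs several edits; a search procedure would prefer the first.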

Cite as

Rashid Ibragimov, Maximilian Malek, Jiong Guo, and Jan Baumbach. GEDEVO: An Evolutionary Graph Edit Distance Algorithm for Biological Network Alignment. In German Conference on Bioinformatics 2013. Open Access Series in Informatics (OASIcs), Volume 34, pp. 68-79, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


@InProceedings{ibragimov_et_al:OASIcs.GCB.2013.68,
  author =	{Ibragimov, Rashid and Malek, Maximilian and Guo, Jiong and Baumbach, Jan},
  title =	{{GEDEVO: An Evolutionary Graph Edit Distance Algorithm for Biological Network Alignment}},
  booktitle =	{German Conference on Bioinformatics 2013},
  pages =	{68--79},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-59-0},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{34},
  editor =	{Bei{\ss}barth, Tim and Kollmar, Martin and Leha, Andreas and Morgenstern, Burkhard and Schultz, Anne-Kathrin and Waack, Stephan and Wingender, Edgar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2013.68},
  URN =		{urn:nbn:de:0030-drops-42298},
  doi =		{10.4230/OASIcs.GCB.2013.68},
  annote =	{Keywords: Network Alignment, Graph Edit Distance, Evolutionary Algorithm, Protein-Protein Interactions}
}
Document
Dinucleotide distance histograms for fast detection of rRNA in metatranscriptomic sequences

Authors: Heiner Klingenberg, Robin Martinjak, Frank Oliver Glöckner, Rolf Daniel, Thomas Lingner, and Peter Meinicke


Abstract
With the advent of metatranscriptomics it has now become possible to study the dynamics of microbial communities. The analysis of environmental RNA-Seq data implies several challenges for the development of efficient tools in bioinformatics. One of the first steps in the computational analysis of metatranscriptomic sequencing reads requires the separation of rRNA and mRNA fragments to ensure that only protein coding sequences are actually used in a subsequent functional analysis. In the context of the rRNA filtering task it is desirable to have a broad spectrum of different methods in order to find a suitable trade-off between speed and accuracy for a particular dataset. We introduce a machine learning approach for the detection of rRNA in metatranscriptomic sequencing reads that is based on support vector machines in combination with dinucleotide distance histograms for feature representation. The results show that our SVM-based approach is at least one order of magnitude faster than any of the existing tools with only a slight degradation of the detection performance when compared to state-of-the-art alignment-based methods.

Cite as

Heiner Klingenberg, Robin Martinjak, Frank Oliver Glöckner, Rolf Daniel, Thomas Lingner, and Peter Meinicke. Dinucleotide distance histograms for fast detection of rRNA in metatranscriptomic sequences. In German Conference on Bioinformatics 2013. Open Access Series in Informatics (OASIcs), Volume 34, pp. 80-89, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


@InProceedings{klingenberg_et_al:OASIcs.GCB.2013.80,
  author =	{Klingenberg, Heiner and Martinjak, Robin and Gl{\"o}ckner, Frank Oliver and Daniel, Rolf and Lingner, Thomas and Meinicke, Peter},
  title =	{{Dinucleotide distance histograms for fast detection of rRNA in metatranscriptomic sequences}},
  booktitle =	{German Conference on Bioinformatics 2013},
  pages =	{80--89},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-59-0},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{34},
  editor =	{Bei{\ss}barth, Tim and Kollmar, Martin and Leha, Andreas and Morgenstern, Burkhard and Schultz, Anne-Kathrin and Waack, Stephan and Wingender, Edgar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2013.80},
  URN =		{urn:nbn:de:0030-drops-42324},
  doi =		{10.4230/OASIcs.GCB.2013.80},
  annote =	{Keywords: Metatranscriptomics, metagenomics, rRNA detection, distance histograms}
}
Document
Utilization of ordinal response structures in classification with high-dimensional expression data

Authors: Andreas Leha, Klaus Jung, and Tim Beißbarth


Abstract
Molecular diagnosis or prediction of clinical treatment outcome based on high-throughput genomics data is a modern application of machine learning techniques for clinical problems. In practice, clinical parameters, such as patient health status or toxic reaction to therapy, are often measured on an ordinal scale (e.g. good, fair, poor). Commonly, the prediction of ordinal end-points is treated as a multi-class classification problem, disregarding the ordering information contained in the response. This may result in a loss of prediction accuracy. Classical approaches to model ordinal response directly, including for instance the cumulative logit model, are typically not applicable to high-dimensional data. We present hierarchical twoing (hi2), a novel algorithm for classification of high-dimensional data into ordered categories. hi2 combines the power of well-understood binary classification with ordinal response prediction. A comparison of several approaches for ordinal classification on real world data as well as simulated data shows that classification algorithms especially designed to handle ordered categories fail to improve upon state-of-the-art non-ordinal classification algorithms. In general, the classification performance of an algorithm is dominated by its ability to deal with the high-dimensionality of the data. Only hi2 outperforms its competitors in the case of moderate effects.

Cite as

Andreas Leha, Klaus Jung, and Tim Beißbarth. Utilization of ordinal response structures in classification with high-dimensional expression data. In German Conference on Bioinformatics 2013. Open Access Series in Informatics (OASIcs), Volume 34, pp. 90-100, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


@InProceedings{leha_et_al:OASIcs.GCB.2013.90,
  author =	{Leha, Andreas and Jung, Klaus and Bei{\ss}barth, Tim},
  title =	{{Utilization of ordinal response structures in classification with high-dimensional expression data}},
  booktitle =	{German Conference on Bioinformatics 2013},
  pages =	{90--100},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-59-0},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{34},
  editor =	{Bei{\ss}barth, Tim and Kollmar, Martin and Leha, Andreas and Morgenstern, Burkhard and Schultz, Anne-Kathrin and Waack, Stephan and Wingender, Edgar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2013.90},
  URN =		{urn:nbn:de:0030-drops-42340},
  doi =		{10.4230/OASIcs.GCB.2013.90},
  annote =	{Keywords: Classification, High-Dimensional Data, Ordinal Response, Expression Data}
}
Document
Extended Sunflower Hidden Markov Models for the recognition of homotypic cis-regulatory modules

Authors: Ioana M. Lemnian, Ralf Eggeling, and Ivo Grosse


Abstract
The transcription of genes is often regulated not only by transcription factors binding at single sites per promoter, but by the interplay of multiple copies of one or more transcription factors binding at multiple sites forming a cis-regulatory module. The computational recognition of cis-regulatory modules from ChIP-seq or other high-throughput data is crucial in modern life and medical sciences. A common type of cis-regulatory modules are homotypic clusters of binding sites, i.e., clusters of binding sites of one transcription factor. For their recognition the homotypic Sunflower Hidden Markov Model is a promising statistical model. However, this model neglects statistical dependences among nucleotides within binding sites and flanking regions, which makes it not well suited for de-novo motif discovery. Here, we propose an extension of this model that allows statistical dependences within binding sites, their reverse complements, and flanking regions. We study the efficacy of this extended homotypic Sunflower Hidden Markov Model based on ChIP-seq data from the Human ENCODE Project and find that it often outperforms the traditional homotypic Sunflower Hidden Markov Model.

Cite as

Ioana M. Lemnian, Ralf Eggeling, and Ivo Grosse. Extended Sunflower Hidden Markov Models for the recognition of homotypic cis-regulatory modules. In German Conference on Bioinformatics 2013. Open Access Series in Informatics (OASIcs), Volume 34, pp. 101-109, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


@InProceedings{lemnian_et_al:OASIcs.GCB.2013.101,
  author =	{Lemnian, Ioana M. and Eggeling, Ralf and Grosse, Ivo},
  title =	{{Extended Sunflower Hidden Markov Models for the recognition of homotypic cis-regulatory modules}},
  booktitle =	{German Conference on Bioinformatics 2013},
  pages =	{101--109},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-59-0},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{34},
  editor =	{Bei{\ss}barth, Tim and Kollmar, Martin and Leha, Andreas and Morgenstern, Burkhard and Schultz, Anne-Kathrin and Waack, Stephan and Wingender, Edgar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2013.101},
  URN =		{urn:nbn:de:0030-drops-42361},
  doi =		{10.4230/OASIcs.GCB.2013.101},
  annote =	{Keywords: Hidden Markov Models, cis-regulatory modules, de-novo motif discovery}
}
Document
Avoiding Ambiguity and Assessing Uniqueness in Minisatellite Alignment

Authors: Benedikt Löwes and Robert Giegerich


Abstract
Several algorithms have been suggested for minisatellite alignment. Their time complexity is high -- close to O(n^3) -- due to the necessary reconstruction of duplication histories. We investigate the uniqueness of optimal alignments computed under the common single-copy duplication model. To this end, it is necessary to avoid ambiguity in the algorithm employed. We re-code the ARLEM algorithm in the form of a grammar, and apply a disambiguation technique which uses a mapping to a canonical representation of minisatellite alignments. Having arrived at a non-ambiguous algorithm this way, we demonstrate that the underlying model -- independent of the algorithm -- gives rise to an exorbitant number of different, co-optimal alignments when applied to real-world data. We conclude that alignment-free methods should be considered for minisatellite comparison.

Cite as

Benedikt Löwes and Robert Giegerich. Avoiding Ambiguity and Assessing Uniqueness in Minisatellite Alignment. In German Conference on Bioinformatics 2013. Open Access Series in Informatics (OASIcs), Volume 34, pp. 110-124, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


@InProceedings{lowes_et_al:OASIcs.GCB.2013.110,
  author =	{L{\"o}wes, Benedikt and Giegerich, Robert},
  title =	{{Avoiding Ambiguity and Assessing Uniqueness in Minisatellite Alignment}},
  booktitle =	{German Conference on Bioinformatics 2013},
  pages =	{110--124},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-59-0},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{34},
  editor =	{Bei{\ss}barth, Tim and Kollmar, Martin and Leha, Andreas and Morgenstern, Burkhard and Schultz, Anne-Kathrin and Waack, Stephan and Wingender, Edgar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2013.110},
  URN =		{urn:nbn:de:0030-drops-42285},
  doi =		{10.4230/OASIcs.GCB.2013.110},
  annote =	{Keywords: minisatellite alignment, dynamic programming, ambiguity}
}
Document
Aligning Flowgrams to DNA Sequences

Authors: Marcel Martin and Sven Rahmann


Abstract
A read from 454 or Ion Torrent sequencers is natively represented as a flowgram, which is a sequence of pairs of a nucleotide and its (fractional) intensity. Recent work has focused on improving the accuracy of base calling (conversion of flowgrams to DNA sequences) in order to facilitate read mapping and downstream analysis of sequence variants. However, base calling always incurs a loss of information by discarding fractional intensity information. We argue that base calling can be avoided entirely by directly aligning the flowgrams to DNA sequences. We introduce an algorithm for flowgram-string alignment based on dynamic programming, but covering more cases than standard local or global sequence alignment. We also propose a scoring scheme that takes into account sequence variations (from substitutions, insertions, deletions) and sequencing errors (flow intensities contradicting the homopolymer length) separately. This makes it possible to resolve fractional intensities, ambiguous homopolymer lengths and editing events at alignment time by choosing the most likely read sequence given both the nucleotide intensities and the reference sequence. We provide a proof-of-concept implementation and demonstrate the advantages of flowgram-string alignment compared to base-called alignments.
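
The information loss from base calling can be seen in a minimal Python sketch. It shows a naive rounding-based base caller, not the flowgram-string alignment proposed in the paper. The function name, the list-of-pairs representation, and the intensity values are assumptions for illustration, and intensities are assumed to be non-negative.

```python
def naive_base_call(flowgram):
    """Round each flow intensity to the nearest integer homopolymer length
    and emit that many copies of the flow's nucleotide."""
    return "".join(nuc * int(intensity + 0.5) for nuc, intensity in flowgram)


# A flowgram as a sequence of (nucleotide, fractional intensity) pairs.
flowgram = [("T", 1.02), ("A", 0.05), ("C", 2.48), ("G", 0.97)]
read = naive_base_call(flowgram)
```

The intensity 2.48 is rounded down to a homopolymer of length two, although a length of three is almost as plausible. Once only the called sequence is kept, this ambiguity is no longer visible to a downstream aligner.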

Cite as

Marcel Martin and Sven Rahmann. Aligning Flowgrams to DNA Sequences. In German Conference on Bioinformatics 2013. Open Access Series in Informatics (OASIcs), Volume 34, pp. 125-135, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


@InProceedings{martin_et_al:OASIcs.GCB.2013.125,
  author =	{Martin, Marcel and Rahmann, Sven},
  title =	{{Aligning Flowgrams to DNA Sequences}},
  booktitle =	{German Conference on Bioinformatics 2013},
  pages =	{125--135},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-59-0},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{34},
  editor =	{Bei{\ss}barth, Tim and Kollmar, Martin and Leha, Andreas and Morgenstern, Burkhard and Schultz, Anne-Kathrin and Waack, Stephan and Wingender, Edgar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2013.125},
  URN =		{urn:nbn:de:0030-drops-42379},
  doi =		{10.4230/OASIcs.GCB.2013.125},
  annote =	{Keywords: flowgram, sequencing, alignment algorithm, scoring scheme}
}
