Dagstuhl Seminar Proceedings, Volume 10231

Document

10231 Abstracts Collection – Structure Discovery in Biology: Motifs, Networks & Phylogenies

Authors: Alberto Apostolico, Andreas Dress, and Laxmi Parida

Abstract

From 06.06.2010 to 11.06.2010, the Dagstuhl Seminar 10231 "Structure Discovery in Biology: Motifs, Networks & Phylogenies" was held in Schloss Dagstuhl – Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

Cite as

Alberto Apostolico, Andreas Dress, and Laxmi Parida. 10231 Abstracts Collection – Structure Discovery in Biology: Motifs, Networks & Phylogenies. In Structure Discovery in Biology: Motifs, Networks & Phylogenies. Dagstuhl Seminar Proceedings, Volume 10231, pp. 1-20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)

@InProceedings{apostolico_et_al:DagSemProc.10231.1,
  author =	{Apostolico, Alberto and Dress, Andreas and Parida, Laxmi},
  title =	{{10231 Abstracts Collection – Structure Discovery in Biology: Motifs, Networks \& Phylogenies}},
  booktitle =	{Structure Discovery in Biology: Motifs, Networks \& Phylogenies},
  pages =	{1--20},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10231},
  editor =	{Alberto Apostolico and Andreas Dress and Laxmi Parida},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.10231.1},
  URN =		{urn:nbn:de:0030-drops-26910},
  doi =		{10.4230/DagSemProc.10231.1},
  annote =	{Keywords: Mathematical biology, computational biology, algorithmic bioinformatics, pattern discovery, phylogenetics, networks}
}

Document

DOI: 10.4230/DagSemProc.10231.2

A New Linear Time Algorithm to Compute the Genomic Distance Via the Double Cut and Join Distance

Authors: Anne Bergeron, Julia Mixtacki, and Jens Stoye

Abstract

The genomic distance problem in the Hannenhalli-Pevzner (HP) theory is the following: Given two genomes whose chromosomes are linear, calculate the minimum number of translocations, fusions, fissions and inversions that transform one genome into the other. We will present a new distance formula based on a simple tree structure that captures all the delicate features of this problem in a unifying way, and a linear-time algorithm for computing this distance.

Cite as

Anne Bergeron, Julia Mixtacki, and Jens Stoye. A New Linear Time Algorithm to Compute the Genomic Distance Via the Double Cut and Join Distance. In Structure Discovery in Biology: Motifs, Networks & Phylogenies. Dagstuhl Seminar Proceedings, Volume 10231, pp. 1-25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)

@InProceedings{bergeron_et_al:DagSemProc.10231.2,
  author =	{Bergeron, Anne and Mixtacki, Julia and Stoye, Jens},
  title =	{{A New Linear Time Algorithm to Compute the Genomic Distance Via the Double Cut and Join Distance}},
  booktitle =	{Structure Discovery in Biology: Motifs, Networks \& Phylogenies},
  pages =	{1--25},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10231},
  editor =	{Alberto Apostolico and Andreas Dress and Laxmi Parida},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.10231.2},
  URN =		{urn:nbn:de:0030-drops-26892},
  doi =		{10.4230/DagSemProc.10231.2},
  annote =	{Keywords: Comparative genomics, genomic distance computation, HP theory}
}

Document

DOI: 10.4230/DagSemProc.10231.3

A New Tree Distance Metric for Structural Comparison of Sequences

Authors: Matthias Gallé

Abstract

In this paper we consider structural comparison of sequences, that is, comparing sequences not by their content but by their structure. We focus on the case where this structure can be defined by a tree and propose a new tree distance metric that captures structural similarity. This metric satisfies non-negativity, identity, symmetry and the triangle inequality. We give algorithms to compute this metric and validate it by using it as a distance function for a clustering process of slightly modified copies of trees, outperforming an existing measure.

Cite as

Matthias Gallé. A New Tree Distance Metric for Structural Comparison of Sequences. In Structure Discovery in Biology: Motifs, Networks & Phylogenies. Dagstuhl Seminar Proceedings, Volume 10231, pp. 1-9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)

@InProceedings{galle:DagSemProc.10231.3,
  author =	{Gall\'{e}, Matthias},
  title =	{{A New Tree Distance Metric for Structural Comparison of Sequences}},
  booktitle =	{Structure Discovery in Biology: Motifs, Networks \& Phylogenies},
  pages =	{1--9},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10231},
  editor =	{Alberto Apostolico and Andreas Dress and Laxmi Parida},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.10231.3},
  URN =		{urn:nbn:de:0030-drops-27375},
  doi =		{10.4230/DagSemProc.10231.3},
  annote =	{Keywords: Tree distance, structure discovery, Parseval metric, Tanimoto distance}
}

Document

DOI: 10.4230/DagSemProc.10231.4

Efficient computation of statistics for words with mismatches

Authors: Cinzia Pizzi

Abstract

Since the early stages of bioinformatics, substrings have played a crucial role in the search and discovery of significant biological signals. Despite the advent of a large number of different approaches and models to accomplish these tasks, substrings continue to be widely used to determine statistical distributions and compositions of biological sequences at various levels of detail. Here we overview efficient algorithms that were recently proposed to compute the actual and the expected frequency for words with k mismatches, when it is assumed that the words of interest occur at least once exactly in the sequence under analysis. Efficiency means these algorithms are polynomial in k, rather than exponential as with an enumerative approach, and independent of the length of the query word. These algorithms are all based on a common incremental approach: a preprocessing step that makes it possible to efficiently answer queries related to any word occurring in the text. The same approach can be used with a sliding window scanning of the sequence to compute the same statistics for words of fixed length, even more efficiently. The efficient computation of both expected and actual frequency of substrings, combined with a study on the monotonicity of popular scores such as z-scores, makes it possible to build tables of feasible size in reasonable time, and can therefore be used in practical applications.
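To make the quantity in this abstract concrete, the following Python sketch computes the actual frequency of a word with up to k mismatches by brute force. It is only a reference definition under our own assumptions, not the efficient algorithms the abstract surveys; the function names `hamming`, `count_with_mismatches` and `mismatch_counts_fixed_length` are hypothetical.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_with_mismatches(text: str, word: str, k: int) -> int:
    """Count windows of text within Hamming distance k of word (brute force)."""
    m = len(word)
    return sum(
        1
        for i in range(len(text) - m + 1)
        if hamming(text[i:i + m], word) <= k
    )


def mismatch_counts_fixed_length(text: str, m: int, k: int) -> dict:
    """For every length-m word occurring exactly in text, count its
    occurrences with up to k mismatches, using one sliding-window pass
    to collect exact counts first (still quadratic in distinct words)."""
    exact = Counter(text[i:i + m] for i in range(len(text) - m + 1))
    return {
        w: sum(c for v, c in exact.items() if hamming(w, v) <= k)
        for w in exact
    }
```

For example, `count_with_mismatches("ACGTACGA", "ACGT", 1)` counts the exact occurrence `ACGT` and the one-mismatch occurrence `ACGA`. A naive baseline like this is mainly useful for checking faster implementations on small inputs.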

Cite as

Cinzia Pizzi. Efficient computation of statistics for words with mismatches. In Structure Discovery in Biology: Motifs, Networks & Phylogenies. Dagstuhl Seminar Proceedings, Volume 10231, pp. 1-22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)

@InProceedings{pizzi:DagSemProc.10231.4,
  author =	{Pizzi, Cinzia},
  title =	{{Efficient computation of statistics for words with mismatches}},
  booktitle =	{Structure Discovery in Biology: Motifs, Networks \& Phylogenies},
  pages =	{1--22},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10231},
  editor =	{Alberto Apostolico and Andreas Dress and Laxmi Parida},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.10231.4},
  URN =		{urn:nbn:de:0030-drops-27384},
  doi =		{10.4230/DagSemProc.10231.4},
  annote =	{Keywords: Statistics on words, mismatches, dynamic programming, biological sequences.}
}

Document

DOI: 10.4230/DagSemProc.10231.5

Estimation of alternative splicing isoform frequencies from RNA-Seq data

Authors: Marius Nicolae, Serghei Mangul, Ion Mandoiu, and Alex Zelikovsky

Abstract

We present a novel expectation-maximization algorithm for inference of alternative splicing isoform frequencies from high-throughput transcriptome sequencing (RNA-Seq) data. Our algorithm exploits largely ignored disambiguation information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information if available. Empirical experiments on synthetic datasets show that the algorithm significantly outperforms existing methods of isoform and gene expression level estimation from RNA-Seq data.
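As a rough illustration of the expectation-maximization idea mentioned in this abstract, the Python sketch below estimates isoform frequencies from reads that may be compatible with several isoforms. It is a generic textbook-style EM under simplifying assumptions, not the authors' algorithm: the per-read weights stand in for whatever compatibility evidence is available (insert sizes, quality scores and so on), isoform lengths are ignored, and the name `em_isoform_frequencies` is hypothetical.

```python
def em_isoform_frequencies(read_compat, n_isoforms, n_iter=200):
    """Estimate isoform frequencies by EM.

    read_compat: one dict per read, mapping isoform index -> weight
                 (assumed proportional to P(read | isoform)).
    """
    freqs = [1.0 / n_isoforms] * n_isoforms
    for _ in range(n_iter):
        counts = [0.0] * n_isoforms
        # E-step: split each read fractionally among compatible isoforms.
        for weights in read_compat:
            total = sum(freqs[j] * w for j, w in weights.items())
            if total == 0.0:
                continue  # read is incompatible with every isoform
            for j, w in weights.items():
                counts[j] += freqs[j] * w / total
        # M-step: renormalize the expected read counts into frequencies.
        s = sum(counts)
        if s == 0.0:
            break
        freqs = [c / s for c in counts]
    return freqs
```

With three reads unique to isoform 0, one read unique to isoform 1 and four reads equally compatible with both, the estimate settles near 0.75 and 0.25, because the ambiguous reads are apportioned according to the evidence from the unique ones.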

Cite as

Marius Nicolae, Serghei Mangul, Ion Mandoiu, and Alex Zelikovsky. Estimation of alternative splicing isoform frequencies from RNA-Seq data. In Structure Discovery in Biology: Motifs, Networks & Phylogenies. Dagstuhl Seminar Proceedings, Volume 10231, pp. 1-3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)

@InProceedings{nicolae_et_al:DagSemProc.10231.5,
  author =	{Nicolae, Marius and Mangul, Serghei and Mandoiu, Ion and Zelikovsky, Alex},
  title =	{{Estimation of alternative splicing isoform frequencies from RNA-Seq data}},
  booktitle =	{Structure Discovery in Biology: Motifs, Networks \& Phylogenies},
  pages =	{1--3},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10231},
  editor =	{Alberto Apostolico and Andreas Dress and Laxmi Parida},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.10231.5},
  URN =		{urn:nbn:de:0030-drops-26876},
  doi =		{10.4230/DagSemProc.10231.5},
  annote =	{Keywords: RNA-Seq, alternative splicing isoforms, expectation maximization}
}

Document

DOI: 10.4230/DagSemProc.10231.6

Functional Information, Biomolecular Messages and Complexity of BioSequences and Structures

Authors: Raffaele Giancarlo, Davide Corona, Valeria Di Benedetto, Alessandra Gabriele, and Filippo Utro

Abstract

In the quest for a mathematical measure able to capture and shed light on the dual notions of information and complexity in biosequences, Hazen et al. have introduced the notion of Functional Information (FI for short). It is also the result of earlier considerations and findings by Szostak and Carothers et al. Based on the experiments by Carothers et al. regarding FI in RNA binding activities, we decided to study the relation between FI and classic measures of complexity applied to protein-DNA interactions on a genome-wide scale. Using classic complexity measures, i.e., Shannon entropy and Kolmogorov complexity, both estimated by data compression, we found that FI applied to protein-DNA interactions is genuinely different from them. This fact, together with the non-triviality of the biological function considered, contributes to the establishment of FI as a novel and useful measure of biocomplexity. Remarkably, we also found a relationship, on a genome-wide scale, between the redundancy of a genomic region and its ability to interact with a protein. This latter finding further justifies some principles for the design of motif discovery algorithms. Finally, our experiments bring to light methodological limitations of Linguistic Complexity measures, i.e., a class of measures that is a function of the vocabulary richness of a sequence. Indeed, due to the technology and associated statistical preprocessing procedures used to conduct our studies, i.e., genome-wide ChIP-chip experiments, that class of measures cannot give any statistically significant indication about complexity and function. This is a serious limitation, given the widespread use of the technology.

References

J.M. Carothers, S.C. Oestreich, J.H. Davis, and J.W. Szostak. Informational complexity and functional activity of RNA structures. J. Am. Chem. Soc., 126 (2004), pp. 5130-5137.

R.M. Hazen, P.L. Griffin, J.M. Carothers, and J.W. Szostak. Functional Information and the emergence of biocomplexity. Proc. Natl. Acad. Sci., 104 (2007), pp. 8574-8581.

J.W. Szostak. Functional Information: molecular messages. Nature, 423 (2003).
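The abstract above refers to classic complexity measures estimated by data compression. As a minimal sketch of that general idea, assuming zlib as the compressor (the abstract does not say which compressor the authors used), the Python functions below compute an empirical single-symbol Shannon entropy and a compression ratio that serves as a crude upper-bound proxy for Kolmogorov complexity. The names `shannon_entropy` and `compression_ratio` are hypothetical.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Empirical entropy of the symbol distribution, in bits per symbol."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def compression_ratio(seq: str) -> float:
    """Compressed size divided by original size, using zlib at level 9.

    Lower values indicate a more redundant (less complex) sequence.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    data = seq.encode("ascii")
    return len(zlib.compress(data, 9)) / len(data)
```

A highly repetitive sequence such as `"ACGT" * 250` compresses far better than a random sequence over the same alphabet, even though both have the same single-symbol entropy of about 2 bits; this is why compression-based estimates capture redundancy that symbol frequencies alone miss.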

Cite as

Raffaele Giancarlo, Davide Corona, Valeria Di Benedetto, Alessandra Gabriele, and Filippo Utro. Functional Information, Biomolecular Messages and Complexity of BioSequences and Structures. In Structure Discovery in Biology: Motifs, Networks & Phylogenies. Dagstuhl Seminar Proceedings, Volume 10231, pp. 1-13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)

@InProceedings{giancarlo_et_al:DagSemProc.10231.6,
  author =	{Giancarlo, Raffaele and Corona, Davide and Di Benedetto, Valeria and Gabriele, Alessandra and Utro, Filippo},
  title =	{{Functional Information, Biomolecular Messages and Complexity of BioSequences and Structures}},
  booktitle =	{Structure Discovery in Biology: Motifs, Networks \& Phylogenies},
  pages =	{1--13},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10231},
  editor =	{Alberto Apostolico and Andreas Dress and Laxmi Parida},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.10231.6},
  URN =		{urn:nbn:de:0030-drops-26884},
  doi =		{10.4230/DagSemProc.10231.6},
  annote =	{Keywords: Functional activity, sequence complexity, combinatorics on words, protein-DNA interaction.}
}

Document

DOI: 10.4230/DagSemProc.10231.7

Remote Homology Detection of Protein Sequences

Authors: Matteo Comin and Davide Verzotto

Abstract

The classification of protein sequences using string kernels provides valuable insights for protein function prediction. Almost all string kernels are based on patterns that are not independent, and therefore the associated scores are obtained using a set of redundant features. In this talk we will discuss how a class of patterns, called Irredundant, is specifically designed to address this issue. Loosely speaking, the set of Irredundant patterns is the smallest class of independent patterns that can describe all patterns in a string. We present a classification method based on the statistics of these patterns, named Irredundant Class. Results on benchmark data show that Irredundant Class outperforms most of the string kernel methods previously proposed, and it achieves results as good as the current state-of-the-art methods with fewer patterns. Unfortunately, we show that the information carried by the irredundant patterns cannot be easily interpreted, thus alternative notions are needed.

Cite as

Matteo Comin and Davide Verzotto. Remote Homology Detection of Protein Sequences. In Structure Discovery in Biology: Motifs, Networks & Phylogenies. Dagstuhl Seminar Proceedings, Volume 10231, pp. 1-20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)

@InProceedings{comin_et_al:DagSemProc.10231.7,
  author =	{Comin, Matteo and Verzotto, Davide},
  title =	{{Remote Homology Detection of Protein Sequences}},
  booktitle =	{Structure Discovery in Biology: Motifs, Networks \& Phylogenies},
  pages =	{1--20},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10231},
  editor =	{Alberto Apostolico and Andreas Dress and Laxmi Parida},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.10231.7},
  URN =		{urn:nbn:de:0030-drops-27419},
  doi =		{10.4230/DagSemProc.10231.7},
  annote =	{Keywords: Classification of protein sequences, irredundant patterns}
}

Document

DOI: 10.4230/DagSemProc.10231.8

The Ideal Storage Cellular Automaton Model

Authors: Andreas Dress, Wim Hordijk, Lin Wei, and Peter Serocka

Abstract

We have implemented and investigated a spatial extension of the original ideal storage model by embedding it in a 2D cellular automaton with a diffusion-like coupling between neighboring cells. The resulting ideal storage cellular automaton model (ISCAM) generates many interesting spatio-temporal patterns, in particular spiral waves that grow and "compete" with each other. We study this dynamical behavior both mathematically and computationally, and compare it with similar patterns observed in actual chemical processes. Remarkably, it turned out that one can use such CA for modeling all sorts of complex processes, from phase transitions in binary mixtures to a metaphor for cancer onset caused by only one short pulse of 'tissue disorganization' (e.g., changing the diffusion coefficient for only a single time step), as hypothesized in recent papers by AO Ping et al. questioning the current gene/genome-centric view on cancer onset.

Cite as

Andreas Dress, Wim Hordijk, Lin Wei, and Peter Serocka. The Ideal Storage Cellular Automaton Model. In Structure Discovery in Biology: Motifs, Networks & Phylogenies. Dagstuhl Seminar Proceedings, Volume 10231, pp. 1-8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)

@InProceedings{dress_et_al:DagSemProc.10231.8,
  author =	{Dress, Andreas and Hordijk, Wim and Wei, Lin and Serocka, Peter},
  title =	{{The Ideal Storage Cellular Automaton Model}},
  booktitle =	{Structure Discovery in Biology: Motifs, Networks \& Phylogenies},
  pages =	{1--8},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10231},
  editor =	{Alberto Apostolico and Andreas Dress and Laxmi Parida},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.10231.8},
  URN =		{urn:nbn:de:0030-drops-27280},
  doi =		{10.4230/DagSemProc.10231.8},
  annote =	{Keywords: }
}
