eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2010-08-04
10231
1
20
10.4230/DagSemProc.10231.1
article
10231 Abstracts Collection – Structure Discovery in Biology: Motifs, Networks & Phylogenies
Apostolico, Alberto
Dress, Andreas
Parida, Laxmi
From 06.06. to 11.06.2010, the Dagstuhl Seminar 10231 "Structure Discovery in Biology: Motifs, Networks & Phylogenies" was held in Schloss Dagstuhl – Leibniz Center for Informatics.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are put together in this paper. The first section
describes the seminar topics and goals in general.
Links to extended abstracts or full papers are provided, if available.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol10231/DagSemProc.10231.1/DagSemProc.10231.1.pdf
Mathematical biology
computational biology
algorithmic bioinformatics
pattern discovery
phylogenetics
networks
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2010-08-04
10231
1
25
10.4230/DagSemProc.10231.2
article
A New Linear Time Algorithm to Compute the Genomic Distance Via the Double Cut and Join Distance
Bergeron, Anne
Mixtacki, Julia
Stoye, Jens
The genomic distance problem in the Hannenhalli-Pevzner (HP) theory is the following: Given two genomes whose chromosomes are linear, calculate the minimum number of translocations, fusions, fissions and inversions that transform one genome into the other. We will present a new distance formula based on a simple tree structure that captures all the delicate features of this problem in a unifying way, and a linear-time algorithm for computing this distance.
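The abstract does not state the new formula. As background only, the double cut and join (DCJ) distance named in the title can be computed from the adjacency graph of two genomes as d_DCJ = N - (C + I/2), where N is the number of genes, C the number of cycles and I the number of odd paths (Bergeron, Mixtacki and Stoye). The Python sketch below assumes both genomes consist of linear chromosomes over the same gene set, each gene occurring exactly once, written as lists of signed integers; the function names are hypothetical. It computes only the DCJ distance, not the HP genomic distance presented in the talk.

```python
from collections import defaultdict

# Hypothetical helpers; genomes are lists of linear chromosomes of signed genes.

def extremities(gene):
    """Return the two extremities of a signed gene in left-to-right reading order."""
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))

def vertices(genome):
    """List telomeres (1-tuples) and adjacencies (2-tuples) of a genome."""
    verts = []
    for chrom in genome:
        ends = [extremities(g) for g in chrom]
        verts.append((ends[0][0],))                   # left telomere
        for (_, right), (left, _) in zip(ends, ends[1:]):
            verts.append((right, left))               # adjacency of neighbours
        verts.append((ends[-1][1],))                  # right telomere
    return verts

def dcj_distance(a, b):
    """DCJ distance N - (C + I/2) computed on the adjacency graph of a and b."""
    va, vb = vertices(a), vertices(b)
    parent = list(range(len(va) + len(vb)))          # union-find over all vertices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each extremity is one edge joining its vertex in a to its vertex in b.
    in_a = {x: i for i, v in enumerate(va) for x in v}
    in_b = {x: len(va) + j for j, v in enumerate(vb) for x in v}
    for x, i in in_a.items():
        parent[find(i)] = find(in_b[x])

    nodes, edges = defaultdict(int), defaultdict(int)
    for v in range(len(parent)):
        nodes[find(v)] += 1
    for i in in_a.values():
        edges[find(i)] += 1

    # A component is a cycle if it has as many edges as vertices, else a path.
    cycles = sum(1 for r in nodes if edges[r] == nodes[r])
    odd_paths = sum(1 for r in nodes if edges[r] < nodes[r] and edges[r] % 2 == 1)
    n_genes = len(in_a) // 2
    return n_genes - cycles - odd_paths // 2          # I is always even here
```

For example, `dcj_distance([[1, 2, 3]], [[1, -2, 3]])` returns 1 (a single inversion), and splitting `[[1, 2]]` into `[[1], [2]]` (a fission) also has distance 1.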
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol10231/DagSemProc.10231.2/DagSemProc.10231.2.pdf
Comparative genomics
genomic distance computation
HP theory
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2010-08-04
10231
1
9
10.4230/DagSemProc.10231.3
article
A New Tree Distance Metric for Structural Comparison of Sequences
Gallé, Matthias
In this paper we consider structural comparison of sequences, that is,
comparing sequences not by their content but by their structure.
We focus on the case where this structure can be defined by a tree
and propose a new tree distance metric that captures structural similarity.
This metric satisfies non-negativity, identity, symmetry and the triangle
inequality. We give algorithms to compute this metric and validate
it by using it as the distance function in the clustering of slightly
modified copies of trees, where it outperforms an existing measure.
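The abstract does not define the new metric. Purely as a generic illustration of a tree distance with the four listed properties, and not the metric proposed in the paper, one can take the Tanimoto (Jaccard) distance between the sets of complete subtrees of two ordered, labelled trees. Jaccard distance is a metric on finite sets, and each tree's own encoding lies in its subtree set, so a distance of 0 forces the two trees to be equal. The `(label, [children])` representation and the function names are assumptions, and labels are assumed to contain no parentheses or commas so the string encoding is unambiguous.

```python
# Hypothetical illustration: NOT the metric of the paper.
# A tree is (label, [children]); labels must not contain "(", ")" or ",".

def encode(tree, seen):
    """Add the canonical string of every complete subtree of `tree` to `seen`
    and return the string for `tree` itself."""
    label, children = tree
    code = label + "(" + ",".join(encode(c, seen) for c in children) + ")"
    seen.add(code)
    return code

def tanimoto_tree_distance(t1, t2):
    """1 - |S1 & S2| / |S1 | S2| over the subtree sets S1, S2 of t1 and t2."""
    s1, s2 = set(), set()
    encode(t1, s1)
    encode(t2, s2)
    return 1 - len(s1 & s2) / len(s1 | s2)
```

For `t1 = ("S", [("a", []), ("b", [])])` and `t2 = ("S", [("a", []), ("c", [])])`, the trees share only the leaf subtree `a()` out of five distinct subtrees, so the distance is 1 - 1/5 = 0.8.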
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol10231/DagSemProc.10231.3/DagSemProc.10231.3.pdf
Tree distance
structure discovery
Parseval metric
Tanimoto distance
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2010-08-04
10231
1
22
10.4230/DagSemProc.10231.4
article
Efficient computation of statistics for words with mismatches
Pizzi, Cinzia
Since the early days of bioinformatics, substrings have played a crucial role in the search for and discovery of significant biological signals. Despite the advent of a large number of different approaches and models to accomplish these tasks, substrings continue to be widely used to determine the statistical distributions and composition of biological sequences at various levels of detail.
Here we give an overview of recently proposed efficient algorithms that
compute the actual and the expected frequency of words with k mismatches, under the assumption that the words of interest occur at least once exactly in the sequence under analysis. Efficient means that these algorithms are polynomial in k, rather than exponential as with an enumerative approach, and independent of the length of the query word.
These algorithms all share a common incremental approach: a
preprocessing step after which queries about any word occurring in the
text can be answered efficiently. The same approach can be used with a
sliding-window scan of the sequence to compute the same statistics
for words of fixed length, even more efficiently.
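To make the counted quantity concrete, here is a naive Python baseline for the actual frequency of a word with at most k mismatches (Hamming distance), checking every window of the text. It costs O(nm) time per query word for text length n and word length m, and is meant only as a reference for what is being counted, not as the preprocessing-based algorithms described above. Reading "k mismatches" as "at most k" is an assumption, and the function name is hypothetical.

```python
def count_with_mismatches(text, word, k):
    """Hypothetical baseline: number of positions where `word` occurs in `text`
    with at most k mismatches (Hamming distance), by scanning every window."""
    m = len(word)
    count = 0
    for i in range(len(text) - m + 1):
        mismatches = sum(1 for a, b in zip(text[i:i + m], word) if a != b)
        if mismatches <= k:
            count += 1
    return count
```

For example, `count_with_mismatches("ACGTACGA", "ACA", 1)` returns 2 (the two windows `ACG`), and raising k to 2 also admits `GTA` and `CGA`, giving 4.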
The efficient computation of both the expected and the actual frequency of
substrings, combined with a study of the monotonicity of popular scores
such as z-scores, makes it possible to build tables of feasible size in reasonable time,
which can therefore be used in practical applications.
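For the expected frequency, a common null model, assumed here because the abstract does not name one, is an i.i.d. sequence with fixed letter probabilities. The probability that a given window matches the word with at most k mismatches then follows from a small dynamic program over the positions of the word that tracks the number of mismatches so far; multiplying by the number of windows gives the expected count. This is an O(mk) sketch with hypothetical names, not the authors' algorithm.

```python
def expected_with_mismatches(n, word, k, probs):
    """Hypothetical sketch: expected number of windows of an i.i.d. text of
    length n that match `word` with at most k mismatches.
    probs maps each letter to its (assumed) background probability."""
    dp = [1.0] + [0.0] * k            # dp[j] = P(exactly j mismatches so far)
    for c in word:
        p = probs[c]                  # probability that this position matches
        new = [0.0] * (k + 1)
        for j in range(k + 1):
            new[j] += dp[j] * p                   # match: count unchanged
            if j + 1 <= k:
                new[j + 1] += dp[j] * (1 - p)     # mismatch: count + 1
        dp = new
    return (n - len(word) + 1) * sum(dp)
```

With uniform probabilities of 0.25 and k = 0, a word of length 3 in a text of length 8 has expected count 6 × 0.25³ = 0.09375.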
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol10231/DagSemProc.10231.4/DagSemProc.10231.4.pdf
Statistics on words
mismatches
dynamic programming
biological sequences
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2010-08-04
10231
1
3
10.4230/DagSemProc.10231.5
article
Estimation of alternative splicing isoform frequencies from RNA-Seq data
Nicolae, Marius
Mangul, Serghei
Mandoiu, Ion
Zelikovsky, Alex
We present a novel expectation-maximization algorithm for inference of alternative splicing isoform frequencies from high-throughput transcriptome sequencing (RNA-Seq) data. Our algorithm exploits largely ignored disambiguation information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information if available. Empirical experiments on synthetic datasets show that the algorithm significantly outperforms existing methods of isoform and gene expression level estimation from RNA-Seq data.
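The EM idea can be illustrated with a stripped-down Python sketch that uses only read-to-isoform compatibility and isoform lengths, and ignores the insert-size distribution, base qualities, strand and pairing information that the algorithm above exploits. It assumes that frequencies refer to transcript molecules and that reads start uniformly along each isoform; under that assumption the length factors cancel in the E-step, so each read is split among its compatible isoforms in proportion to their current frequencies, and the M-step divides expected read counts by effective length. The input format, names and fixed iteration count are assumptions.

```python
def em_isoform_frequencies(reads, lengths, iterations=200):
    """Hypothetical EM sketch for isoform frequencies.

    reads   -- one set of compatible isoform indices per read (assumed input)
    lengths -- effective length of each isoform (number of possible read starts)
    """
    n = len(lengths)
    f = [1.0 / n] * n                                 # uniform starting point
    for _ in range(iterations):
        # E-step: split each read among compatible isoforms in proportion to f.
        expected = [0.0] * n
        for compat in reads:
            total = sum(f[i] for i in compat)
            for i in compat:
                expected[i] += f[i] / total
        # M-step: expected reads per unit length, normalized to sum to 1.
        density = [expected[i] / lengths[i] for i in range(n)]
        s = sum(density)
        f = [d / s for d in density]
    return f
```

With three reads unique to isoform 0, one unique to isoform 1, four shared reads and equal lengths, the estimate converges to about (0.75, 0.25), the fixed point of a = (3 + 4a)/8.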
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol10231/DagSemProc.10231.5/DagSemProc.10231.5.pdf
RNA-Seq
alternative splicing isoforms
expectation maximization
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2010-08-04
10231
1
13
10.4230/DagSemProc.10231.6
article
Functional Information, Biomolecular Messages and Complexity of BioSequences and Structures
Giancarlo, Raffaele
Corona, Davide
Di Benedetto, Valeria
Gabriele, Alessandra
Utro, Filippo
In the quest for a mathematical measure able to capture and shed light on the dual notions of information and complexity in biosequences, Hazen et al. have introduced the notion of Functional Information (FI for short), which builds on earlier considerations and findings by Szostak and by Carothers et al. Motivated by the experiments of Carothers et al. on FI in RNA binding activities, we studied the relation between FI and classic measures of complexity applied to protein-DNA interactions on a genome-wide scale. Using classic complexity measures, i.e., Shannon entropy and Kolmogorov complexity, both estimated by data compression, we found that FI applied to protein-DNA interactions is genuinely different from them. This fact, together with the non-triviality of the biological function considered, contributes to establishing FI as a novel and useful measure of biocomplexity. Remarkably, we also found a relationship, on a genome-wide scale, between the redundancy of a genomic region and its ability to interact with a protein. This latter finding lends further support to some principles for the design of motif discovery algorithms. Finally, our experiments bring to light methodological limitations of Linguistic Complexity measures, i.e., a class of measures that are functions of the vocabulary richness of a sequence. Indeed, owing to the technology and the associated statistical preprocessing procedures used in our studies, i.e., genome-wide ChIP-chip experiments, that class of measures cannot give any statistically significant indication about complexity and function. This is a serious limitation, given the widespread use of the technology.
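Following Hazen et al., the functional information of a degree of function E_x is FI(E_x) = -log2 F(E_x), where F(E_x) is the fraction of all configurations that achieve a degree of function of at least E_x. The Python sketch below estimates F(E_x) from a finite sample of scores and, for comparison, computes two sequence-level quantities of the kind the study contrasts with FI: the empirical Shannon entropy per symbol, and a zlib compression ratio as a rough proxy for Kolmogorov complexity. The sampling estimate, the choice of zlib and the function names are assumptions; this does not reproduce the ChIP-chip analysis.

```python
import math
import zlib
from collections import Counter

# Hypothetical helper names; scores are a sample of configurations' activity.

def functional_information(scores, threshold):
    """FI = -log2 of the sampled fraction of configurations scoring >= threshold."""
    frac = sum(1 for s in scores if s >= threshold) / len(scores)
    return math.inf if frac == 0 else -math.log2(frac)

def shannon_entropy(seq):
    """Empirical Shannon entropy of `seq`, in bits per symbol."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())

def compression_ratio(seq):
    """Compressed size over original size (zlib, level 9); lower = more redundant."""
    data = seq.encode("ascii")
    return len(zlib.compress(data, 9)) / len(data)
```

For example, if only one of four sampled configurations reaches the threshold, `functional_information([1, 2, 3, 4], 4)` gives -log2(1/4) = 2 bits, while `shannon_entropy("ACGT")` is also 2 bits per symbol; the two quantities measure different things.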
References
J.M. Carothers, S.C. Oestreich, J.H. Davis, and J.W. Szostak. Informational complexity and functional activity of RNA structures. J. Am. Chem. Soc., 126 (2004), pp. 5130-5137.
R.M. Hazen, P.L. Griffin, J.M. Carothers, and J.W. Szostak. Functional information and the emergence of biocomplexity. Proc. Natl. Acad. Sci. USA, 104 (2007), pp. 8574-8581.
J.W. Szostak. Functional information: Molecular messages. Nature, 423 (2003).
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol10231/DagSemProc.10231.6/DagSemProc.10231.6.pdf
Functional activity
sequence complexity
combinatorics on words
protein-DNA interaction
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2010-08-04
10231
1
20
10.4230/DagSemProc.10231.7
article
Remote Homology Detection of Protein Sequences
Comin, Matteo
Verzotto, Davide
The classification of protein sequences using string kernels
provides valuable insights for protein function prediction. Almost
all string kernels are based on patterns that are not independent,
and therefore the associated scores are obtained using a set of
redundant features. In this talk we will discuss how a class of
patterns, called Irredundant, is specifically designed to address
this issue. Loosely speaking, the set of Irredundant patterns is the
smallest class of independent patterns that can describe all
patterns in a string. We present a classification method based on
the statistics of these patterns, named Irredundant Class. Results
on benchmark data show that Irredundant Class outperforms most of
the previously proposed string kernel methods, and achieves
results as good as the current state-of-the-art methods with fewer
patterns. Unfortunately, we show that the information
carried by the irredundant patterns cannot be easily interpreted,
so alternative notions are needed.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol10231/DagSemProc.10231.7/DagSemProc.10231.7.pdf
Classification of protein sequences
irredundant patterns
eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Dagstuhl Seminar Proceedings
1862-4405
2010-08-04
10231
1
8
10.4230/DagSemProc.10231.8
article
The Ideal Storage Cellular Automaton Model
Dress, Andreas
Hordijk, Wim
Wei, Lin
Serocka, Peter
We have implemented and investigated a spatial extension of the
original ideal storage model by embedding it in a 2D cellular automaton with
a diffusion-like coupling between neighboring cells. The resulting ideal
storage cellular automaton model (ISCAM) generates many interesting
spatio-temporal patterns, in particular spiral waves that grow and
"compete" with each other. We study this dynamical behavior both
mathematically and computationally, and compare it with similar patterns
observed in actual chemical processes. Remarkably, it turned out that such
CA can be used to model all sorts of complex processes, from phase transitions
in binary mixtures to serving as a metaphor for cancer onset caused
by only one short pulse of 'tissue disorganization' (e.g., changing the
diffusion coefficient for only one single time step), as hypothesized in
recent papers by AO Ping et al. questioning the current gene/genome-centric
view of cancer onset.
https://drops.dagstuhl.de/storage/16dagstuhl-seminar-proceedings/dsp-vol10231/DagSemProc.10231.8/DagSemProc.10231.8.pdf