DROPS

Document

Structure Discovery in Biology: Motifs, Networks & Phylogenies (Dagstuhl Seminar 12291)

Authors: Alberto Apostolico, Andreas Dress, and Laxmi Parida

Published in: Dagstuhl Reports, Volume 2, Issue 7 (2013)

Abstract

From 15.07.12 to 20.07.12, the Dagstuhl Seminar 12291 "Structure Discovery in Biology: Motifs, Networks & Phylogenies" was held in Schloss Dagstuhl -- Leibniz Center for Informatics. The seminar was in part a follow-up to Dagstuhl Seminar 10231, held in June 2010, this time with a strong emphasis on large data. Both veterans and new participants took part in this edition. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar, as well as abstracts of seminar results and ideas, are put together in this report. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

Cite as

Alberto Apostolico, Andreas Dress, and Laxmi Parida. Structure Discovery in Biology: Motifs, Networks & Phylogenies (Dagstuhl Seminar 12291). In Dagstuhl Reports, Volume 2, Issue 7, pp. 92-117, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2012)

@Article{apostolico_et_al:DagRep.2.7.92,
  author =	{Apostolico, Alberto and Dress, Andreas and Parida, Laxmi},
  title =	{{Structure Discovery in Biology: Motifs, Networks \& Phylogenies (Dagstuhl Seminar 12291)}},
  pages =	{92--117},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2012},
  volume =	{2},
  number =	{7},
  editor =	{Apostolico, Alberto and Dress, Andreas and Parida, Laxmi},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagRep.2.7.92},
  URN =		{urn:nbn:de:0030-drops-37509},
  doi =		{10.4230/DagRep.2.7.92},
  annote =	{Keywords: mathematical biology, computational biology, algorithmic bioinformatics, pattern discovery, networks, phylogenetics, stringology}
}

Document

DOI: 10.4230/DagSemProc.10231.7

Remote Homology Detection of Protein Sequences

Authors: Matteo Comin and Davide Verzotto

Published in: Dagstuhl Seminar Proceedings, Volume 10231, Structure Discovery in Biology: Motifs, Networks & Phylogenies (2010)

Abstract

The classification of protein sequences using string kernels provides valuable insights for protein function prediction. Almost all string kernels are based on patterns that are not independent, and therefore the associated scores are obtained using a set of redundant features. In this talk we will discuss how a class of patterns, called Irredundant, is specifically designed to address this issue. Loosely speaking, the set of Irredundant patterns is the smallest class of independent patterns that can describe all patterns in a string. We present a classification method based on the statistics of these patterns, named Irredundant Class. Results on benchmark data show that Irredundant Class outperforms most of the string kernel methods previously proposed, and it achieves results as good as the current state-of-the-art methods with fewer patterns. Unfortunately, we show that the information carried by the irredundant patterns cannot be easily interpreted, thus alternative notions are needed.

Cite as

Matteo Comin and Davide Verzotto. Remote Homology Detection of Protein Sequences. In Structure Discovery in Biology: Motifs, Networks & Phylogenies. Dagstuhl Seminar Proceedings, Volume 10231, pp. 1-20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)

@InProceedings{comin_et_al:DagSemProc.10231.7,
  author =	{Comin, Matteo and Verzotto, Davide},
  title =	{{Remote Homology Detection of Protein Sequences}},
  booktitle =	{Structure Discovery in Biology: Motifs, Networks \& Phylogenies},
  pages =	{1--20},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10231},
  editor =	{Alberto Apostolico and Andreas Dress and Laxmi Parida},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.10231.7},
  URN =		{urn:nbn:de:0030-drops-27419},
  doi =		{10.4230/DagSemProc.10231.7},
  annote =	{Keywords: Classification of protein sequences, irredundant patterns}
}

Document

DOI: 10.4230/DagSemProc.10231.1

10231 Abstracts Collection – Structure Discovery in Biology: Motifs, Networks & Phylogenies

Authors: Alberto Apostolico, Andreas Dress, and Laxmi Parida

Published in: Dagstuhl Seminar Proceedings, Volume 10231, Structure Discovery in Biology: Motifs, Networks & Phylogenies (2010)

Abstract

From 06.06. to 11.06.2010, the Dagstuhl Seminar 10231 "Structure Discovery in Biology: Motifs, Networks & Phylogenies" was held in Schloss Dagstuhl -- Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

Cite as

Alberto Apostolico, Andreas Dress, and Laxmi Parida. 10231 Abstracts Collection – Structure Discovery in Biology: Motifs, Networks & Phylogenies. In Structure Discovery in Biology: Motifs, Networks & Phylogenies. Dagstuhl Seminar Proceedings, Volume 10231, pp. 1-20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)

@InProceedings{apostolico_et_al:DagSemProc.10231.1,
  author =	{Apostolico, Alberto and Dress, Andreas and Parida, Laxmi},
  title =	{{10231 Abstracts Collection – Structure Discovery in Biology: Motifs, Networks \& Phylogenies}},
  booktitle =	{Structure Discovery in Biology: Motifs, Networks \& Phylogenies},
  pages =	{1--20},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10231},
  editor =	{Alberto Apostolico and Andreas Dress and Laxmi Parida},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.10231.1},
  URN =		{urn:nbn:de:0030-drops-26910},
  doi =		{10.4230/DagSemProc.10231.1},
  annote =	{Keywords: Mathematical biology, computational biology, algorithmic bioinformatics, pattern discovery, phylogenetics, networks}
}

Document

DOI: 10.4230/DagSemProc.10231.2

A New Linear Time Algorithm to Compute the Genomic Distance Via the Double Cut and Join Distance

Authors: Anne Bergeron, Julia Mixtacki, and Jens Stoye

Published in: Dagstuhl Seminar Proceedings, Volume 10231, Structure Discovery in Biology: Motifs, Networks & Phylogenies (2010)

Abstract

The genomic distance problem in the Hannenhalli-Pevzner (HP) theory is the following: Given two genomes whose chromosomes are linear, calculate the minimum number of translocations, fusions, fissions and inversions that transform one genome into the other. We will present a new distance formula based on a simple tree structure that captures all the delicate features of this problem in a unifying way, and a linear-time algorithm for computing this distance.

Cite as

Anne Bergeron, Julia Mixtacki, and Jens Stoye. A New Linear Time Algorithm to Compute the Genomic Distance Via the Double Cut and Join Distance. In Structure Discovery in Biology: Motifs, Networks & Phylogenies. Dagstuhl Seminar Proceedings, Volume 10231, pp. 1-25, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)

@InProceedings{bergeron_et_al:DagSemProc.10231.2,
  author =	{Bergeron, Anne and Mixtacki, Julia and Stoye, Jens},
  title =	{{A New Linear Time Algorithm to Compute the Genomic Distance Via the Double Cut and Join Distance}},
  booktitle =	{Structure Discovery in Biology: Motifs, Networks \& Phylogenies},
  pages =	{1--25},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10231},
  editor =	{Alberto Apostolico and Andreas Dress and Laxmi Parida},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.10231.2},
  URN =		{urn:nbn:de:0030-drops-26892},
  doi =		{10.4230/DagSemProc.10231.2},
  annote =	{Keywords: Comparative genomics, genomic distance computation, HP theory}
}

Document

DOI: 10.4230/DagSemProc.10231.3

A New Tree Distance Metric for Structural Comparison of Sequences

Authors: Matthias Gallé

Published in: Dagstuhl Seminar Proceedings, Volume 10231, Structure Discovery in Biology: Motifs, Networks & Phylogenies (2010)

Abstract

In this paper we consider structural comparison of sequences, that is, comparing sequences not by their content but by their structure. We focus on the case where this structure can be defined by a tree and propose a new tree distance metric that captures structural similarity. This metric satisfies non-negativity, identity, symmetry and the triangle inequality. We give algorithms to compute this metric and validate it by using it as a distance function for a clustering process of slightly modified copies of trees, outperforming an existing measure.

Cite as

Matthias Gallé. A New Tree Distance Metric for Structural Comparison of Sequences. In Structure Discovery in Biology: Motifs, Networks & Phylogenies. Dagstuhl Seminar Proceedings, Volume 10231, pp. 1-9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)

@InProceedings{galle:DagSemProc.10231.3,
  author =	{Gall\'{e}, Matthias},
  title =	{{A New Tree Distance Metric for Structural Comparison of Sequences}},
  booktitle =	{Structure Discovery in Biology: Motifs, Networks \& Phylogenies},
  pages =	{1--9},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10231},
  editor =	{Alberto Apostolico and Andreas Dress and Laxmi Parida},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.10231.3},
  URN =		{urn:nbn:de:0030-drops-27375},
  doi =		{10.4230/DagSemProc.10231.3},
  annote =	{Keywords: Tree distance, structure discovery, Parseval metric, Tanimoto distance}
}

Document

DOI: 10.4230/DagSemProc.10231.4

Efficient computation of statistics for words with mismatches

Authors: Cinzia Pizzi

Published in: Dagstuhl Seminar Proceedings, Volume 10231, Structure Discovery in Biology: Motifs, Networks & Phylogenies (2010)

Abstract

Since the early stages of bioinformatics, substrings have played a crucial role in the search and discovery of significant biological signals. Despite the advent of a large number of different approaches and models to accomplish these tasks, substrings continue to be widely used to determine statistical distributions and compositions of biological sequences at various levels of detail. Here we overview efficient algorithms that were recently proposed to compute the actual and the expected frequency for words with k mismatches, when it is assumed that the words of interest occur at least once exactly in the sequence under analysis. Efficiency means these algorithms are polynomial in k rather than exponential as with an enumerative approach, and independent of the length of the query word. These algorithms are all based on a common incremental approach: a preprocessing step that makes it possible to efficiently answer queries related to any word occurring in the text. The same approach can be used with a sliding window scanning of the sequence to compute the same statistics for words of fixed lengths, even more efficiently. The efficient computation of both expected and actual frequency of substrings, combined with a study on the monotonicity of popular scores such as z-scores, makes it possible to build tables of feasible size in reasonable time, and can therefore be used in practical applications.
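For orientation, the quantity discussed in this abstract (the actual frequency of a word with at most k mismatches) can be illustrated with a naive sliding-window count. This is only a baseline sketch written for this listing, not the efficient incremental algorithms the abstract refers to; the function names `hamming_distance` and `count_with_mismatches` are hypothetical.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def count_with_mismatches(text: str, word: str, k: int) -> int:
    """Count windows of `text` within Hamming distance k of `word`.

    Naive baseline: every window is compared against the word, so the
    cost grows with len(text) * len(word). The algorithms surveyed in
    the abstract avoid exactly this dependence on the word length.
    """
    m = len(word)
    return sum(
        1
        for i in range(len(text) - m + 1)
        if hamming_distance(text[i:i + m], word) <= k
    )
```

With `k = 0` the function reduces to counting exact occurrences, which gives a quick sanity check on any input.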

Cite as

Cinzia Pizzi. Efficient computation of statistics for words with mismatches. In Structure Discovery in Biology: Motifs, Networks & Phylogenies. Dagstuhl Seminar Proceedings, Volume 10231, pp. 1-22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)

@InProceedings{pizzi:DagSemProc.10231.4,
  author =	{Pizzi, Cinzia},
  title =	{{Efficient computation of statistics for words with mismatches}},
  booktitle =	{Structure Discovery in Biology: Motifs, Networks \& Phylogenies},
  pages =	{1--22},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10231},
  editor =	{Alberto Apostolico and Andreas Dress and Laxmi Parida},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.10231.4},
  URN =		{urn:nbn:de:0030-drops-27384},
  doi =		{10.4230/DagSemProc.10231.4},
  annote =	{Keywords: Statistics on words, mismatches, dynamic programming, biological sequences.}
}

Document

DOI: 10.4230/DagSemProc.10231.6

Functional Information, Biomolecular Messages and Complexity of BioSequences and Structures

Authors: Raffaele Giancarlo, Davide Corona, Valeria Di Benedetto, Alessandra Gabriele, and Filippo Utro

Published in: Dagstuhl Seminar Proceedings, Volume 10231, Structure Discovery in Biology: Motifs, Networks & Phylogenies (2010)

Abstract

In the quest for a mathematical measure able to capture and shed light on the dual notions of information and complexity in biosequences, Hazen et al. have introduced the notion of Functional Information (FI for short). It is also the result of earlier considerations and findings by Szostak and Carothers et al. Based on the experiments by Carothers et al. regarding FI in RNA binding activities, we decided to study the relation existing between FI and classic measures of complexity applied to protein-DNA interactions on a genome-wide scale. Using classic complexity measures, i.e., Shannon entropy and Kolmogorov Complexity, both as estimated by data compression, we found that FI applied to protein-DNA interactions is genuinely different from them. Such a fact, together with the non-triviality of the biological function considered, contributes to the establishment of FI as a novel and useful measure of biocomplexity. Remarkably, we also found a relationship, on a genome-wide scale, between the redundancy of a genomic region and its ability to interact with a protein. This latter finding justifies even more some principles for the design of motif discovery algorithms. Finally, our experiments bring to light methodological limitations of Linguistic Complexity measures, i.e., a class of measures that is a function of the vocabulary richness of a sequence. Indeed, due to the technology and associated statistical preprocessing procedures used to conduct our studies, i.e., genome-wide ChIP-chip experiments, that class of measures cannot give any statistically significant indication about complexity and function. This is a serious limitation, given the widespread use of the technology. References: J.M. Carothers, S.C. Oestreich, J.H. Davis, and J.W. Szostak. Informational complexity and functional activity of RNA structures. J. Am. Chem. Soc., 126 (2004), pp. 5130-5137. R.M. Hazen, P.L. Griffin, J.M. Carothers, and J.W. Szostak. Functional Information and the emergence of biocomplexity. Proc. of Nat. Acad. Sci., 104 (2007), pp. 8574-8581. J.W. Szostak. Functional Information: molecular messages. Nature, 423 (2003).

Cite as

Raffaele Giancarlo, Davide Corona, Valeria Di Benedetto, Alessandra Gabriele, and Filippo Utro. Functional Information, Biomolecular Messages and Complexity of BioSequences and Structures. In Structure Discovery in Biology: Motifs, Networks & Phylogenies. Dagstuhl Seminar Proceedings, Volume 10231, pp. 1-13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)

@InProceedings{giancarlo_et_al:DagSemProc.10231.6,
  author =	{Giancarlo, Raffaele and Corona, Davide and Di Benedetto, Valeria and Gabriele, Alessandra and Utro, Filippo},
  title =	{{Functional Information, Biomolecular Messages and Complexity of BioSequences and Structures}},
  booktitle =	{Structure Discovery in Biology: Motifs, Networks \& Phylogenies},
  pages =	{1--13},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10231},
  editor =	{Alberto Apostolico and Andreas Dress and Laxmi Parida},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.10231.6},
  URN =		{urn:nbn:de:0030-drops-26884},
  doi =		{10.4230/DagSemProc.10231.6},
  annote =	{Keywords: Functional activity, sequence complexity, combinatorics on words, protein-DNA interaction.}
}

Document

DOI: 10.4230/DagSemProc.10231.8

The Ideal Storage Cellular Automaton Model

Authors: Andreas Dress, Wim Hordijk, Lin Wei, and Peter Serocka

Published in: Dagstuhl Seminar Proceedings, Volume 10231, Structure Discovery in Biology: Motifs, Networks & Phylogenies (2010)

Abstract

We have implemented and investigated a spatial extension of the original ideal storage model by embedding it in a 2D cellular automaton with a diffusion-like coupling between neighboring cells. The resulting ideal storage cellular automaton model (ISCAM) generates many interesting spatio-temporal patterns, in particular spiral waves that grow and "compete" with each other. We study this dynamical behavior both mathematically and computationally, and compare it with similar patterns observed in actual chemical processes. Remarkably, it turned out that one can use such CA for modeling all sorts of complex processes, from phase transition in binary mixtures to using them as a metaphor for cancer onset caused by only one short pulse of 'tissue dis-organization' (changing e.g. for only one single time step the diffusion coefficient) as hypothesized in recent papers questioning the current gene/genome centric view on cancer onset by AO Ping et al.

Cite as

Andreas Dress, Wim Hordijk, Lin Wei, and Peter Serocka. The Ideal Storage Cellular Automaton Model. In Structure Discovery in Biology: Motifs, Networks & Phylogenies. Dagstuhl Seminar Proceedings, Volume 10231, pp. 1-8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)

@InProceedings{dress_et_al:DagSemProc.10231.8,
  author =	{Dress, Andreas and Hordijk, Wim and Wei, Lin and Serocka, Peter},
  title =	{{The Ideal Storage Cellular Automaton Model}},
  booktitle =	{Structure Discovery in Biology: Motifs, Networks \& Phylogenies},
  pages =	{1--8},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10231},
  editor =	{Alberto Apostolico and Andreas Dress and Laxmi Parida},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.10231.8},
  URN =		{urn:nbn:de:0030-drops-27280},
  doi =		{10.4230/DagSemProc.10231.8},
  annote =	{Keywords: }
}

Document

DOI: 10.4230/DagSemProc.10231.5

Estimation of alternative splicing isoform frequencies from RNA-Seq data

Authors: Marius Nicolae, Serghei Mangul, Ion Mandoiu, and Alex Zelikovsky

Published in: Dagstuhl Seminar Proceedings, Volume 10231, Structure Discovery in Biology: Motifs, Networks & Phylogenies (2010)

Abstract

We present a novel expectation-maximization algorithm for inference of alternative splicing isoform frequencies from high-throughput transcriptome sequencing (RNA-Seq) data. Our algorithm exploits largely ignored disambiguation information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information if available. Empirical experiments on synthetic datasets show that the algorithm significantly outperforms existing methods of isoform and gene expression level estimation from RNA-Seq data.
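As a rough illustration of the expectation-maximization idea named in this abstract, the sketch below estimates isoform frequencies from reads that are each compatible with a set of isoforms. It is a deliberately simplified toy model written for this listing and is not the authors' algorithm: it ignores insert-size distributions, base quality scores, strand and pairing information, and isoform lengths. The function name `em_isoform_frequencies` and the input representation are assumptions.

```python
from typing import Dict, Iterable, List, Set


def em_isoform_frequencies(
    reads: Iterable[Set[str]],
    isoforms: List[str],
    iterations: int = 100,
) -> Dict[str, float]:
    """Toy EM: each read is the set of isoform names it is compatible with."""
    reads = list(reads)
    # Start from the uniform distribution over isoforms.
    freq = {iso: 1.0 / len(isoforms) for iso in isoforms}
    for _ in range(iterations):
        # E-step: split each read among its compatible isoforms
        # in proportion to the current frequency estimates.
        counts = {iso: 0.0 for iso in isoforms}
        for compatible in reads:
            total = sum(freq[iso] for iso in compatible)
            if total == 0.0:
                continue  # read cannot be explained by current estimates
            for iso in compatible:
                counts[iso] += freq[iso] / total
        # M-step: renormalize the expected counts into frequencies.
        assigned = sum(counts.values())
        if assigned == 0.0:
            break
        freq = {iso: counts[iso] / assigned for iso in isoforms}
    return freq
```

Reads compatible with a single isoform anchor the estimate, while ambiguous reads are redistributed at every iteration; the extra signals listed in the abstract would enter as additional weights in the E-step.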

Cite as

Marius Nicolae, Serghei Mangul, Ion Mandoiu, and Alex Zelikovsky. Estimation of alternative splicing isoform frequencies from RNA-Seq data. In Structure Discovery in Biology: Motifs, Networks & Phylogenies. Dagstuhl Seminar Proceedings, Volume 10231, pp. 1-3, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)

@InProceedings{nicolae_et_al:DagSemProc.10231.5,
  author =	{Nicolae, Marius and Mangul, Serghei and Mandoiu, Ion and Zelikovsky, Alex},
  title =	{{Estimation of alternative splicing isoform frequencies from RNA-Seq data}},
  booktitle =	{Structure Discovery in Biology: Motifs, Networks \& Phylogenies},
  pages =	{1--3},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2010},
  volume =	{10231},
  editor =	{Alberto Apostolico and Andreas Dress and Laxmi Parida},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.10231.5},
  URN =		{urn:nbn:de:0030-drops-26876},
  doi =		{10.4230/DagSemProc.10231.5},
  annote =	{Keywords: RNA-Seq, alternative splicing isoforms, expectation maximization}
}

Document

DOI: 10.4230/DagSemProc.06201.3

Local Minimax Learning of Approximately Polynomial Functions

Authors: Lee Jones and Konstantin Rybnikov

Published in: Dagstuhl Seminar Proceedings, Volume 6201, Combinatorial and Algorithmic Foundations of Pattern and Association Discovery (2006)

Abstract

Suppose we have a number of noisy measurements of an unknown real-valued function $f$ near a point of interest $\mathbf{x}_0 \in \mathbb{R}^d$. Suppose also that nothing can be assumed about the noise distribution, except for zero mean and bounded covariance matrix. We want to estimate $f$ at $\mathbf{x} = \mathbf{x}_0$ using a general linear parametric family $f(\mathbf{x};\mathbf{a}) = a_0 h_0(\mathbf{x}) + \cdots + a_q h_q(\mathbf{x})$, where $\mathbf{a} \in \mathbb{R}^q$ and the $h_i$'s are bounded functions on a neighborhood $B$ of $\mathbf{x}_0$ which contains all points of measurement. Typically, $B$ is a Euclidean ball or cube in $\mathbb{R}^d$ (more generally, a ball in an $l_p$-norm). In the case when the $h_i$'s are polynomial functions in $x_1,\ldots,x_d$ the model is called locally-polynomial. In particular, if the $h_i$'s form a basis of the linear space of polynomials of degree at most two, the model is called locally-quadratic (if the degree is at most three, the model is locally-cubic, etc.). Often, there is information, which is called context, about the function $f$ (restricted to $B$) available, such as that it takes values in a known interval, or that it satisfies a Lipschitz condition. The theory of local minimax estimation with context for locally-polynomial models and approximately locally polynomial models has been recently initiated by Jones. In the case of local linearity and a bound on the change of $f$ on $B$, where $B$ is a ball, the solution for squared error loss is in the form of ridge regression, where the ridge parameter is identified; hence, minimax justification for ridge regression is given together with explicit best error bounds. The analysis of polynomial models of degree above 1 leads to interesting and difficult questions in real algebraic geometry and non-linear optimization. We show that in the case when $f$ is a probability function, the optimal (in the minimax sense) estimator is effectively computable (with any given precision), thanks to Tarski's elimination principle.

Cite as

Lee Jones and Konstantin Rybnikov. Local Minimax Learning of Approximately Polynomial Functions. In Combinatorial and Algorithmic Foundations of Pattern and Association Discovery. Dagstuhl Seminar Proceedings, Volume 6201, pp. 1-12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2007)

@InProceedings{jones_et_al:DagSemProc.06201.3,
  author =	{Jones, Lee and Rybnikov, Konstantin},
  title =	{{Local Minimax Learning of Approximately Polynomial Functions}},
  booktitle =	{Combinatorial and Algorithmic Foundations of Pattern and Association Discovery},
  pages =	{1--12},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2007},
  volume =	{6201},
  editor =	{Rudolf Ahlswede and Alberto Apostolico and Vladimir I. Levenshtein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.06201.3},
  URN =		{urn:nbn:de:0030-drops-8912},
  doi =		{10.4230/DagSemProc.06201.3},
  annote =	{Keywords: Local learning, statistical learning, estimator, minimax, convex optimization, quantifier elimination, semialgebraic, ridge regression, polynomial}
}

Document

DOI: 10.4230/DagSemProc.06201.7

Solving Classical String Problems on Compressed Texts

Authors: Yury Lifshits

Published in: Dagstuhl Seminar Proceedings, Volume 6201, Combinatorial and Algorithmic Foundations of Pattern and Association Discovery (2006)

Abstract

How can we solve string problems if, instead of the input string, we get only a program generating it? Is it possible to solve problems faster than just "generate text + apply classical algorithm"? In this paper we consider strings generated by straight-line programs (SLP). These are programs using only the assignment operator. We show new algorithms for equivalence, pattern matching, finding periods and covers, and computing the fingerprint table on SLP-generated strings. On the other hand, computing the Hamming distance is NP-hard. The main corollary is an $O(n^2 m)$ algorithm for pattern matching in LZ-compressed texts.
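To make the notion of a straight-line program concrete, the following minimal sketch represents an SLP as a dictionary of rules, where each variable is either a single terminal character or the concatenation of two previously defined variables. It shows two basic operations that work on the compressed form without generating the text: computing the generated length and reading one character. The representation and the names `slp_lengths`, `slp_char_at`, `slp_expand`, and `FIBONACCI_RULES` are assumptions made for illustration; the algorithms of the paper itself (equivalence, pattern matching, periods, covers) are not reproduced here.

```python
from typing import Dict, Tuple, Union

# A rule is a single terminal character or a pair of earlier variable names.
Rule = Union[str, Tuple[str, str]]

# Example SLP generating the Fibonacci word "abaababa" (hypothetical input).
FIBONACCI_RULES: Dict[str, Rule] = {
    "X1": "b",
    "X2": "a",
    "X3": ("X2", "X1"),
    "X4": ("X3", "X2"),
    "X5": ("X4", "X3"),
    "X6": ("X5", "X4"),
}


def slp_lengths(rules: Dict[str, Rule]) -> Dict[str, int]:
    """Length of the string generated by each variable, without expanding it.

    Assumes rules are listed so that every variable is defined before use.
    """
    lengths: Dict[str, int] = {}
    for name, rule in rules.items():
        if isinstance(rule, str):
            lengths[name] = 1
        else:
            left, right = rule
            lengths[name] = lengths[left] + lengths[right]
    return lengths


def slp_char_at(rules: Dict[str, Rule], lengths: Dict[str, int],
                name: str, index: int) -> str:
    """Character at position `index` of the string generated by `name`."""
    if not 0 <= index < lengths[name]:
        raise IndexError(index)
    rule = rules[name]
    while not isinstance(rule, str):
        left, right = rule
        if index < lengths[left]:
            name = left
        else:
            index -= lengths[left]
            name = right
        rule = rules[name]
    return rule


def slp_expand(rules: Dict[str, Rule], name: str) -> str:
    """Full decompression, used here only to check the other functions."""
    rule = rules[name]
    if isinstance(rule, str):
        return rule
    return slp_expand(rules, rule[0]) + slp_expand(rules, rule[1])
```

Because each concatenation rule can double the generated length, a program with n rules may describe a string of length exponential in n, which is why algorithms that avoid `slp_expand` are of interest.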

Cite as

Yury Lifshits. Solving Classical String Problems on Compressed Texts. In Combinatorial and Algorithmic Foundations of Pattern and Association Discovery. Dagstuhl Seminar Proceedings, Volume 6201, pp. 1-10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2006)

@InProceedings{lifshits:DagSemProc.06201.7,
  author =	{Lifshits, Yury},
  title =	{{Solving Classical String Problems on Compressed Texts}},
  booktitle =	{Combinatorial and Algorithmic Foundations of Pattern and Association Discovery},
  pages =	{1--10},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2006},
  volume =	{6201},
  editor =	{Rudolf Ahlswede and Alberto Apostolico and Vladimir I. Levenshtein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.06201.7},
  URN =		{urn:nbn:de:0030-drops-7984},
  doi =		{10.4230/DagSemProc.06201.7},
  annote =	{Keywords: Pattern matching, Compressed text}
}

Document

DOI: 10.4230/DagSemProc.06201.1

06201 Abstracts Collection – Combinatorial and Algorithmic Foundations of Pattern and Association Discovery

Authors: Rudolf Ahlswede, Alberto Apostolico, and Vladimir I. Levenshtein

Published in: Dagstuhl Seminar Proceedings, Volume 6201, Combinatorial and Algorithmic Foundations of Pattern and Association Discovery (2006)

Abstract

From 15.05.06 to 20.05.06, the Dagstuhl Seminar 06201 "Combinatorial and Algorithmic Foundations of Pattern and Association Discovery" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

Cite as

Rudolf Ahlswede, Alberto Apostolico, and Vladimir I. Levenshtein. 06201 Abstracts Collection – Combinatorial and Algorithmic Foundations of Pattern and Association Discovery. In Combinatorial and Algorithmic Foundations of Pattern and Association Discovery. Dagstuhl Seminar Proceedings, Volume 6201, pp. 1-15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2006)

@InProceedings{ahlswede_et_al:DagSemProc.06201.1,
  author =	{Ahlswede, Rudolf and Apostolico, Alberto and Levenshtein, Vladimir I.},
  title =	{{06201 Abstracts Collection – Combinatorial and Algorithmic Foundations of Pattern and Association Discovery}},
  booktitle =	{Combinatorial and Algorithmic Foundations of Pattern and Association Discovery},
  pages =	{1--15},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2006},
  volume =	{6201},
  editor =	{Rudolf Ahlswede and Alberto Apostolico and Vladimir I. Levenshtein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.06201.1},
  URN =		{urn:nbn:de:0030-drops-7873},
  doi =		{10.4230/DagSemProc.06201.1},
  annote =	{Keywords: Data compression, pattern matching, pattern discovery, search, sorting, molecular biology, reconstruction, genome rearrangements}
}

Document

DOI: 10.4230/DagSemProc.06201.2

06201 Executive Summary – Combinatorial and Algorithmic Foundations of Pattern and Association Discovery

Authors: Rudolf Ahlswede, Alberto Apostolico, and Vladimir I. Levenshtein

Published in: Dagstuhl Seminar Proceedings, Volume 6201, Combinatorial and Algorithmic Foundations of Pattern and Association Discovery (2006)

Abstract

The goals of this seminar have been (1) to identify and match recently developed methods to specific tasks and data sets in a core of application areas; next, based on feedback from the specific applied domain, (2) to fine tune and personalize those applications, and finally (3) to identify and tackle novel combinatorial and algorithmic problems, in some cases all the way to the development of novel software tools.

Cite as

Rudolf Ahlswede, Alberto Apostolico, and Vladimir I. Levenshtein. 06201 Executive Summary – Combinatorial and Algorithmic Foundations of Pattern and Association Discovery. In Combinatorial and Algorithmic Foundations of Pattern and Association Discovery. Dagstuhl Seminar Proceedings, Volume 6201, pp. 1-2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2006)

@InProceedings{ahlswede_et_al:DagSemProc.06201.2,
  author =	{Ahlswede, Rudolf and Apostolico, Alberto and Levenshtein, Vladimir I.},
  title =	{{06201 Executive Summary – Combinatorial and Algorithmic Foundations of Pattern and Association Discovery}},
  booktitle =	{Combinatorial and Algorithmic Foundations of Pattern and Association Discovery},
  pages =	{1--2},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2006},
  volume =	{6201},
  editor =	{Rudolf Ahlswede and Alberto Apostolico and Vladimir I. Levenshtein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.06201.2},
  URN =		{urn:nbn:de:0030-drops-7926},
  doi =		{10.4230/DagSemProc.06201.2},
  annote =	{Keywords: Data compression, pattern matching, pattern discovery, search, sorting, molecular biology, reconstruction, genome rearrangements}
}

Document

DOI: 10.4230/DagSemProc.06201.4

Non-binary error correcting codes with noiseless feedback, localized errors, or both

Authors: Rudolf Ahlswede, Christian Deppe, and Vladimir Lebedev

Published in: Dagstuhl Seminar Proceedings, Volume 6201, Combinatorial and Algorithmic Foundations of Pattern and Association Discovery (2006)

Abstract

We investigate non--binary error correcting codes with noiseless feedback, localized errors, or both. It turns out that the Hamming bound is a central concept. For block codes with feedback we present here a coding scheme based on an idea of erasions, which we call the {\bf rubber method}. It gives an optimal rate for big error correcting fraction $\tau$ ($>{1\over q}$) and infinitely many points on the Hamming bound for small $\tau$. We also consider variable length codes with all lengths bounded from above by $n$ and the end of a word carries the symbol $\Box$ and is thus recognizable by the decoder. For both, the $\Box$-model with feedback and the $\Box$-model with localized errors, the Hamming bound is the exact capacity curve for $\tau <1/2.$ Somewhat surprisingly, whereas with feedback the capacity curve coincides with the Hamming bound also for $1/2\leq \tau \leq 1$, in this range for localized errors the capacity curve equals 0. Also we give constructions for the models with both, feedback and localized errors.

Cite as

Rudolf Ahlswede, Christian Deppe, and Vladimir Lebedev. Non--binary error correcting codes with noiseless feedback, localized errors, or both. In Combinatorial and Algorithmic Foundations of Pattern and Association Discovery. Dagstuhl Seminar Proceedings, Volume 6201, pp. 1-4, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2006)

@InProceedings{ahlswede_et_al:DagSemProc.06201.4,
  author =	{Ahlswede, Rudolf and Deppe, Christian and Lebedev, Vladimir},
  title =	{{Non--binary error correcting codes with noiseless feedback, localized errors, or both}},
  booktitle =	{Combinatorial and Algorithmic Foundations of Pattern and Association Discovery},
  pages =	{1--4},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2006},
  volume =	{6201},
  editor =	{Rudolf Ahlswede and Alberto Apostolico and Vladimir I. Levenshtein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.06201.4},
  URN =		{urn:nbn:de:0030-drops-7849},
  doi =		{10.4230/DagSemProc.06201.4},
  annote =	{Keywords: Error-correcting codes, localized errors, feedback, variable length codes}
}

Document

DOI: 10.4230/DagSemProc.06201.5

On the Monotonicity of the String Correction Factor for Words with Mismatches

Authors: Alberto Apostolico and Cinzia Pizzi

Published in: Dagstuhl Seminar Proceedings, Volume 6201, Combinatorial and Algorithmic Foundations of Pattern and Association Discovery (2006)

Abstract

The string correction factor is the term by which the probability of a word $w$ needs to be multiplied in order to account for character changes or ``errors'' occurring in at most $k$ arbitrary positions in that word. The behavior of this factor, as a function of $k$ and of the word length, has implications on the number of candidates that need to be considered and weighted when looking for subwords of a sequence that present unusually recurrent replicas within some bounded number of mismatches. Specifically, it is seen that over intervals of mono- or bi-tonicity for the correction factor, only some of the candidates need be considered. This mitigates the computation and leads to tables of over-represented words that are more compact to represent and inspect. In recent work, expectation and score monotonicity has been established for a number of cases of interest, under {\em i.i.d.} probabilistic assumptions. The present paper reviews the cases of bi-tonic behavior for the correction factor, concentrating on the instance in which the question is still open.

Cite as

Alberto Apostolico and Cinzia Pizzi. On the Monotonicity of the String Correction Factor for Words with Mismatches. In Combinatorial and Algorithmic Foundations of Pattern and Association Discovery. Dagstuhl Seminar Proceedings, Volume 6201, pp. 1-9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2006)

@InProceedings{apostolico_et_al:DagSemProc.06201.5,
  author =	{Apostolico, Alberto and Pizzi, Cinzia},
  title =	{{On the Monotonicity of the String Correction Factor for Words with Mismatches}},
  booktitle =	{Combinatorial and Algorithmic Foundations of Pattern and Association Discovery},
  pages =	{1--9},
  series =	{Dagstuhl Seminar Proceedings (DagSemProc)},
  ISSN =	{1862-4405},
  year =	{2006},
  volume =	{6201},
  editor =	{Rudolf Ahlswede and Alberto Apostolico and Vladimir I. Levenshtein},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemProc.06201.5},
  URN =		{urn:nbn:de:0030-drops-7899},
  doi =		{10.4230/DagSemProc.06201.5},
  annote =	{Keywords: Pattern discovery, Motif, Over-represented word, Monotone score, Correction Factor}
}
