License
When quoting this document, please refer to the following:
URN: urn:nbn:de:0030-drops-27998
URL: http://drops.dagstuhl.de/opus/volltexte/2010/2799/

Tino, Peter

One-shot Learning of Poisson Distributions in fast changing environments


Abstract

In Bioinformatics, Audic and Claverie were among the first to systematically study the influence of random fluctuations and sampling size on the reliability of digital expression profile data. For a transcript representing a small fraction of the library and a large number N of clones, the probability of observing x tags of the same gene is well approximated by the Poisson distribution parametrised by its mean (and variance) m>0, where the unknown parameter m signifies the number of transcripts of the given type (tag) per N clones in the cDNA library. On an abstract level, to determine whether a gene is differentially expressed or not, one has two numbers generated from two distinct Poisson distributions, and based on this (extremely sparse) sample one has to decide whether the two Poisson distributions are identical or not. The same setting arises, e.g., when determining the equivalence of Poisson photon sources (up to time shift) in gravitational lensing. Each Poisson distribution is represented by a single measurement only, which is, of course, very problematic from a purely statistical standpoint. The key instrument of the Audic-Claverie approach is a distribution P over tag counts y in one library informed by the tag count x in the other library, under the null hypothesis that the tag counts are generated from the same but unknown Poisson distribution. P is obtained by Bayesian averaging (infinite mixture) of all possible Poisson distributions, with mixing proportions equal to the posteriors (given x) under the flat prior over m. We ask: Given that the tag count samples from SAGE libraries are *extremely* limited, how useful is the Audic-Claverie methodology in practice? We rigorously analyse the A-C statistic P, which forms the backbone of the methodology and represents our knowledge of the underlying tag generating process based on one observation. We show that the A-C statistic P and the underlying Poisson distribution of the tag counts share the same mode structure. Moreover, the K-L divergence from the true unknown Poisson distribution to the A-C statistic is minimised when the A-C statistic is conditioned on the mode of the Poisson distribution. Most importantly (and perhaps rather surprisingly), the expectation of this K-L divergence never exceeds 1/2 bit! This constitutes a rigorous quantitative argument, extending the previous empirical Monte Carlo studies, that supports the widespread use of the Audic-Claverie method, even though, by their very nature, the SAGE libraries represent very sparse samples.
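The construction of P described in the abstract can be illustrated with a small sketch. The abstract does not give a closed form; assuming both libraries contain the same number N of clones, the posterior over m given x under the flat prior is m^x e^(-m) / x!, and averaging Poisson(y; m) against it gives P(y|x) = (x+y)! / (x! y! 2^(x+y+1)). The function names below (`ac_statistic`, `poisson_pmf`, `ac_by_quadrature`) are illustrative only and are not taken from the paper; the quadrature routine is just a numerical cross-check of the closed form.

```python
from math import comb, exp, factorial


def ac_statistic(y: int, x: int) -> float:
    """Closed-form P(y | x), assuming equal library sizes and a flat prior over m."""
    # (x+y)! / (x! y!) is the binomial coefficient C(x+y, x).
    return comb(x + y, x) / 2 ** (x + y + 1)


def poisson_pmf(k: int, m: float) -> float:
    """Poisson probability of observing k counts when the mean is m."""
    return exp(-m) * m ** k / factorial(k)


def ac_by_quadrature(y: int, x: int, upper: float = 60.0, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of the Bayesian average.

    Integrates Poisson(y; m) * posterior(m | x) over m in [0, upper], where the
    posterior under the flat prior, m^x e^(-m) / x!, has the same algebraic
    form as poisson_pmf(x, m) viewed as a function of m.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        m = (i + 0.5) * h
        total += poisson_pmf(y, m) * poisson_pmf(x, m)
    return h * total


if __name__ == "__main__":
    x = 4
    for y in range(8):
        print(y, ac_statistic(y, x), ac_by_quadrature(y, x))
```

In this sketch, P(y|x) attains its maximum at both y = x-1 and y = x when x >= 1, which mirrors the two modes of a Poisson distribution with integer mean x.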

BibTeX - Entry

@InProceedings{tino:DSP:2010:2799,
  author =	{Peter Tino},
  title =	{One-shot Learning of Poisson Distributions in fast changing environments},
  booktitle =	{Learning paradigms in dynamic environments},
  year =	{2010},
  editor =	{Barbara Hammer and Pascal Hitzler and Wolfgang Maass and Marc Toussaint},
  number =	{10302},
  series =	{Dagstuhl Seminar Proceedings},
  ISSN =	{1862-4405},
  publisher =	{Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany},
  address =	{Dagstuhl, Germany},
  URL =		{http://drops.dagstuhl.de/opus/volltexte/2010/2799},
  annote =	{Keywords: Audic-Claverie statistic, Bayesian averaging, information theory, one-shot learning, Poisson distribution}
}

Keywords: Audic-Claverie statistic, Bayesian averaging, information theory, one-shot learning, Poisson distribution
Seminar: 10302 - Learning paradigms in dynamic environments
Issue date: 2010
Date of publication: 05.11.2010

