Abstract
In bioinformatics, Audic and Claverie were among the first to systematically study the influence of random fluctuations and sampling size on the reliability of digital expression profile data.
For a transcript representing a small fraction of the library and a large number N of clones, the probability of observing x tags of the same gene will be well approximated by the Poisson distribution parametrised by its mean (and variance) m > 0,
where the unknown parameter m signifies the number of transcripts of the given type (tag) per N clones in the cDNA library.
On an abstract level, deciding whether a gene is differentially expressed means taking two numbers, each generated from its own Poisson distribution, and deciding on the basis of this (extremely sparse) sample whether the two Poisson distributions are identical. The same test can be used, e.g., to determine equivalence of Poisson photon sources (up to time shift) in gravitational lensing.
Each Poisson distribution is represented by a single measurement only, which is, of course, very problematic from a purely statistical standpoint.
The key instrument of the Audic-Claverie approach is a distribution P over tag counts y in one library informed by the tag count x in the other library, under the null hypothesis that the tag counts are generated from the same but unknown Poisson distribution. P is obtained by Bayesian averaging (infinite mixture) of all possible Poisson distributions with mixing proportions equal to the posteriors (given x) under the flat prior over m.
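As a minimal sketch of this construction: under the flat prior, the posterior density of m given x is e^(-m) m^x / x!, and averaging the Poisson probability of y over it gives the closed form P(y|x) = (x+y)! / (x! y! 2^(x+y+1)). The Python below evaluates that closed form and compares it with a brute-force numerical average over m. The function names, the integration range and the step count are illustrative assumptions, not taken from the original work, and both libraries are assumed to share the same mean m, as the null hypothesis states.

```python
import math


def ac_statistic(y: int, x: int) -> float:
    """Closed-form P(y | x) = (x+y)! / (x! y! 2^(x+y+1)), evaluated in log space."""
    log_p = (math.lgamma(x + y + 1) - math.lgamma(x + 1)
             - math.lgamma(y + 1) - (x + y + 1) * math.log(2.0))
    return math.exp(log_p)


def poisson_pmf(k: int, m: float) -> float:
    """Poisson probability of count k for mean m."""
    return math.exp(-m) * m ** k / math.factorial(k)


def ac_by_averaging(y: int, x: int, m_max: float = 60.0, steps: int = 60_000) -> float:
    """Trapezoidal approximation of the Bayesian average over m.

    Under the flat prior the posterior density of m given x is
    e^(-m) m^x / x!, i.e. poisson_pmf(x, m) viewed as a function of m.
    m_max and steps are assumed values, adequate only for small counts.
    """
    h = m_max / steps
    total = 0.0
    for i in range(steps + 1):
        m = i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end points
        total += weight * poisson_pmf(y, m) * poisson_pmf(x, m)
    return total * h


if __name__ == "__main__":
    x, y = 3, 5
    print("closed form:", ac_statistic(y, x))
    print("numerical  :", ac_by_averaging(y, x))
```

The closed form is a negative binomial distribution over y, so for larger counts it is far cheaper and more accurate to use than the numerical average.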
We ask: Given that the tag count samples from SAGE libraries are *extremely* limited, how useful actually is the Audic-Claverie methodology? We rigorously analyse the AC statistic P that forms the backbone of the methodology and represents our knowledge of the underlying tag generating process based on one observation.
We show that the AC statistic P and the underlying Poisson distribution of the tag counts share the same mode structure. Moreover, the KL divergence from the true unknown Poisson distribution to the AC statistic is minimised when the AC statistic is conditioned on the mode of the Poisson distribution. Most importantly (and perhaps rather surprisingly), the expectation of this KL divergence never exceeds 1/2 bit! This constitutes a rigorous quantitative argument, extending the previous empirical Monte Carlo studies, that supports the widespread use of the Audic-Claverie method, even though by their very nature, the SAGE libraries represent very sparse samples.
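The KL divergence statements can be probed numerically for particular values of m. The sketch below assumes the closed form P(y|x) = (x+y)! / (x! y! 2^(x+y+1)) for the AC statistic, measures the divergence from Poisson(m) to P(.|x) in bits, and takes the expectation over x drawn from Poisson(m). The function names and the truncation cutoff for the infinite sums are illustrative assumptions. The sketch only inspects a few moderate means and is no substitute for the analytical argument.

```python
import math


def poisson_logpmf(k: int, m: float) -> float:
    """Log of the Poisson probability of count k for mean m > 0."""
    return -m + k * math.log(m) - math.lgamma(k + 1)


def ac_logpmf(y: int, x: int) -> float:
    """Log of the AC statistic P(y | x) in its closed form."""
    return (math.lgamma(x + y + 1) - math.lgamma(x + 1)
            - math.lgamma(y + 1) - (x + y + 1) * math.log(2.0))


def default_cutoff(m: float) -> int:
    """Assumed truncation point for sums over counts (far into the Poisson tail)."""
    return int(m + 12.0 * math.sqrt(m) + 30.0)


def kl_bits(m: float, x: int) -> float:
    """KL divergence, in bits, from Poisson(m) to the AC statistic P(. | x)."""
    total = 0.0
    for y in range(default_cutoff(m) + 1):
        lp = poisson_logpmf(y, m)
        total += math.exp(lp) * (lp - ac_logpmf(y, x))
    return total / math.log(2.0)


def expected_kl_bits(m: float) -> float:
    """Expectation of kl_bits(m, x) over x drawn from Poisson(m)."""
    return sum(math.exp(poisson_logpmf(x, m)) * kl_bits(m, x)
               for x in range(default_cutoff(m) + 1))


def best_conditioning_count(m: float) -> int:
    """Count x (within the cutoff) that minimises kl_bits(m, x)."""
    return min(range(default_cutoff(m) + 1), key=lambda x: kl_bits(m, x))


if __name__ == "__main__":
    for m in (1.0, 5.0, 20.0):
        print(f"m = {m}: expected KL = {expected_kl_bits(m):.4f} bits")
    print("minimising x for m = 4.5:", best_conditioning_count(4.5))
```

A non-integer mean is used for the minimiser check because a Poisson distribution with integer mean m has two modes, m - 1 and m.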
BibTeX Entry
@InProceedings{tino:DSP:2010:2799,
author = {Peter Tino},
title = {One-shot Learning of Poisson Distributions in fast changing environments},
booktitle = {Learning paradigms in dynamic environments},
year = {2010},
editor = {Barbara Hammer and Pascal Hitzler and Wolfgang Maass and Marc Toussaint},
number = {10302},
series = {Dagstuhl Seminar Proceedings},
ISSN = {1862-4405},
publisher = {Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2010/2799},
annote = {Keywords: Audic-Claverie statistic, Bayesian averaging, information theory, one-shot learning, Poisson distribution}
}
Keywords: Audic-Claverie statistic, Bayesian averaging, information theory, one-shot learning, Poisson distribution
Seminar: 10302 - Learning paradigms in dynamic environments
Issue Date: 2010
Date of publication: 05.11.2010