One-shot Learning of Poisson Distributions in fast changing environments

Author Peter Tino




File

DagSemProc.10302.4.pdf
  • Filesize: 418 kB
  • 9 pages

Cite As

Peter Tino. One-shot Learning of Poisson Distributions in fast changing environments. In Learning paradigms in dynamic environments. Dagstuhl Seminar Proceedings, Volume 10302, pp. 1-9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2010)
https://doi.org/10.4230/DagSemProc.10302.4

Abstract

In Bioinformatics, Audic and Claverie were among the first to systematically study the influence of random fluctuations and sampling size on the reliability of digital expression profile data. For a transcript representing a small fraction of the library and a large number N of clones, the probability of observing x tags of the same gene will be well-approximated by the Poisson distribution parametrised by its mean (and variance) m>0, where the unknown parameter m signifies the number of transcripts of the given type (tag) per N clones in the cDNA library. On an abstract level, to determine whether a gene is differentially expressed or not, one has two numbers generated from two distinct Poisson distributions, and based on this (extremely sparse) sample one has to decide whether the two Poisson distributions are identical or not. The same test can be used, e.g., to determine equivalence of Poisson photon sources (up to time shift) in gravitational lensing. Each Poisson distribution is represented by a single measurement only, which is, of course, very problematic from a purely statistical standpoint. The key instrument of the Audic-Claverie approach is a distribution P over tag counts y in one library informed by the tag count x in the other library, under the null hypothesis that the tag counts are generated from the same but unknown Poisson distribution. P is obtained by Bayesian averaging (infinite mixture) of all possible Poisson distributions with mixing proportions equal to the posteriors (given x) under the flat prior over m. We ask: Given that the tag count samples from SAGE libraries are *extremely* limited, how useful is the Audic-Claverie methodology in practice? We rigorously analyse the A-C statistic P that forms the backbone of the methodology and represents our knowledge of the underlying tag generating process based on one observation. We show that the A-C statistic P and the underlying Poisson distribution of the tag counts share the same mode structure.
Moreover, the K-L divergence from the true unknown Poisson distribution to the A-C statistic is minimised when the A-C statistic is conditioned on the mode of the Poisson distribution. Most importantly (and perhaps rather surprisingly), the expectation of this K-L divergence never exceeds 1/2 bit! This constitutes a rigorous quantitative argument, extending the previous empirical Monte Carlo studies, that supports the widespread use of the Audic-Claverie method, even though by their very nature the SAGE libraries represent very sparse samples.
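
To make the construction concrete, here is a minimal Python sketch of the A-C statistic and of the K-L divergence discussed above. It assumes both libraries have the same size N, in which case averaging Poisson(m) over the flat-prior posterior given x yields the closed form P(y | x) = (x + y)! / (x! y! 2^(x + y + 1)). The function names, the example mean m = 3.5, and the truncation point y_max are illustrative choices of this sketch, not taken from the paper.

```python
import math


def log_poisson_pmf(k, m):
    """Natural log of the Poisson probability of count k with mean m > 0."""
    return k * math.log(m) - m - math.lgamma(k + 1)


def log_ac_statistic(y, x):
    """Natural log of P(y | x) = (x + y)! / (x! y! 2^(x + y + 1)).

    Assumption of this sketch: both libraries have the same size N.
    """
    return (math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
            - (x + y + 1) * math.log(2.0))


def ac_statistic(y, x):
    """Probability of tag count y in one library given count x in the other."""
    return math.exp(log_ac_statistic(y, x))


def kl_bits(m, x, y_max=200):
    """K-L divergence in bits from Poisson(m) to P(. | x).

    The sum over y is truncated at y_max, which is adequate only for
    moderate m (an assumption of this sketch).
    """
    total = 0.0
    for y in range(y_max):
        log_p = log_poisson_pmf(y, m)
        total += math.exp(log_p) * (log_p - log_ac_statistic(y, x))
    return total / math.log(2.0)


if __name__ == "__main__":
    m = 3.5  # hypothetical "true" mean tag count, chosen for illustration
    for x in range(8):
        # Columns: conditioning count x, P(x | x), K-L divergence in bits
        print(x, round(ac_statistic(x, x), 4), round(kl_bits(m, x), 4))
```

Since P(y | x) / P(y - 1 | x) = (x + y) / (2y), the statistic rises while y < x and satisfies P(x | x) = P(x - 1 | x) for x >= 1, which mirrors the two adjacent modes m - 1 and m of a Poisson distribution with integer mean m. Scanning `kl_bits(m, x)` over x for a fixed m is one way to see where the divergence is smallest.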

Keywords

  • Audic-Claverie statistic
  • Bayesian averaging
  • information theory
  • one-shot learning
  • Poisson distribution
