{"@context":"https:\/\/schema.org\/","@type":"ScholarlyArticle","@id":"#article2815","name":"One-shot Learning of Poisson Distributions in fast changing environments","abstract":"In bioinformatics, Audic and Claverie were among the first to systematically study the influence of random fluctuations and sampling size on the reliability of digital expression profile data.\r\nFor a transcript representing a small fraction of the library and a large number N of clones, the probability of observing x tags of the same gene is well approximated by the Poisson distribution parametrised by its mean (and variance) m>0,\r\nwhere the unknown parameter m signifies the number of transcripts of the given type (tag) per N clones in the cDNA library.\r\n\r\nOn an abstract level, to determine whether or not a gene is differentially expressed, one has two numbers generated from two (possibly different) Poisson distributions, and based on this (extremely sparse) sample one has to decide whether the two Poisson distributions are identical. The same setting arises, e.g., in determining the equivalence of Poisson photon sources (up to a time shift) in gravitational lensing.\r\n\r\nEach Poisson distribution is represented by a single measurement only, which, from a purely statistical standpoint, is of course very problematic.\r\nThe key instrument of the Audic-Claverie approach is a distribution P over tag counts y in one library informed by the tag count x in the other library, under the null hypothesis that the tag counts are generated from the same but unknown Poisson distribution. P is obtained by Bayesian averaging (an infinite mixture) of all possible Poisson distributions, with mixing proportions equal to the posteriors (given x) under the flat prior over m.\r\n\r\nWe ask: given that the tag count samples from SAGE libraries are *extremely* limited, how useful is the Audic-Claverie methodology in practice?\r\nWe rigorously analyse the A-C statistic P that forms the backbone of the methodology and represents our knowledge of the underlying tag-generating process based on one observation.\r\n\r\nWe show that the A-C statistic P and the underlying Poisson distribution of the tag counts share the same mode structure. Moreover, the K-L divergence from the true unknown Poisson distribution to the A-C statistic is minimised when the A-C statistic is conditioned on the mode of the Poisson distribution. Most importantly (and perhaps rather surprisingly), the expectation of this K-L divergence never exceeds 1\/2 bit! This constitutes a rigorous quantitative argument, extending previous empirical Monte Carlo studies, that supports the widespread use of the Audic-Claverie method, even though, by their very nature, SAGE libraries represent very sparse samples.","keywords":["Audic-Claverie statistic","Bayesian averaging","information theory","one-shot learning","Poisson distribution"],"author":{"@type":"Person","name":"Tino, Peter","givenName":"Peter","familyName":"Tino"},"position":4,"pageStart":1,"pageEnd":9,"dateCreated":"2010-11-05","datePublished":"2010-11-05","isAccessibleForFree":true,"license":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/legalcode","copyrightHolder":{"@type":"Person","name":"Tino, Peter","givenName":"Peter","familyName":"Tino"},"copyrightYear":"2010","accessMode":"textual","accessModeSufficient":"textual","creativeWorkStatus":"Published","inLanguage":"en-US","sameAs":"https:\/\/doi.org\/10.4230\/DagSemProc.10302.4","publisher":"Schloss Dagstuhl \u2013 Leibniz-Zentrum f\u00fcr Informatik","isPartOf":{"@type":"PublicationVolume","@id":"#volume819","volumeNumber":10302,"name":"Dagstuhl Seminar Proceedings, Volume 10302","dateCreated":"2010-11-05","datePublished":"2010-11-05","isAccessibleForFree":true,"publisher":"Schloss Dagstuhl \u2013 Leibniz-Zentrum f\u00fcr Informatik","hasPart":"#article2815","isPartOf":{"@type":"Periodical","@id":"#series119","name":"Dagstuhl Seminar Proceedings","issn":"1862-4405","isAccessibleForFree":true,"publisher":"Schloss Dagstuhl \u2013 Leibniz-Zentrum f\u00fcr Informatik","hasPart":"#volume819"}}}