{"@context":"https:\/\/schema.org\/","@type":"ScholarlyArticle","@id":"#article10590","name":"Learning Discrete Distributions from Untrusted Batches","abstract":"We consider the problem of learning a discrete distribution in the presence of an epsilon fraction of malicious data sources. Specifically, we consider the setting where there is some underlying distribution, p, and each data source provides a batch of >= k samples, with the guarantee that at least a (1 - epsilon) fraction of the sources draw their samples from a distribution with total variation distance at most eta from p. We make no assumptions on the data provided by the remaining epsilon fraction of sources--this data can even be chosen as an adversarial function of the (1 - epsilon) fraction of \"good\" batches. We provide two algorithms: one with runtime exponential in the support size, n, but polynomial in k, 1\/epsilon and 1\/eta that takes O((n + k)\/epsilon^2) batches and recovers p to error O(eta + epsilon\/sqrt(k)). This recovery accuracy is information theoretically optimal, to constant factors, even given an infinite number of data sources. Our second algorithm applies to the eta = 0 setting and also achieves an O(epsilon\/sqrt(k)) recovery guarantee, though it runs in poly((nk)^k) time. 
This second algorithm, which approximates a certain tensor via a rank-1 tensor minimizing l_1 distance, is surprising in light of the hardness of many low-rank tensor approximation problems, and may be of independent interest.","keywords":["robust statistics","information-theoretic optimality"],"author":[{"@type":"Person","name":"Qiao, Mingda","givenName":"Mingda","familyName":"Qiao"},{"@type":"Person","name":"Valiant, Gregory","givenName":"Gregory","familyName":"Valiant"}],"position":47,"pageStart":"47:1","pageEnd":"47:20","dateCreated":"2018-01-12","datePublished":"2018-01-12","isAccessibleForFree":true,"license":"https:\/\/creativecommons.org\/licenses\/by\/3.0\/legalcode","copyrightHolder":[{"@type":"Person","name":"Qiao, Mingda","givenName":"Mingda","familyName":"Qiao"},{"@type":"Person","name":"Valiant, Gregory","givenName":"Gregory","familyName":"Valiant"}],"copyrightYear":"2018","accessMode":"textual","accessModeSufficient":"textual","creativeWorkStatus":"Published","inLanguage":"en-US","sameAs":"https:\/\/doi.org\/10.4230\/LIPIcs.ITCS.2018.47","publisher":"Schloss Dagstuhl \u2013 Leibniz-Zentrum f\u00fcr Informatik","citation":["http:\/\/dx.doi.org\/10.1002\/9780470434697","https:\/\/arxiv.org\/abs\/1610.05492","https:\/\/research.google.com\/pubs\/pub44822.html"],"isPartOf":{"@type":"PublicationVolume","@id":"#volume6297","volumeNumber":94,"name":"9th Innovations in Theoretical Computer Science Conference (ITCS 2018)","dateCreated":"2018-01-12","datePublished":"2018-01-12","editor":{"@type":"Person","name":"Karlin, Anna R.","givenName":"Anna R.","familyName":"Karlin"},"isAccessibleForFree":true,"publisher":"Schloss Dagstuhl \u2013 Leibniz-Zentrum f\u00fcr Informatik","hasPart":"#article10590","isPartOf":{"@type":"Periodical","@id":"#series116","name":"Leibniz International Proceedings in Informatics","issn":"1868-8969","isAccessibleForFree":true,"publisher":"Schloss Dagstuhl \u2013 Leibniz-Zentrum f\u00fcr Informatik","hasPart":"#volume6297"}}}