Abstract
Suppose that we are given sample access to an unknown distribution p over n elements and an explicit distribution q over the same n elements. We would like to reject the null hypothesis "p=q" after seeing as few samples as possible, when p =/= q, while we never want to reject the null, when p=q. Wellknown results show that Theta(sqrt{n}/epsilon^2) samples are necessary and sufficient for distinguishing whether p equals q versus p is epsilonfar from q in total variation distance. However, this requires the distinguishing radius epsilon to be fixed prior to deciding how many samples to request. Our goal is instead to design sequential hypothesis testers, i.e. online algorithms that request i.i.d. samples from p and stop as soon as they can confidently reject the hypothesis p=q, without being given a lower bound on the distance between p and q, when p =/= q. In particular, we want to minimize the number of samples requested by our tests as a function of the distance between p and q, and if p=q we want the algorithm, with high probability, to never reject the null. Our work is motivated by and addresses the practical challenge of sequential A/B testing in Statistics.
We show that, when n=2, any sequential hypothesis test must see Omega(1/{d_{tv}(p,q)^2} log log 1/{d_{tv}(p,q)}) samples, with high (constant) probability, before it rejects p=q, where d_{tv}(p,q) is the  unknown to the tester  total variation distance between p and q. We match the dependence of this lower bound on d_{tv}(p,q) by proposing a sequential tester that rejects p=q from at most O({\sqrt{n}}/{d_{tv}(p,q)^2}log log 1/{d_{tv}(p,q)}) samples with high (constant) probability. The Omega(sqrt{n}) dependence on the support size n is also known to be necessary.
We similarly provide twosample sequential hypothesis testers, when sample access is given to both p and q, and discuss applications to sequential A/B testing.
BibTeX  Entry
@InProceedings{daskalakis_et_al:LIPIcs:2017:7823,
author = {Constantinos Daskalakis and Yasushi Kawase},
title = {{Optimal Stopping Rules for Sequential Hypothesis Testing}},
booktitle = {25th Annual European Symposium on Algorithms (ESA 2017)},
pages = {32:132:14},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959770491},
ISSN = {18688969},
year = {2017},
volume = {87},
editor = {Kirk Pruhs and Christian Sohler},
publisher = {Schloss DagstuhlLeibnizZentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2017/7823},
URN = {urn:nbn:de:0030drops78237},
doi = {10.4230/LIPIcs.ESA.2017.32},
annote = {Keywords: property testing, sequential hypothesis testing, A/B testing}
}
Keywords: 

property testing, sequential hypothesis testing, A/B testing 
Seminar: 

25th Annual European Symposium on Algorithms (ESA 2017) 
Issue Date: 

2017 
Date of publication: 

31.08.2017 