A Randomized Online Quantile Summary in O(1/epsilon * log(1/epsilon)) Words
A quantile summary is a data structure that approximates to epsilon-relative error the order statistics of a much larger underlying dataset.
In this paper we develop a randomized online quantile summary for the cash register data input model and comparison data domain model that uses O((1/epsilon) log(1/epsilon)) words of memory. This improves upon the previous best upper bound of O((1/epsilon) (log(1/epsilon))^(3/2)) by Agarwal et al. (PODS 2012). Further, by a lower bound of Hung and Ting (FAW 2010) no deterministic summary for the comparison model can outperform our randomized summary in terms of space complexity. Lastly, our summary has the nice property that O((1/epsilon) log(1/epsilon)) words suffice to ensure that the success probability is 1 - exp(-poly(1/epsilon)).
order statistics
data stream
streaming algorithm
775-785
Regular Paper
David
Felber
David Felber
Rafail
Ostrovsky
Rafail Ostrovsky
10.4230/LIPIcs.APPROX-RANDOM.2015.775
Pankaj K. Agarwal, Graham Cormode, Zengfeng Huang, Jeff Phillips, Zhewei Wei, and Ke Yi. Mergeable summaries. In Proceedings of the 31st Symposium on Principles of Database Systems, PODS'12, pages 23-34, New York, NY, USA, 2012. ACM.
Pankaj K Agarwal, Graham Cormode, Zengfeng Huang, Jeff M Phillips, Zhewei Wei, and Ke Yi. Mergeable summaries. ACM Transactions on Database Systems (TODS), 38(4):26, 2013.
Michael Greenwald and Sanjeev Khanna. Space-efficient online computation of quantile summaries. ACM SIGMOD Record, 30(2):58-66, 2001.
Regant YS Hung and Hingfung F Ting. An$$1 omega ($$1 frac 1$$1 varepsilon$$1 log$$1 frac 1$$1 varepsilon) space lower bound for finding ε-approximate quantiles in a data stream. In Frontiers in Algorithmics, pages 89-100. Springer, 2010.
Gurmeet Singh Manku, Sridhar Rajagopalan, and Bruce G Lindsay. Approximate medians and other quantiles in one pass and with limited memory. ACM SIGMOD Record, 27(2):426-435, 1998.
Gurmeet Singh Manku, Sridhar Rajagopalan, and Bruce G Lindsay. Random sampling techniques for space efficient online computation of order statistics of large datasets. ACM SIGMOD Record, 28(2):251-262, 1999.
J. I. Munro and M. S. Paterson. Selection and sorting with limited storage. In Proceedings of the 19th Annual Symposium on Foundations of Computer Science, SFCS'78, pages 253-258, Washington, DC, USA, 1978. IEEE Computer Society.
Problem 2: Quantiles - open problems in sublinear algorithms (suggested by graham cormode at kanpur 2006), 2006. URL: http://sublinear.info/2.
http://sublinear.info/2
Nisheeth Shrivastava, Chiranjeeb Buragohain, Divyakant Agrawal, and Subhash Suri. Medians and beyond: new aggregation techniques for sensor networks. In Proceedings of the 2nd international conference on Embedded networked sensor systems, pages 239-249. ACM, 2004.
Vladimir N Vapnik and A Ya Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability & Its Applications, 16(2):264-280, 1971.
Lu Wang, Ge Luo, Ke Yi, and Graham Cormode. Quantiles over data streams: an experimental study. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 737-748. ACM, 2013.
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode