A Randomized Online Quantile Summary in O(1/epsilon * log(1/epsilon)) Words

Authors David Felber, Rafail Ostrovsky



PDF
Thumbnail PDF

File

LIPIcs.APPROX-RANDOM.2015.775.pdf
  • Filesize: 0.5 MB
  • 11 pages

Document Identifiers

Author Details

David Felber
Rafail Ostrovsky

Cite As Get BibTex

David Felber and Rafail Ostrovsky. A Randomized Online Quantile Summary in O(1/epsilon * log(1/epsilon)) Words. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 40, pp. 775-785, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015) https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2015.775

Abstract

A quantile summary is a data structure that approximates to epsilon-relative error the order statistics of a much larger underlying dataset.

In this paper we develop a randomized online quantile summary for the cash register data input model and comparison data domain model that uses O((1/epsilon) log(1/epsilon)) words of memory. This improves upon the previous best upper bound of O((1/epsilon) (log(1/epsilon))^(3/2)) by Agarwal et al. (PODS 2012). Further, by a lower bound of Hung and Ting (FAW 2010) no deterministic summary for the comparison model can outperform our randomized summary in terms of space complexity. Lastly, our summary has the nice property that O((1/epsilon) log(1/epsilon)) words suffice to ensure that the success probability is 1 - exp(-poly(1/epsilon)).

Subject Classification

Keywords
  • order statistics
  • data stream
  • streaming algorithm

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Pankaj K. Agarwal, Graham Cormode, Zengfeng Huang, Jeff Phillips, Zhewei Wei, and Ke Yi. Mergeable summaries. In Proceedings of the 31st Symposium on Principles of Database Systems, PODS'12, pages 23-34, New York, NY, USA, 2012. ACM. Google Scholar
  2. Pankaj K Agarwal, Graham Cormode, Zengfeng Huang, Jeff M Phillips, Zhewei Wei, and Ke Yi. Mergeable summaries. ACM Transactions on Database Systems (TODS), 38(4):26, 2013. Google Scholar
  3. Michael Greenwald and Sanjeev Khanna. Space-efficient online computation of quantile summaries. ACM SIGMOD Record, 30(2):58-66, 2001. Google Scholar
  4. Regant YS Hung and Hingfung F Ting. An$$1 omega ($$1 frac 1$$1 varepsilon$$1 log$$1 frac 1$$1 varepsilon) space lower bound for finding ε-approximate quantiles in a data stream. In Frontiers in Algorithmics, pages 89-100. Springer, 2010. Google Scholar
  5. Gurmeet Singh Manku, Sridhar Rajagopalan, and Bruce G Lindsay. Approximate medians and other quantiles in one pass and with limited memory. ACM SIGMOD Record, 27(2):426-435, 1998. Google Scholar
  6. Gurmeet Singh Manku, Sridhar Rajagopalan, and Bruce G Lindsay. Random sampling techniques for space efficient online computation of order statistics of large datasets. ACM SIGMOD Record, 28(2):251-262, 1999. Google Scholar
  7. J. I. Munro and M. S. Paterson. Selection and sorting with limited storage. In Proceedings of the 19th Annual Symposium on Foundations of Computer Science, SFCS'78, pages 253-258, Washington, DC, USA, 1978. IEEE Computer Society. Google Scholar
  8. Problem 2: Quantiles - open problems in sublinear algorithms (suggested by graham cormode at kanpur 2006), 2006. URL: http://sublinear.info/2.
  9. Nisheeth Shrivastava, Chiranjeeb Buragohain, Divyakant Agrawal, and Subhash Suri. Medians and beyond: new aggregation techniques for sensor networks. In Proceedings of the 2nd international conference on Embedded networked sensor systems, pages 239-249. ACM, 2004. Google Scholar
  10. Vladimir N Vapnik and A Ya Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability & Its Applications, 16(2):264-280, 1971. Google Scholar
  11. Lu Wang, Ge Luo, Ke Yi, and Graham Cormode. Quantiles over data streams: an experimental study. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 737-748. ACM, 2013. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail