An Optimal Algorithm for Sliding Window Order Statistics

Author Pavel Raykov



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2023.5.pdf
  • Filesize: 0.64 MB
  • 13 pages

Document Identifiers

Author Details

Pavel Raykov
  • Google Zürich, Switzerland

Acknowledgements

We thank anonymous reviewers for their useful comments which led to an improved manuscript. We would also like to thank Mikhail Churakov, Christian Matt, and Dimitris Paparas for proofreading the paper.

Cite AsGet BibTex

Pavel Raykov. An Optimal Algorithm for Sliding Window Order Statistics. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 5:1-5:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.ICDT.2023.5

Abstract

Assume there is a data stream of elements and a window of size m. Sliding window algorithms compute various statistic functions over the last m elements of the data stream seen so far. The time complexity of a sliding window algorithm is measured as the time required to output an updated statistic function value every time a new element is read. For example, it is well known that computing the sliding window maximum/minimum has time complexity O(1) while computing the sliding window median has time complexity O(log m). In this paper we close the gap between these two cases by (1) presenting an algorithm for computing the sliding window k-th smallest element in O(log k) time and (2) prove that this time complexity is optimal.

Subject Classification

ACM Subject Classification
  • Theory of computation → Sorting and searching
Keywords
  • sliding window
  • order statistics
  • median
  • selection algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Andrei Alexandrescu. Fast deterministic selection. In Costas S. Iliopoulos, Solon P. Pissis, Simon J. Puglisi, and Rajeev Raman, editors, 16th International Symposium on Experimental Algorithms, SEA 2017, June 21-23, 2017, London, UK, volume 75 of LIPIcs, pages 24:1-24:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. URL: https://doi.org/10.4230/LIPIcs.SEA.2017.24.
  2. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58(1):137-147, 1999. URL: https://doi.org/10.1006/jcss.1997.1545.
  3. Arvind Arasu and Gurmeet Singh Manku. Approximate counts and quantiles over sliding windows. In Catriel Beeri and Alin Deutsch, editors, Proceedings of the Twenty-third ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 14-16, 2004, Paris, France, pages 286-296. ACM, 2004. URL: https://doi.org/10.1145/1055558.1055598.
  4. Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer Widom. Models and issues in data stream systems. In Lucian Popa, Serge Abiteboul, and Phokion G. Kolaitis, editors, Proceedings of the Twenty-first ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 3-5, Madison, Wisconsin, USA, pages 1-16. ACM, 2002. URL: https://doi.org/10.1145/543613.543615.
  5. Brian Babcock, Mayur Datar, Rajeev Motwani, and Liadan O'Callaghan. Maintaining variance and k-medians over data stream windows. In Frank Neven, Catriel Beeri, and Tova Milo, editors, Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 9-12, 2003, San Diego, CA, USA, pages 234-243. ACM, 2003. URL: https://doi.org/10.1145/773153.773176.
  6. Manuel Blum, Robert W. Floyd, Vaughan R. Pratt, Ronald L. Rivest, and Robert Endre Tarjan. Time bounds for selection. J. Comput. Syst. Sci., 7(4):448-461, 1973. URL: https://doi.org/10.1016/S0022-0000(73)80033-9.
  7. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, 3rd Edition. MIT Press, 2009. URL: http://mitpress.mit.edu/books/introduction-algorithms.
  8. Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Maintaining stream statistics over sliding windows. SIAM J. Comput., 31(6):1794-1813, 2002. URL: https://doi.org/10.1137/S0097539701398363.
  9. Scott C. Douglas. Running max/min calculation using a pruned ordered list. IEEE Trans. Signal Process., 44(11):2872-2877, 1996. URL: https://doi.org/10.1109/78.542446.
  10. James R. Driscoll, Neil Sarnak, Daniel Dominic Sleator, and Robert Endre Tarjan. Making data structures persistent. In Juris Hartmanis, editor, Proceedings of the 18th Annual ACM Symposium on Theory of Computing, May 28-30, 1986, Berkeley, California, USA, pages 109-121. ACM, 1986. URL: https://doi.org/10.1145/12130.12142.
  11. David Eppstein. Nontrivial algorithm for computing a sliding window median [online]. 2014. URL: https://cstheory.stackexchange.com/questions/21730/nontrivial-algorithm-for-computing-a-sliding-window-median.
  12. Michael Greenwald and Sanjeev Khanna. Space-efficient online computation of quantile summaries. In Sharad Mehrotra and Timos K. Sellis, editors, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, Santa Barbara, CA, USA, May 21-24, 2001, pages 58-66. ACM, 2001. URL: https://doi.org/10.1145/375663.375670.
  13. Dana Jansens. Persistent binary search trees [online]. 2009. URL: https://cglab.ca/~dana/pbst/.
  14. Martti Juhola, Jyrki Katajainen, and Timo Raita. Comparison of algorithms for standard median filtering. IEEE Trans. Signal Process., 39(1):204-208, 1991. URL: https://doi.org/10.1109/78.80784.
  15. Donald E. Knuth. The Art of Computer Programming, Volume I: Fundamental Algorithms, 2nd Edition. Addison-Wesley, 1973. URL: https://www.worldcat.org/oclc/310903895.
  16. Georgiy Korneev. Designing an algorithm that searches for the kth largest element between two AVL trees [online]. 2016. URL: https://stackoverflow.com/questions/40473890/designing-an-algorithm-that-searches-for-the-kth-largest-element-between-two-avl.
  17. Danny Krizanc, Pat Morin, and Michiel H. M. Smid. Range mode and range median queries on lists and trees. Nord. J. Comput., 12(1):1-17, 2005. Google Scholar
  18. Xuemin Lin, Hongjun Lu, Jian Xu, and Jeffrey Xu Yu. Continuously maintaining quantile summaries of the most recent N elements over a data stream. In Z. Meral Özsoyoglu and Stanley B. Zdonik, editors, Proceedings of the 20th International Conference on Data Engineering, ICDE 2004, 30 March - 2 April 2004, Boston, MA, USA, pages 362-373. IEEE Computer Society, 2004. URL: https://doi.org/10.1109/ICDE.2004.1320011.
  19. Bongki Moon, Inés Fernando Vega López, and Vijaykumar Immanuel. Scalable algorithms for large temporal aggregation. In David B. Lomet and Gerhard Weikum, editors, Proceedings of the 16th International Conference on Data Engineering, San Diego, California, USA, February 28 - March 3, 2000, pages 145-154. IEEE Computer Society, 2000. URL: https://doi.org/10.1109/ICDE.2000.839401.
  20. Jürg Nievergelt and Edward M. Reingold. Binary search trees of bounded balance. In Patrick C. Fischer, H. Paul Zeiger, Jeffrey D. Ullman, and Arnold L. Rosenberg, editors, Proceedings of the 4th Annual ACM Symposium on Theory of Computing, May 1-3, 1972, Denver, Colorado, USA, pages 137-142. ACM, 1972. URL: https://doi.org/10.1145/800152.804906.
  21. Chris Okasaki. Red-black trees in a functional setting. J. Funct. Program., 9(4):471-477, 1999. URL: https://doi.org/10.1017/s0956796899003494.
  22. Mike Paterson. Progress in selection. In Rolf G. Karlsson and Andrzej Lingas, editors, Algorithm Theory - SWAT '96, 5th Scandinavian Workshop on Algorithm Theory, Reykjavík, Iceland, July 3-5, 1996, Proceedings, volume 1097 of Lecture Notes in Computer Science, pages 368-379. Springer, 1996. URL: https://doi.org/10.1007/3-540-61422-2_146.
  23. Abhiroop Sarkar. Persistent red black trees in Haskell [online]. 2017. URL: https://abhiroop.github.io/Haskell-Red-Black-Tree/.
  24. Jukka Suomela. Median filtering is equivalent to sorting. CoRR, abs/1406.1717, 2014. URL: http://arxiv.org/abs/1406.1717.
  25. Kanat Tangwongsan, Martin Hirzel, and Scott Schneider. Low-latency sliding-window aggregation in worst-case constant time. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems, DEBS 2017, Barcelona, Spain, June 19-23, 2017, pages 66-77. ACM, 2017. URL: https://doi.org/10.1145/3093742.3093925.
  26. Jun Yang and Jennifer Widom. Incremental computation and maintenance of temporal aggregates. VLDB J., 12(3):262-283, 2003. URL: https://doi.org/10.1007/s00778-003-0107-z.
  27. Andrew Chi-Chih Yao. Space-time tradeoff for answering range queries (extended abstract). In Harry R. Lewis, Barbara B. Simons, Walter A. Burkhard, and Lawrence H. Landweber, editors, Proceedings of the 14th Annual ACM Symposium on Theory of Computing, May 5-7, 1982, San Francisco, California, USA, pages 128-136. ACM, 1982. URL: https://doi.org/10.1145/800070.802185.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail