New Algorithms for Distributed Sliding Windows

Authors: Sutanu Gayen, N. V. Vinodchandran




File

LIPIcs.SWAT.2018.22.pdf
  • Filesize: 0.5 MB
  • 15 pages

Author Details

Sutanu Gayen
  • University of Nebraska-Lincoln, Lincoln NE, USA
N. V. Vinodchandran
  • University of Nebraska-Lincoln, Lincoln NE, USA

Cite As

Sutanu Gayen and N. V. Vinodchandran. New Algorithms for Distributed Sliding Windows. In 16th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 101, pp. 22:1-22:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.SWAT.2018.22

Abstract

Computing functions over a distributed stream of data is a significant problem with practical applications. The distributed streaming model is a natural computational model for such scenarios. The goal in this model is to maintain an approximate value of a function of interest over a data stream distributed across several computational nodes. These nodes have a two-way communication channel with a coordinator node that maintains an approximation of the function over the entire data stream seen so far. The resources of interest, which need to be minimized, are communication (primary), space, and update time. A practical variant of this model is the distributed sliding window (dsw) model, where the computation is limited to the last W items, W being the window size. Important problems such as sampling and counting have been investigated in this model. However, certain problems that are well studied in other streaming models, including computing frequency moments and metric clustering, have not been considered in the distributed sliding window model. We give the first algorithms for computing the frequency moments and metric clustering problems in the distributed sliding window model. Our algorithms for these problems result from a general transfer theorem we establish, which transforms any algorithm in the distributed infinite window model into an algorithm in the distributed sliding window model, for a large class of functions. In particular, we show an efficient adaptation of the smooth histogram technique of Braverman and Ostrovsky to the distributed streaming model. Our construction allows trade-offs between communication and space. If we optimize for communication, we get algorithms that are as communication-efficient as their infinite window counterparts (up to polylogarithmic factors).
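
For orientation, the smooth histogram idea mentioned in the abstract can be sketched on a single (non-distributed) stream. The block below is an illustrative sketch under stated assumptions, not the construction from the paper. It tracks the second frequency moment F2 over the last W items by keeping a small set of suffix "buckets" and pruning those whose values are within a factor (1 - beta/2) of an earlier bucket. The class name SmoothHistogramF2, the parameter beta, and the helper exact_f2 are hypothetical names chosen for this example. Each bucket stores an exact counter as a stand-in for the small-space streaming sketch that a real implementation would use.

```python
from collections import Counter


def exact_f2(items):
    """Exact second frequency moment: sum of squared item counts."""
    return sum(c * c for c in Counter(items).values())


class SmoothHistogramF2:
    """Single-stream smooth-histogram sketch for F2 over the last `window` items.

    Assumption: each bucket keeps an exact Counter of its suffix, standing in
    for a small-space streaming sketch.
    """

    def __init__(self, window, beta):
        self.window = window
        self.beta = beta
        self.n = 0          # number of items seen so far (1-indexed positions)
        self.buckets = []   # each bucket: [start_position, Counter, f2_value]

    def add(self, item):
        self.n += 1
        # Open a new bucket whose suffix starts at the current item.
        self.buckets.append([self.n, Counter(), 0])
        # Feed the item to every bucket, updating F2 incrementally:
        # (c + 1)^2 - c^2 = 2c + 1.
        for bucket in self.buckets:
            c = bucket[1][item]
            bucket[1][item] = c + 1
            bucket[2] += 2 * c + 1
        # Prune: for each bucket i, find the largest j > i whose value is still
        # at least (1 - beta/2) times bucket i's value; drop buckets between.
        i = 0
        while i < len(self.buckets) - 2:
            threshold = (1 - self.beta / 2) * self.buckets[i][2]
            j = len(self.buckets) - 1
            while j > i + 1 and self.buckets[j][2] < threshold:
                j -= 1
            del self.buckets[i + 1:j]
            i += 1
        # Expire: keep only the latest bucket starting at or before the window.
        window_start = self.n - self.window + 1
        while len(self.buckets) >= 2 and self.buckets[1][0] <= window_start:
            self.buckets.pop(0)

    def estimate(self):
        """F2 of the suffix held by the oldest bucket (never below window F2)."""
        return self.buckets[0][2]


sh = SmoothHistogramF2(window=10, beta=0.2)
for x in [1, 1, 2]:
    sh.add(x)
print(sh.estimate())  # prints 5 (counts 2 and 1, so 2*2 + 1*1)
```

Because the oldest retained bucket always starts at or before the window start, its suffix contains the window, so the reported value never falls below the true window F2. How close it stays from above depends on the smoothness of the function and the choice of beta, which this sketch does not analyze.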

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming models
  • Theory of computation → Sketching and sampling
  • Theory of computation → Distributed algorithms
Keywords
  • distributed streaming
  • distributed functional monitoring
  • distributed sliding window
  • frequency moments
  • k-median clustering
  • k-center clustering

References

  1. Brian Babcock, Mayur Datar, Rajeev Motwani, and Liadan O'Callaghan. Maintaining variance and k-medians over data stream windows. In Proceedings of the Twenty-Second Symposium on Principles of Database Systems PODS, pages 234-243, 2003.
  2. Brian Babcock and Chris Olston. Distributed top-k monitoring. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 28-39, 2003.
  3. Vladimir Braverman, Harry Lang, Keith Levin, and Morteza Monemizadeh. Clustering on sliding windows in polylogarithmic space. In 35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS, pages 350-364, 2015.
  4. Vladimir Braverman, Harry Lang, Keith Levin, and Morteza Monemizadeh. Clustering problems on sliding windows. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 1374-1390, 2016.
  5. Vladimir Braverman and Rafail Ostrovsky. Effective computations on sliding windows. SIAM J. Comput., 39(6):2113-2131, 2010.
  6. Ho-Leung Chan, Tak Wah Lam, Lap-Kei Lee, and Hing-Fung Ting. Continuous monitoring of distributed data streams over a time-based sliding window. In 27th International Symposium on Theoretical Aspects of Computer Science, STACS, pages 179-190, 2010.
  7. Jiecao Chen and Qin Zhang. Improved algorithms for distributed entropy monitoring. Algorithmica, 78(3):1041-1066, Jul 2017.
  8. Vincent Cohen-Addad, Chris Schwiegelshohn, and Christian Sohler. Diameter and k-center in sliding windows. In 43rd International Colloquium on Automata, Languages, and Programming, ICALP, pages 19:1-19:12, 2016.
  9. Graham Cormode. Algorithms for continuous distributed monitoring: A survey. In First International Workshop on Algorithms and Models for Distributed Event Processing, pages 1-10, 2011.
  10. Graham Cormode, Minos N. Garofalakis, S. Muthukrishnan, and Rajeev Rastogi. Holistic aggregates in a networked world: Distributed tracking of approximate quantiles. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 25-36, 2005.
  11. Graham Cormode, S. Muthukrishnan, and Ke Yi. Algorithms for distributed functional monitoring. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 1076-1085, 2008.
  12. Graham Cormode, S. Muthukrishnan, Ke Yi, and Qin Zhang. Optimal sampling from distributed streams. In Proceedings of the Twenty-Ninth ACM Symposium on Principles of Database Systems, PODS, pages 77-86, 2010.
  13. Graham Cormode, S. Muthukrishnan, and Wei Zhuang. Conquering the divide: Continuous clustering of distributed data streams. In Proceedings of the 23rd International Conference on Data Engineering, ICDE, pages 1036-1045, 2007.
  14. Graham Cormode and Ke Yi. Tracking distributed aggregates over time-based sliding windows. In Scientific and Statistical Database Management - 24th International Conference, SSDBM, pages 416-430, 2012.
  15. Abhinandan Das, Sumit Ganguly, Minos N. Garofalakis, and Rajeev Rastogi. Distributed set expression cardinality estimation. In Proceedings of the Thirtieth International Conference on Very Large Data Bases, pages 312-323, 2004.
  16. Sutanu Gayen and N. V. Vinodchandran. Algorithms for k-median clustering over distributed streams. In Computing and Combinatorics - 22nd International Conference, COCOON, pages 535-546, 2016.
  17. Zengfeng Huang, Ke Yi, and Qin Zhang. Randomized algorithms for tracking distributed count, frequencies, and ranks. In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 295-306, 2012.
  18. Ram Keralapura, Graham Cormode, and Jeyashankher Ramamirtham. Communication-efficient distributed monitoring of thresholded counts. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 289-300, 2006.
  19. Chris Olston, Jing Jiang, and Jennifer Widom. Adaptive filters for continuous queries over distributed data streams. In Proceedings of the International Conference on Management of Data (SIGMOD), pages 563-574, 2003.
  20. Nicolo Rivetti, Yann Busnel, and Achour Mostéfaoui. Efficiently summarizing data streams over sliding windows. In 14th IEEE International Symposium on Network Computing and Applications, NCA, pages 151-158, 2015.
  21. Srikanta Tirthapura and David P. Woodruff. Optimal random sampling from distributed streams revisited. In Distributed Computing - 25th International Symposium, DISC, pages 283-297, 2011.
  22. David P. Woodruff and Qin Zhang. Tight bounds for distributed functional monitoring. In Proceedings of the 44th Symposium on Theory of Computing Conference, STOC, pages 941-960, 2012.