Clustering on Sliding Windows in Polylogarithmic Space

Authors Vladimir Braverman, Harry Lang, Keith Levin, Morteza Monemizadeh



PDF
Thumbnail PDF

File

LIPIcs.FSTTCS.2015.350.pdf
  • Filesize: 0.55 MB
  • 15 pages

Document Identifiers

Author Details

Vladimir Braverman
Harry Lang
Keith Levin
Morteza Monemizadeh

Cite As Get BibTex

Vladimir Braverman, Harry Lang, Keith Levin, and Morteza Monemizadeh. Clustering on Sliding Windows in Polylogarithmic Space. In 35th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 45, pp. 350-364, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015) https://doi.org/10.4230/LIPIcs.FSTTCS.2015.350

Abstract

In PODS 2003, Babcock, Datar, Motwani and O'Callaghan gave the first streaming solution for the k-median problem on sliding windows using
O(frack k tau^4 W^2tau log^2 W) space, with a O(2^O(1/tau)) approximation factor, where W is the window size and tau in (0,1/2) is a user-specified parameter. They left as an open question whether it is possible to improve this to polylogarithmic space. Despite much progress on clustering and sliding windows, this question has remained open for more than a decade.

In this paper, we partially answer the main open question posed by Babcock, Datar, Motwani and O'Callaghan. We present an algorithm yielding an exponential improvement in space compared to the previous result given in Babcock, et al. In particular, we give the first polylogarithmic space (alpha,beta)-approximation for metric k-median clustering in the sliding window model, where alpha and beta are constants, under the assumption, also made by Babcock et al., that the optimal k-median cost on any given window is bounded by a polynomial in the window size. We justify this assumption by showing that when the cost is exponential in the window size, no sublinear space approximation is possible. Our main technical contribution is a simple but elegant extension of smooth functions as introduced by Braverman and Ostrovsky, which allows us to apply well-known techniques for solving problems in the sliding window model
to functions that are not smooth, such as the k-median cost.

Subject Classification

Keywords
  • Streaming
  • Clustering
  • Sliding windows

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. P. K. Agarwal, S. Har-Peled, and K.R. Varadarajan. Approximating extent measures of points. Journal of the ACM, 51(4):606-635, 2004. Google Scholar
  2. N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58(1):137-147, 1999. Google Scholar
  3. P. Awasthi and N.F. Balcan. Center based clustering: A foundational perspective. In Handbook of Cluster Analysis. CRC Press, 2014. Google Scholar
  4. B. Babcock, M. Datar, R. Motwani, and L. O'Callaghan. Maintaining variance and k-medians over data stream windows. In Proceedings of the 9th ACM SIGMOD Symposium on Principles of Database Systems (PODS), pages 234-243, 2003. Google Scholar
  5. P. Beame, R. Clifford, and W. Machmouchi. Element distinctness, frequency moments, and sliding windows. In Proceedings of the 54th IEEE Symposium on Foundations of Computer Science (FOCS), pages 290-299, 2013. Google Scholar
  6. J.L. Bentley and J.B. Saxe. Decomposable searching problems I: Static-to-dynamic transformation. Journal of Algorithms, 1(4):301-358, 1980. Google Scholar
  7. V. Braverman, R. Gelles, and R. Ostrovsky. How to catch L₂-heavy-hitters on sliding windows. In Proceedings of the 19th Computing and Combinatorics Conference, pages 638-650, 2013. Google Scholar
  8. V. Braverman and R. Ostrovsky. Smooth histograms for sliding windows. In Proceedings of the 48th IEEE Symposium on Foundations of Computer Science (FOCS), pages 283-293, 2007. Google Scholar
  9. V. Braverman and R. Ostrovsky. Effective computations on sliding windows. SIAM Journal on Computing, 39(6):2113-2131, 2010. Google Scholar
  10. V. Braverman, R. Ostrovsky, and A. Roytman. Zero-one laws for sliding windows and universal sketches. In Proceedings of APPROX-RANDOM, pages 573-590, 2015. Google Scholar
  11. V. Braverman, R. Ostrovsky, and C. Zaniolo. Optimal sampling from sliding windows. Journal of Computer and System Sciences, 78(1):260-272, 2012. Google Scholar
  12. J. Byrka, T. Pensyl, B. Rybicki, A. Srinivasan, and K. Trinh. An improved approximation for k-median, and positive correlation in budgeted optimization. In Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2015. Google Scholar
  13. M. Charikar, L. O'Callaghan, and R. Panigrahy. Better streaming algorithms for clustering problems. In Proceedings of the 35th Annual ACM Symposium on Theory of Computing (STOC), pages 30-39, 2003. Google Scholar
  14. K. Chen. On coresets for k-median and k-means clustering in metric and Euclidean spaces and their applications. SIAM Journal on Computing, 39(3):923-947, 2009. Google Scholar
  15. G. Cormode. The continuous distributed monitoring model. SIGMOD Record, 42(1):5-14, 2013. Google Scholar
  16. G. Cormode and S. Muthukrishnan. What’s new: finding significant differences in network data streams. IEEE/ACM Transactions on Networking, 13(6):1219-1232, 2005. Google Scholar
  17. Graham Cormode and Minos N. Garofalakis. Streaming in a connected world: querying and tracking distributed data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 1178-1181, 2007. Google Scholar
  18. M. S. Crouch, A. McGregor, and D. Stubbs. Dynamic graphs in the sliding-window model. In Proceedings of the 21st Annual European Symposium on Algorithms (ESA), pages 337-348, 2013. Google Scholar
  19. A. Czumaj, C. Lammersen, M. Monemizadeh, and C. Sohler. (1+ε)-approximation for facility location in data streams. In Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1710-1728, 2013. Google Scholar
  20. M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. SIAM Journal on Computing, 31(6):1794-1813, 2002. Google Scholar
  21. M. P. do Carmo. Riemannian Geometry. Birkhäuser, Boston, MA, 1992. Google Scholar
  22. D. Feldman, A. Fiat, and M. Sharir. Coresets for weighted facilities and their applications. In Proceedings of the 47th IEEE Symposium on Foundations of Computer Science (FOCS), pages 315-324, 2006. Google Scholar
  23. D. Feldman, M. Monemizadeh, and C. Sohler. A PTAS for k-means clustering based on weak coresets. In Proceedings of the 23rd Annual Symposium on Computational Geometry (SoCG), pages 11-18, 2007. Google Scholar
  24. G. Frahling and C. Sohler. Coresets in dynamic geometric data streams. In Proceedings of the 37th Annual ACM Symposium on Theory of Computing (STOC), pages 209-217, 2005. Google Scholar
  25. S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. In Proceedings of the 41st IEEE Symposium on Foundations of Computer Science (FOCS), pages 359-366, 2000. Google Scholar
  26. S. Har-Peled and A. Kushal. Smaller coresets for k-median and k-means clustering. In Proceedings of the 21st Annual Symposium on Computational Geometry (SoCG), pages 126-134, 2005. Google Scholar
  27. S. Har-Peled and S. Mazumdar. Coresets for k-means and k-median clustering and their applications. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), pages 291-300, 2004. Google Scholar
  28. P. Indyk. Algorithms for dynamic geometric problems over data streams. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), pages 373-380, 2004. Google Scholar
  29. P. Indyk and E. Price. k-median clustering, model-based compressive sensing, and sparse recovery for earth mover distance. In Proceedings of the 43rd Annual ACM Symposium on Theory of Computing (STOC), pages 627-636, 2011. Google Scholar
  30. K. Jain, M. Mahdian, E. Markakis, A. Saberi, and V. V. Vazirani. Greedy facility location algorithms analyzed using dual fitting with factor-revealing lp. Journal of the ACM, 50(6):795-824, 2003. Google Scholar
  31. E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1997. Google Scholar
  32. S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129-137, 1982. Google Scholar
  33. S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2), 2005. Google Scholar
  34. M. Osborne, S. Moran, R. McCreadie, A. Von Lunen, M. Sykora, E. Cano, N. Ireson, C. MacDonald, I. Ounis, Y. He, T. Jackson, F. Ciravegna, and A. O'Brien. Real-time detection, tracking and monitoring of automatically discovered events in social media. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, USA, 2014. Google Scholar
  35. J. J. Sylvester. A question in the geometry of situation. Quarterly Journal of Pure and Applied Mathematics, 1:79, 1857. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail