Near-Optimal Two-Pass Streaming Algorithm for Sampling Random Walks over Directed Graphs

Authors Lijie Chen, Gillat Kol, Dmitry Paramonov, Raghuvansh R. Saxena, Zhao Song, Huacheng Yu




File

LIPIcs.ICALP.2021.52.pdf
  • Filesize: 0.81 MB
  • 19 pages

Author Details

Lijie Chen
  • MIT, Cambridge, MA, USA
Gillat Kol
  • Princeton University, NJ, USA
Dmitry Paramonov
  • Princeton University, NJ, USA
Raghuvansh R. Saxena
  • Princeton University, NJ, USA
Zhao Song
  • Institute for Advanced Study, Princeton, NJ, USA
Huacheng Yu
  • Princeton University, NJ, USA

Acknowledgements

We would like to thank Rajesh Jayaram for discussions on 𝓁₁ heavy hitters.

Cite As

Lijie Chen, Gillat Kol, Dmitry Paramonov, Raghuvansh R. Saxena, Zhao Song, and Huacheng Yu. Near-Optimal Two-Pass Streaming Algorithm for Sampling Random Walks over Directed Graphs. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 198, pp. 52:1-52:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ICALP.2021.52

Abstract

For a directed graph G with n vertices and a start vertex u_start, we wish to (approximately) sample an L-step random walk over G starting from u_start with minimum space, using an algorithm that makes only a few passes over the edges of the graph. This problem has found many applications, for instance, in approximating the PageRank of a webpage. If only a single pass is allowed, the space complexity of this problem was shown to be Θ̃(n ⋅ L). Prior to our work, a better space complexity was only known with Õ(√L) passes. We essentially settle the space complexity of this random walk simulation problem for two-pass streaming algorithms, showing that it is Θ̃(n ⋅ √L), by giving almost matching upper and lower bounds. Our lower bound argument extends to every constant number of passes p, and shows that any p-pass algorithm for this problem uses Ω̃(n ⋅ L^{1/p}) space. In addition, we show a similar Θ̃(n ⋅ √L) bound on the space complexity of any algorithm (with any number of passes) for the related problem of sampling an L-step random walk from every vertex in the graph.
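To make the single-pass setting concrete, the following is a minimal Python sketch of a folklore single-pass baseline that uses on the order of n ⋅ L stored vertices. It is not the two-pass algorithm of the paper. It assumes the stream is an iterable of directed edges (u, v); the function name `one_pass_walk` and its parameters are hypothetical, chosen only for illustration. For every vertex it keeps L independent reservoir samples of an out-neighbour and consumes a fresh sample on each visit.

```python
import random
from collections import defaultdict


def one_pass_walk(edge_stream, start, L, rng=None):
    """Folklore single-pass baseline (illustrative, not the paper's method).

    Stores L independent uniform out-neighbour samples per vertex, i.e.
    O(n * L) vertices in total, then replays them to build the walk.
    """
    rng = rng or random.Random()
    samples = {}                 # vertex -> list of L sampled out-neighbours
    out_degree = defaultdict(int)

    # Single pass over the edges: L independent size-1 reservoirs per vertex.
    for u, v in edge_stream:
        out_degree[u] += 1
        d = out_degree[u]
        if d == 1:
            samples[u] = [v] * L
        else:
            slots = samples[u]
            for i in range(L):
                # Replace slot i with probability 1/d, independently per slot.
                if rng.randrange(d) == 0:
                    slots[i] = v

    # Replay: the k-th visit to a vertex consumes its k-th stored sample,
    # so no sample is reused and the steps are independent.
    walk = [start]
    visits = defaultdict(int)
    current = start
    for _ in range(L):
        if current not in samples:
            break                # assumption: stop early at a vertex with no out-edges
        nxt = samples[current][visits[current]]
        visits[current] += 1
        walk.append(nxt)
        current = nxt
    return walk


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 1)]
    print(one_pass_walk(edges, start=0, L=6, rng=random.Random(1)))
```

Because a vertex is visited at most L times in an L-step walk, L samples per vertex always suffice. The bounds in the abstract concern how far this storage can be reduced when a second pass is allowed.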

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
Keywords
  • streaming algorithms
  • random walk sampling
