Approximating Single-Source Personalized PageRank with Absolute Error Guarantees

Authors: Zhewei Wei, Ji-Rong Wen, Mingji Yang





Author Details

Zhewei Wei
  • Renmin University of China, Beijing, China
Ji-Rong Wen
  • Renmin University of China, Beijing, China
Mingji Yang
  • Renmin University of China, Beijing, China

Acknowledgements

We also wish to acknowledge the support provided by the Engineering Research Center of Next-Generation Intelligent Search and Recommendation, Ministry of Education. Finally, we thank the anonymous reviewers for their valuable comments.

Cite As

Zhewei Wei, Ji-Rong Wen, and Mingji Yang. Approximating Single-Source Personalized PageRank with Absolute Error Guarantees. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 9:1-9:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ICDT.2024.9

Abstract

Personalized PageRank (PPR) is an extensively studied and applied node proximity measure in graphs. For a pair of nodes s and t on a graph G = (V,E), the PPR value π(s,t) is defined as the probability that an α-discounted random walk from s terminates at t, where the walk terminates with probability α at each step. We study the classic Single-Source PPR query, which asks for PPR approximations from a given source node s to all nodes in the graph. Specifically, we aim to provide approximations with absolute error guarantees, ensuring that the resultant PPR estimates π̂(s,t) satisfy max_{t ∈ V} |π̂(s,t)-π(s,t)| ≤ ε for a given error bound ε. We propose an algorithm that achieves this with high probability, with an expected running time of
  • Õ(√m/ε) for directed graphs, where m = |E|;
  • Õ(√{d_max}/ε) for undirected graphs, where d_max is the maximum node degree in the graph;
  • Õ(n^{γ-1/2}/ε) for power-law graphs, where n = |V| and γ ∈ (1/2,1) is the extent of the power law.

These sublinear bounds improve upon existing results.

We also study the case when degree-normalized absolute error guarantees are desired, requiring max_{t ∈ V} |π̂(s,t)/d(t)-π(s,t)/d(t)| ≤ ε_d for a given error bound ε_d, where the graph is undirected and d(t) is the degree of node t. We give an algorithm that provides this error guarantee with high probability, achieving an expected complexity of Õ(√{∑_{t ∈ V} π(s,t)/d(t)}/ε_d). This improves over the previously known O(1/ε_d) complexity.
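To make the definition of π(s,t) and the absolute error criterion concrete, the following Python sketch estimates single-source PPR by plain Monte Carlo simulation of α-discounted walks and compares the result with a near-exact power-iteration baseline. It is a textbook baseline and not the algorithm proposed in the paper. The function names, the toy graph `GRAPH`, the default α = 0.2, and the convention that a walk stops at a node without out-neighbors are all illustrative assumptions.

```python
import random

# Hypothetical toy directed graph as adjacency lists (not from the paper).
GRAPH = {0: [1, 2], 1: [2], 2: [0, 3], 3: [0]}


def monte_carlo_ppr(adj, source, alpha=0.2, num_walks=20000, seed=0):
    """Estimate pi(source, t) for every t as the fraction of walks ending at t."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(num_walks):
        node = source
        # At each step the walk terminates with probability alpha;
        # otherwise it moves to a uniformly random out-neighbor.
        while rng.random() >= alpha:
            neighbors = adj[node]
            if not neighbors:
                break  # assumption: a walk at a dead end stops there
            node = rng.choice(neighbors)
        counts[node] = counts.get(node, 0) + 1
    return {t: c / num_walks for t, c in counts.items()}


def exact_ppr(adj, source, alpha=0.2, iterations=200):
    """Near-exact PPR by propagating the walk's probability mass step by step."""
    mass = {source: 1.0}  # probability that the walk is alive at each node
    ppr = {}
    for _ in range(iterations):
        next_mass = {}
        for node, p in mass.items():
            ppr[node] = ppr.get(node, 0.0) + alpha * p  # walk stops here
            neighbors = adj[node]
            if not neighbors:
                ppr[node] += (1 - alpha) * p  # same dead-end convention
                continue
            share = (1 - alpha) * p / len(neighbors)
            for v in neighbors:
                next_mass[v] = next_mass.get(v, 0.0) + share
        mass = next_mass
    return ppr


def max_abs_error(estimate, reference, nodes):
    """max over t of |estimate(t) - reference(t)|, the abstract's error measure."""
    return max(abs(estimate.get(t, 0.0) - reference.get(t, 0.0)) for t in nodes)


if __name__ == "__main__":
    est = monte_carlo_ppr(GRAPH, source=0)
    ref = exact_ppr(GRAPH, source=0)
    print("max absolute error:", max_abs_error(est, ref, GRAPH))
```

This baseline needs on the order of 1/ε² walks (up to logarithmic factors) to reach absolute error ε with high probability, which is the kind of cost that the 1/ε-type bounds listed above avoid.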

Subject Classification

ACM Subject Classification
  • Theory of computation → Graph algorithms analysis
  • Theory of computation → Streaming, sublinear and near linear time algorithms
Keywords
  • Graph Algorithms
  • Sublinear Algorithms
  • Personalized PageRank

