Asymptotically Optimal Bounds for Estimating H-Index in Sublinear Time with Applications to Subgraph Counting

Authors Sepehr Assadi, Hoai-An Nguyen



PDF
Thumbnail PDF

File

LIPIcs.APPROX-RANDOM.2022.48.pdf
  • Filesize: 1.2 MB
  • 20 pages

Document Identifiers

Author Details

Sepehr Assadi
  • Department of Computer Science, Rutgers University, Piscataway, NJ, USA
Hoai-An Nguyen
  • Department of Computer Science, Rutgers University, Piscataway, NJ, USA

Acknowledgements

We thank Janani Sundaresan for helpful feedback on the presentation of our paper. We are also grateful to the anonymous reviewers of APPROX 2022 for their helpful feedback on previous work and the presentation of this paper.

Cite AsGet BibTex

Sepehr Assadi and Hoai-An Nguyen. Asymptotically Optimal Bounds for Estimating H-Index in Sublinear Time with Applications to Subgraph Counting. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 245, pp. 48:1-48:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2022.48

Abstract

The h-index is a metric used to measure the impact of a user in a publication setting, such as a member of a social network with many highly liked posts or a researcher in an academic domain with many highly cited publications. Specifically, the h-index of a user is the largest integer h such that at least h publications of the user have at least h units of positive feedback. We design an algorithm that, given query access to the n publications of a user and each publication’s corresponding positive feedback number, outputs a (1± ε)-approximation of the h-index of this user with probability at least 1-δ in time O(n⋅ln(1/δ) / (ε²⋅h)), where h is the actual h-index which is unknown to the algorithm a-priori. We then design a novel lower bound technique that allows us to prove that this bound is in fact asymptotically optimal for this problem in all parameters n,h,ε, and δ. Our work is one of the first in sublinear time algorithms that addresses obtaining asymptotically optimal bounds, especially in terms of the error and confidence parameters. As such, we focus on designing novel techniques for this task. In particular, our lower bound technique seems quite general - to showcase this, we also use our approach to prove an asymptotically optimal lower bound for the problem of estimating the number of triangles in a graph in sublinear time, which now is also optimal in the error and confidence parameters. This latter result improves upon prior lower bounds of Eden, Levi, Ron, and Seshadhri (FOCS'15) for this problem, as well as multiple follow-up works that extended this lower bound to other subgraph counting problems.

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
Keywords
  • Sublinear time algorithms
  • h-index
  • asymptotically optimal bounds
  • lower bounds
  • subgraph counting

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Sepehr Assadi, Michael Kapralov, and Sanjeev Khanna. A simple sublinear-time algorithm for counting arbitrary subgraphs via edge sampling. In Avrim Blum, editor, 10th Innovations in Theoretical Computer Science Conference, ITCS 2019, January 10-12, 2019, San Diego, California, USA, volume 124 of LIPIcs, pages 6:1-6:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. Google Scholar
  2. Sepehr Assadi and Vihan Shah. An asymptotically optimal algorithm for maximum matching in dynamic streams. In Mark Braverman, editor, 13th Innovations in Theoretical Computer Science Conference, ITCS 2022, January 31 - February 3, 2022, Berkeley, CA, USA, volume 215 of LIPIcs, pages 9:1-9:23. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2022. Google Scholar
  3. Sayan Bhattacharya, Fabrizio Grandoni, Janardhan Kulkarni, Quanquan C. Liu, and Shay Solomon. Fully dynamic (Δ +1)-coloring in O(1) update time. ACM Trans. Algorithms, 18(2):10:1-10:25, 2022. Google Scholar
  4. Arijit Bishnu, Arijit Ghosh, Gopinath Mishra, and Manaswi Paraashar. Query complexity of global minimum cut. In Mary Wootters and Laura Sanità, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2021, August 16-18, 2021, University of Washington, Seattle, Washington, USA (Virtual Conference), volume 207 of LIPIcs, pages 6:1-6:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. Google Scholar
  5. Vladimir Braverman, Jonathan Katzman, Charles Seidell, and Gregory Vorsanger. An optimal algorithm for large frequency moments using o(n^(1-2/k)) bits. In Klaus Jansen, José D. P. Rolim, Nikhil R. Devanur, and Cristopher Moore, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2014, September 4-6, 2014, Barcelona, Spain, volume 28 of LIPIcs, pages 531-544. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2014. Google Scholar
  6. Harry Buhrman and Ronald de Wolf. Complexity measures and decision tree complexity: a survey. Theor. Comput. Sci., 288(1):21-43, 2002. Google Scholar
  7. Devdatt P. Dubhashi and Alessandro Panconesi. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, 2009. Google Scholar
  8. Talya Eden, Shweta Jain, Ali Pinar, Dana Ron, and C. Seshadhri. Provable and practical approximations for the degree distribution using sublinear graph samples. CoRR, abs/1710.08607, 2017. URL: http://arxiv.org/abs/1710.08607.
  9. Talya Eden, Amit Levi, Dana Ron, and C. Seshadhri. Approximately counting triangles in sublinear time. In Venkatesan Guruswami, editor, IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 614-633. IEEE Computer Society, 2015. Google Scholar
  10. Talya Eden, Saleet Mossel, and Ronitt Rubinfeld. Sampling multiple edges efficiently. In Mary Wootters and Laura Sanità, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2021, August 16-18, 2021, University of Washington, Seattle, Washington, USA (Virtual Conference), volume 207 of LIPIcs, pages 51:1-51:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. Google Scholar
  11. Talya Eden, Dana Ron, and C. Seshadhri. On approximating the number of k-cliques in sublinear time. In Ilias Diakonikolas, David Kempe, and Monika Henzinger, editors, Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, Los Angeles, CA, USA, June 25-29, 2018, pages 722-734. ACM, 2018. Google Scholar
  12. Talya Eden, Dana Ron, and C. Seshadhri. Faster sublinear approximation of the number of k-cliques in low-arboricity graphs. In Shuchi Chawla, editor, Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, pages 1467-1478. SIAM, 2020. Google Scholar
  13. Talya Eden and Will Rosenbaum. Lower bounds for approximating graph parameters via communication complexity. In Eric Blais, Klaus Jansen, José D. P. Rolim, and David Steurer, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2018, August 20-22, 2018 - Princeton, NJ, USA, volume 116 of LIPIcs, pages 11:1-11:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. Google Scholar
  14. Talya Eden and Will Rosenbaum. On sampling edges almost uniformly. In Raimund Seidel, editor, 1st Symposium on Simplicity in Algorithms, SOSA 2018, January 7-10, 2018, New Orleans, LA, USA, volume 61 of OASIcs, pages 7:1-7:9. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. Google Scholar
  15. Hendrik Fichtenberger, Mingze Gao, and Pan Peng. Sampling arbitrary subgraphs exactly uniformly in sublinear time. In Artur Czumaj, Anuj Dawar, and Emanuela Merelli, editors, 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020, July 8-11, 2020, Saarbrücken, Germany (Virtual Conference), volume 168 of LIPIcs, pages 45:1-45:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. Google Scholar
  16. Alison L Gibbs and Francis Edward Su. On choosing and bounding probability metrics. International statistical review, 70(3):419-435, 2002. Google Scholar
  17. Oded Goldreich. Introduction to Property Testing. Cambridge University Press, 2017. Google Scholar
  18. Priya Govindan, Morteza Monemizadeh, and S. Muthukrishnan. Streaming algorithms for measuring h-impact. In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS '17, pages 337-346, New York, NY, USA, 2017. Association for Computing Machinery. URL: https://doi.org/10.1145/3034786.3056118.
  19. Monika Henzinger and Pan Peng. Constant-time dynamic (Δ+1)-coloring. In Christophe Paul and Markus Bläser, editors, 37th International Symposium on Theoretical Aspects of Computer Science, STACS 2020, March 10-13, 2020, Montpellier, France, volume 154 of LIPIcs, pages 53:1-53:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. Google Scholar
  20. Jorge E. Hirsch. An index to quantify an individual’s scientific research output. Proc. Natl. Acad. Sci. USA, 102(46):16569-16572, 2005. Google Scholar
  21. Daniel M. Kane, Jelani Nelson, and David P. Woodruff. An optimal algorithm for the distinct elements problem. In Jan Paredaens and Dirk Van Gucht, editors, Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, June 6-11, 2010, Indianapolis, Indiana, USA, pages 41-52. ACM, 2010. Google Scholar
  22. Michael Kapralov, Jelani Nelson, Jakub Pachocki, Zhengyu Wang, David P. Woodruff, and Mobin Yahyazadeh. Optimal lower bounds for universal relation, and for samplers and finding duplicates in streams. In Chris Umans, editor, 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 475-486. IEEE Computer Society, 2017. Google Scholar
  23. Yi Li and David P. Woodruff. A tight lower bound for high frequency moment estimation with small error. In Prasad Raghavendra, Sofya Raskhodnikova, Klaus Jansen, and José D. P. Rolim, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 16th International Workshop, APPROX 2013, and 17th International Workshop, RANDOM 2013, Berkeley, CA, USA, August 21-23, 2013. Proceedings, volume 8096 of Lecture Notes in Computer Science, pages 623-638. Springer, 2013. Google Scholar
  24. Linyuan Lü, Tao Zhou, Qian-Ming Zhang, and H. Eugene Stanley. The H-index of a network node and its relation to degree and coreness. Nature Communications, 7(1):1-7, April 2016. URL: https://doi.org/10.1038/ncomms10168.
  25. Jelani Nelson and Huacheng Yu. Optimal lower bounds for distributed and streaming spanning forest computation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, pages 1844-1860, 2019. Google Scholar
  26. Eric Price and David P. Woodruff. (1 + eps)-approximate sparse recovery. In Rafail Ostrovsky, editor, IEEE 52nd Annual Symposium on Foundations of Computer Science, FOCS 2011, Palm Springs, CA, USA, October 22-25, 2011, pages 295-304. IEEE Computer Society, 2011. Google Scholar
  27. Eric Price and David P. Woodruff. Lower bounds for adaptive sparse recovery. In Sanjeev Khanna, editor, Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6-8, 2013, pages 652-663. SIAM, 2013. Google Scholar
  28. Fabián Riquelme and Pablo Gonzalez Cantergiani. Measuring user influence on twitter: A survey. Inf. Process. Manag., 52(5):949-975, 2016. Google Scholar
  29. Ahmet Erdem Sariyüce, C. Seshadhri, and Ali Pinar. Local algorithms for hierarchical dense subgraph discovery. Proc. VLDB Endow., 12(1):43-56, 2018. URL: https://doi.org/10.14778/3275536.3275540.
  30. Shay Solomon. Fully dynamic maximal matching in constant update time. In Irit Dinur, editor, IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9-11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA, pages 325-334. IEEE Computer Society, 2016. Google Scholar
  31. Jakub Tětek and Mikkel Thorup. Sampling and counting edges via vertex accesses. arXiv preprint arXiv:2107.03821. To appear in STOC 2022, 2021. Google Scholar
  32. Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer series in statistics. Springer, 2009. URL: https://doi.org/10.1007/b13794.
  33. Andrew Chi-Chih Yao. Probabilistic computations: Toward a unified measure of complexity (extended abstract). In 18th Annual Symposium on Foundations of Computer Science, Providence, Rhode Island, USA, 31 October - 1 November 1977, pages 222-227. IEEE Computer Society, 1977. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail