Subgraph Enumeration in Optimal I/O Complexity

Authors: Shiyuan Deng, Yufei Tao



Author Details

Shiyuan Deng
  • The Chinese University of Hong Kong, China
Yufei Tao
  • The Chinese University of Hong Kong, China

Cite As

Shiyuan Deng and Yufei Tao. Subgraph Enumeration in Optimal I/O Complexity. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 21:1-21:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ICDT.2024.21

Abstract

Given a massive data graph G = (V, E) and a small pattern graph Q, the goal of subgraph enumeration is to list all the subgraphs of G isomorphic to Q. In the external memory (EM) model, it is well known that every indivisible algorithm must perform Ω(|E|^ρ / (M^{ρ-1} B)) I/Os in the worst case, where M is the number of words in (internal) memory, B is the number of words in a disk block, and ρ is the fractional edge covering number of Q. It has been a longstanding open problem to design an algorithm that matches this lower bound. The state of the art is an algorithm from ICDT'23 that achieves an I/O complexity of O((|E|^ρ / (M^{ρ-1} B)) · log_{M/B} (|E|/B)) with high probability. In this paper, we remove the log_{M/B} (|E|/B) factor, thereby settling the open problem when randomization is permitted.
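
To give a feel for the two bounds in the abstract, the following minimal Python sketch evaluates them numerically with all hidden constants ignored. The function names and the sample parameters are illustrative assumptions, not taken from the paper; the only external fact used is that the triangle pattern has fractional edge covering number ρ = 3/2.

```python
import math


def io_lower_bound(num_edges: float, rho: float, M: float, B: float) -> float:
    """Evaluate |E|^rho / (M^(rho-1) * B), the lower bound with constants dropped."""
    return num_edges ** rho / (M ** (rho - 1) * B)


def log_factor(num_edges: float, M: float, B: float) -> float:
    """Evaluate log_{M/B}(|E|/B), the extra factor in the ICDT'23 bound."""
    return math.log(num_edges / B, M / B)


def bound_with_log_factor(num_edges: float, rho: float, M: float, B: float) -> float:
    """Lower-bound expression multiplied by the logarithmic factor."""
    return io_lower_bound(num_edges, rho, M, B) * log_factor(num_edges, M, B)


if __name__ == "__main__":
    # Hypothetical parameters: triangle pattern (rho = 1.5),
    # 2^20 edges, 2^10 words of memory, 2^4 words per block.
    E, rho, M, B = 2 ** 20, 1.5, 2 ** 10, 2 ** 4
    print("bound without log factor:", io_lower_bound(E, rho, M, B))
    print("log_{M/B}(|E|/B):        ", log_factor(E, M, B))
    print("bound with log factor:   ", bound_with_log_factor(E, rho, M, B))
```

The ratio between the last and first printed values is exactly the logarithmic factor that the paper reports removing.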

Subject Classification

ACM Subject Classification
  • Theory of computation → Graph algorithms analysis
  • Information systems → Join algorithms
Keywords
  • Subgraph Enumeration
  • Conjunctive Queries
  • External Memory
  • Algorithms
