Join Sampling Under Acyclic Degree Constraints and (Cyclic) Subgraph Sampling

Authors: Ru Wang, Yufei Tao



Author Details

Ru Wang
  • The Chinese University of Hong Kong, China
Yufei Tao
  • The Chinese University of Hong Kong, China

Cite As

Ru Wang and Yufei Tao. Join Sampling Under Acyclic Degree Constraints and (Cyclic) Subgraph Sampling. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 23:1-23:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ICDT.2024.23

Abstract

Given a (natural) join with an acyclic set of degree constraints (the join itself need not be acyclic), we show how to draw a uniformly random sample from the join result in O(polymat/max{1, OUT}) expected time (in data complexity) after a preprocessing phase of O(IN) expected time, where IN, OUT, and polymat are the join’s input size, output size, and polymatroid bound, respectively. This compares favorably with the state of the art (Deng et al. and Kim et al., both in PODS'23), which shows that, in the absence of degree constraints, a uniformly random sample can be drawn in Õ(AGM/max{1, OUT}) expected time after a preprocessing phase of Õ(IN) expected time, where AGM is the join’s AGM bound and Õ(.) hides a polylog(IN) factor. Our algorithm applies to every join supported by the solutions of Deng et al. and Kim et al. Furthermore, since the polymatroid bound is at most the AGM bound, our performance guarantees are never worse than those of Deng et al. and Kim et al., and can be considerably better.

We then use our techniques to tackle directed subgraph sampling, a problem that has extensive database applications and is closely related to joins. Let G = (V, E) be a directed data graph in which every vertex has out-degree at most λ, and let P be a directed pattern graph with a constant number of vertices. The objective is to sample an occurrence of P in G uniformly at random. The problem can be modeled as join sampling with input size IN = Θ(|E|), but whenever P contains cycles, the converted join has cyclic degree constraints. We show that it is always possible to discard certain degree constraints such that (i) the remaining constraints are acyclic and (ii) the new join has asymptotically the same polymatroid bound polymat as the old one.
Combining this finding with our new join sampling solution gives an algorithm that samples from the original (cyclic) join, and hence returns a uniformly random occurrence of P, in O(polymat/max{1, OUT}) expected time after O(|E|) expected-time preprocessing, where OUT is the number of occurrences.
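To make the "expected time = bound / OUT" shape of these guarantees concrete, here is a minimal Python sketch of plain rejection sampling for the directed-triangle pattern a→b→c→a under the out-degree constraint λ. It is not the paper's algorithm, only the generic trial-and-reject idea, and the function names `preprocess` and `sample_directed_triangle` and the `max_trials` cap are hypothetical. It assumes a simple directed graph (no self-loops) given as a list of edges and treats every tuple (a, b, c), including rotations of the same cycle, as a separate result. Each trial proposes any fixed result with probability 1/(|E|·λ), so accepted tuples are uniform and the expected number of trials is |E|·λ/OUT. Here |E|·λ is merely an upper bound on OUT implied by the degree constraint, not necessarily the polymatroid bound.

```python
import random


def preprocess(edges, lam):
    """O(|E|) preprocessing: dedupe edges, build out-neighbor lists and an edge set.

    Assumes (and checks) no self-loops and out-degree <= lam for every vertex.
    """
    edge_list = list(dict.fromkeys(edges))  # distinct edges, original order kept
    if any(u == v for u, v in edge_list):
        raise ValueError("self-loops are not allowed")
    out = {}
    for u, v in edge_list:
        out.setdefault(u, []).append(v)
    if any(len(nbrs) > lam for nbrs in out.values()):
        raise ValueError("some vertex has out-degree > lam")
    return edge_list, out, set(edge_list)


def sample_directed_triangle(index, lam, rng=random, max_trials=100_000):
    """Rejection-sample a tuple (a, b, c) with edges a->b, b->c, c->a.

    Every result tuple is proposed with probability 1 / (|E| * lam) per trial,
    so an accepted tuple is uniform over all results. Returns None if all
    max_trials trials are rejected (e.g. when there is no result at all).
    """
    edge_list, out, edge_set = index
    if not edge_list:
        return None
    m = len(edge_list)
    for _ in range(max_trials):
        a, b = edge_list[rng.randrange(m)]  # uniform edge a->b
        i = rng.randrange(lam)              # uniform slot among lam "padded" slots of b
        nbrs = out.get(b, ())
        if i >= len(nbrs):                  # empty padding slot: reject
            continue
        c = nbrs[i]                         # edge b->c
        if (c, a) in edge_set:              # edge c->a closes the cycle: accept
            return a, b, c
    return None
```

For example, `sample_directed_triangle(preprocess([(1, 2), (2, 3), (3, 1), (1, 3)], lam=2), 2)` returns one of (1, 2, 3), (2, 3, 1), (3, 1, 2), each with equal probability. The padding of every out-list to λ slots is what makes each proposal equally likely. The abstract's max{1, OUT} denominator means the real algorithm must also finish when OUT = 0; the `max_trials` cap is only a crude stand-in for that.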

Subject Classification

ACM Subject Classification
  • Theory of computation → Graph algorithms analysis
  • Information systems → Join algorithms
Keywords
  • Join Sampling
  • Subgraph Sampling
  • Degree Constraints
  • Polymatroid Bounds

References

  1. Amir Abboud, Seri Khoury, Oree Leibowitz, and Ron Safier. Listing 4-cycles. CoRR, abs/2211.10022, 2022. URL: https://doi.org/10.48550/arXiv.2211.10022.
  2. Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison-Wesley, 1995.
  3. Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala, and Sridhar Ramaswamy. Join synopses for approximate query answering. In Proceedings of ACM Management of Data (SIGMOD), pages 275-286, 1999. URL: https://doi.org/10.1145/304182.304207.
  4. Noga Alon. On the number of subgraphs of prescribed type of graphs with a given number of edges. Israel Journal of Mathematics, 38:116-130, 1981.
  5. Kaleb Alway, Eric Blais, and Semih Salihoglu. Box covers and domain orderings for beyond worst-case join processing. In Proceedings of International Conference on Database Theory (ICDT), pages 3:1-3:23, 2021. URL: https://doi.org/10.4230/LIPIcs.ICDT.2021.3.
  6. Albert Atserias, Martin Grohe, and Daniel Marx. Size bounds and query plans for relational joins. SIAM Journal on Computing, 42(4):1737-1767, 2013. URL: https://doi.org/10.1137/110859440.
  7. Matthias Bentert, Till Fluschnik, Andre Nichterlein, and Rolf Niedermeier. Parameterized aspects of triangle enumeration. Journal of Computer and System Sciences (JCSS), 103:61-77, 2019. URL: https://doi.org/10.1016/j.jcss.2019.02.004.
  8. Andreas Bjorklund, Rasmus Pagh, Virginia Vassilevska Williams, and Uri Zwick. Listing triangles. In Proceedings of International Colloquium on Automata, Languages and Programming (ICALP), pages 223-234, 2014. URL: https://doi.org/10.1007/978-3-662-43948-7_19.
  9. Surajit Chaudhuri, Rajeev Motwani, and Vivek R. Narasayya. On random sampling over joins. In Proceedings of ACM Management of Data (SIGMOD), pages 263-274, 1999. URL: https://doi.org/10.1145/304182.304206.
  10. Yu Chen and Ke Yi. Random sampling and size estimation over cyclic joins. In Proceedings of International Conference on Database Theory (ICDT), pages 7:1-7:18, 2020. URL: https://doi.org/10.4230/LIPIcs.ICDT.2020.7.
  11. N. Chiba and T. Nishizeki. Arboricity and subgraph listing algorithms. SIAM Journal on Computing, 14(1):210-223, 1985. URL: https://doi.org/10.1137/0214017.
  12. Kyle Deeds, Dan Suciu, Magda Balazinska, and Walter Cai. Degree sequence bound for join cardinality estimation. In Proceedings of International Conference on Database Theory (ICDT), volume 255, pages 8:1-8:18, 2023. URL: https://doi.org/10.4230/LIPIcs.ICDT.2023.8.
  13. Shiyuan Deng, Shangqi Lu, and Yufei Tao. On join sampling and the hardness of combinatorial output-sensitive join algorithms. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 99-111, 2023. URL: https://doi.org/10.1145/3584372.3588666.
  14. David Eppstein. Subgraph isomorphism in planar graphs and related problems. J. Graph Algorithms Appl., 3(3):1-27, 1999. URL: https://doi.org/10.7155/jgaa.00014.
  15. Hendrik Fichtenberger, Mingze Gao, and Pan Peng. Sampling arbitrary subgraphs exactly uniformly in sublinear time. In Proceedings of International Colloquium on Automata, Languages and Programming (ICALP), pages 45:1-45:13, 2020. URL: https://doi.org/10.4230/LIPIcs.ICALP.2020.45.
  16. Tomasz Gogacz and Szymon Torunczyk. Entropy bounds for conjunctive queries with functional dependencies. In Proceedings of International Conference on Database Theory (ICDT), volume 68, pages 15:1-15:17, 2017. URL: https://doi.org/10.4230/LIPIcs.ICDT.2017.15.
  17. Chinh T. Hoang, Marcin Kaminski, Joe Sawada, and R. Sritharan. Finding and listing induced paths and cycles. Discrete Applied Mathematics, 161(4-5):633-641, 2013. URL: https://doi.org/10.1016/j.dam.2012.01.024.
  18. Sai Vikneshwar Mani Jayaraman, Corey Ropell, and Atri Rudra. Worst-case optimal binary join algorithms under general 𝓁_p constraints. CoRR, abs/2112.01003, 2021. URL: https://doi.org/10.48550/arXiv.2112.01003.
  19. Ce Jin and Yinzhan Xu. Removing additive structure in 3sum-based reductions. In Proceedings of ACM Symposium on Theory of Computing (STOC), pages 405-418, 2023. URL: https://doi.org/10.1145/3564246.3585157.
  20. Manas Joglekar and Christopher Re. It’s all a matter of degree - using degree information to optimize multiway joins. Theory Comput. Syst., 62(4):810-853, 2018. URL: https://doi.org/10.1007/s00224-017-9811-8.
  21. Mahmoud Abo Khamis, Vasileios Nakos, Dan Olteanu, and Dan Suciu. Join size bounds using lp-norms on degree sequences. CoRR, abs/2306.14075, 2023. URL: https://doi.org/10.48550/arXiv.2306.14075.
  22. Mahmoud Abo Khamis, Hung Q. Ngo, Christopher Re, and Atri Rudra. Joins via geometric resolutions: Worst case and beyond. ACM Transactions on Database Systems (TODS), 41(4):22:1-22:45, 2016. URL: https://doi.org/10.1145/2967101.
  23. Mahmoud Abo Khamis, Hung Q. Ngo, and Dan Suciu. Computing join queries with functional dependencies. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 327-342, 2016. URL: https://doi.org/10.1145/2902251.2902289.
  24. Mahmoud Abo Khamis, Hung Q. Ngo, and Dan Suciu. What do Shannon-type inequalities, submodular width, and disjunctive datalog have to do with one another? In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 429-444, 2017. URL: https://doi.org/10.1145/3034786.3056105.
  25. Kyoungmin Kim, Jaehyun Ha, George Fletcher, and Wook-Shin Han. Guaranteeing the Õ(AGM/OUT) runtime for uniform sampling and size estimation over joins. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 113-125, 2023. URL: https://doi.org/10.1145/3584372.3588676.
  26. George Manoussakis. Listing all fixed-length simple cycles in sparse graphs in optimal time. In Fundamentals of Computation Theory, pages 355-366, 2017. URL: https://doi.org/10.1007/978-3-662-55751-8_28.
  27. Gonzalo Navarro, Juan L. Reutter, and Javiel Rojas-Ledesma. Optimal joins using compact data structures. In Proceedings of International Conference on Database Theory (ICDT), volume 155, pages 21:1-21:21, 2020. URL: https://doi.org/10.4230/LIPIcs.ICDT.2020.21.
  28. Jaroslav Nesetril and Svatopluk Poljak. On the complexity of the subgraph problem. Commentationes Mathematicae Universitatis Carolinae, 26(2):415-419, 1985.
  29. Hung Q. Ngo. Worst-case optimal join algorithms: Techniques, results, and open problems. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 111-124, 2018. URL: https://doi.org/10.1145/3196959.3196990.
  30. Hung Q. Ngo, Dung T. Nguyen, Christopher Re, and Atri Rudra. Beyond worst-case analysis for joins with minesweeper. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 234-245, 2014. URL: https://doi.org/10.1145/2594538.2594547.
  31. Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra. Worst-Case Optimal Join Algorithms: [Extended Abstract]. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 37-48, 2012. URL: https://doi.org/10.1145/2213556.2213565.
  32. Hung Q. Ngo, Ely Porat, Christopher Re, and Atri Rudra. Worst-case optimal join algorithms. Journal of the ACM (JACM), 65(3):16:1-16:40, 2018. URL: https://doi.org/10.1145/3180143.
  33. Hung Q. Ngo, Christopher Re, and Atri Rudra. Skew strikes back: new developments in the theory of join algorithms. SIGMOD Rec., 42(4):5-16, 2013. URL: https://doi.org/10.1145/2590989.2590991.
  34. Dan Suciu. Applications of information inequalities to database theory problems. CoRR, abs/2304.11996, 2023. URL: https://doi.org/10.48550/arXiv.2304.11996.
  35. Maciej M. Syslo. An efficient cycle vector space algorithm for listing all cycles of a planar graph. SIAM Journal on Computing, 10(4):797-808, 1981. URL: https://doi.org/10.1137/0210062.
  36. Todd L. Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. In Proceedings of International Conference on Database Theory (ICDT), pages 96-106, 2014. URL: https://doi.org/10.5441/002/icdt.2014.13.
  37. Ru Wang and Yufei Tao. Join sampling under acyclic degree constraints and (cyclic) subgraph sampling, 2023. URL: https://doi.org/10.48550/arXiv.2312.12797.
  38. Zhuoyue Zhao, Robert Christensen, Feifei Li, Xiao Hu, and Ke Yi. Random sampling over joins revisited. In Proceedings of ACM Management of Data (SIGMOD), pages 1525-1539, 2018. URL: https://doi.org/10.1145/3183713.3183739.