Finding Smallest Witnesses for Conjunctive Queries

Authors: Xiao Hu, Stavros Sintos


Author Details

Xiao Hu
  • University of Waterloo, Canada
Stavros Sintos
  • University of Illinois Chicago, IL, USA

Cite As

Xiao Hu and Stavros Sintos. Finding Smallest Witnesses for Conjunctive Queries. In 27th International Conference on Database Theory (ICDT 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 290, pp. 24:1-24:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


A witness is a sub-database that preserves the query results of the original database while being much smaller. Witnesses have wide applications in query rewriting and debugging, query explanation, IoT analytics, multi-layer network routing, etc. In this paper, we study the smallest witness problem (SWP) for the class of conjunctive queries (CQs) without self-joins. We first establish a dichotomy: unless P = NP, SWP for a CQ can be solved in polynomial time if and only if the CQ has the head-cluster property. We next turn to the approximate version, in which the witness is no longer required to have minimum size. Surprisingly, we find that the head-domination property, previously identified for the deletion propagation problem [Kimelfeld et al., 2012], also precisely captures the hardness of the approximate smallest witness problem: in polynomial time, SWP for any CQ with the head-domination property can be approximated within a constant factor, while SWP for any CQ without this property cannot be approximated within a logarithmic factor, unless P = NP. We further explore efficient approximation algorithms for CQs without the head-domination property: (1) we show a trivial algorithm that achieves a polynomially large approximation ratio for general CQs; (2) for any CQ with only one non-output attribute, such as star CQs, we show a greedy algorithm with a logarithmic approximation ratio; (3) for line CQs, which contain at least two non-output attributes, we relate SWP to the directed Steiner forest problem, whose algorithms can be applied to line CQs directly. Meanwhile, we establish a much higher lower bound, exponentially larger than the logarithmic lower bound obtained above. It remains open to close the gap between the lower and upper bounds of the approximate SWP for CQs without the head-domination property.
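
To make the definition of a witness concrete, the following is a minimal brute-force sketch in Python. It is not one of the algorithms from the paper: it enumerates sub-databases in order of increasing size, so it takes exponential time and is only meant for toy inputs. The query Q(A, C) :- R1(A, B), R2(B, C), the relation names, and the sample tuples are all assumptions chosen for illustration.

```python
from itertools import combinations


def evaluate(r1, r2):
    """Evaluate Q(A, C) :- R1(A, B), R2(B, C): join on B, project onto (A, C)."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


def smallest_witness(r1, r2):
    """Return a minimum-size sub-database (s1, s2) with the same query result.

    Brute force: tries every subset of tuples by increasing size.
    """
    target = evaluate(r1, r2)
    # Tag each tuple with its relation so subsets can be split back apart.
    tagged = [("R1", t) for t in sorted(r1)] + [("R2", t) for t in sorted(r2)]
    for k in range(len(tagged) + 1):
        for subset in combinations(tagged, k):
            s1 = {t for (name, t) in subset if name == "R1"}
            s2 = {t for (name, t) in subset if name == "R2"}
            if evaluate(s1, s2) == target:
                return s1, s2
    # Unreachable: the full database is always a witness of itself.
    raise AssertionError("no witness found")


# Hypothetical toy database: two join paths produce the same output tuple.
r1 = {("a1", "b1"), ("a1", "b2")}
r2 = {("b1", "c1"), ("b2", "c1")}

w1, w2 = smallest_witness(r1, r2)
print(len(r1) + len(r2), "tuples in the database")
print(len(w1) + len(w2), "tuples in a smallest witness")
```

In this toy instance the single output tuple (a1, c1) is produced through both b1 and b2, so one join path can be dropped and two of the four tuples suffice as a witness.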

Subject Classification

ACM Subject Classification
  • Theory of computation → Data provenance

Keywords
  • conjunctive query
  • smallest witness
  • head-cluster
  • head-domination




References

  1. Amir Abboud and Greg Bodwin. Reachability preservers: New extremal bounds and approximation algorithms. In SODA, pages 1865-1883. SIAM, 2018.
  2. Mahmoud Abo Khamis, Hung Q Ngo, and Atri Rudra. FAQ: questions asked frequently. In PODS, pages 13-28, 2016.
  3. Yael Amsterdamer, Daniel Deutch, and Val Tannen. Provenance for aggregate queries. In PODS, pages 153-164, 2011.
  4. Albert Atserias, Martin Grohe, and Dániel Marx. Size bounds and query plans for relational joins. In FOCS, pages 739-748, 2008.
  5. Guillaume Bagan, Arnaud Durand, and Etienne Grandjean. On acyclic conjunctive queries and constant delay enumeration. In CSL, pages 208-222. Springer, 2007.
  6. C. Beeri, R. Fagin, D. Maier, and M. Yannakakis. On the desirability of acyclic database schemes. JACM, 30(3):479-513, 1983.
  7. Bernhard Korte and Jens Vygen. Combinatorial optimization: Theory and algorithms. Springer, Third Edition, 2005, 2008.
  8. Johann Brault-Baron. Hypergraph acyclicity revisited. CSUR, 49(3):1-26, 2016.
  9. Peter Buneman, Sanjeev Khanna, and Tan Wang-Chiew. Why and where: A characterization of data provenance. In ICDT, pages 316-330. Springer, 2001.
  10. Surajit Chaudhuri, Rajeev Motwani, and Vivek Narasayya. On random sampling over joins. ACM SIGMOD Record, 28(2):263-274, 1999.
  11. Chandra Chekuri, Guy Even, Anupam Gupta, and Danny Segev. Set connectivity problems in undirected graphs and the directed Steiner network problem. TALG, 7(2):1-17, 2011.
  12. Yu Chen and Ke Yi. Random sampling and size estimation over cyclic joins. In ICDT, 2020.
  13. Rajesh Chitnis, Andreas Emil Feldmann, and Pasin Manurangsi. Parameterized approximation algorithms for bidirected Steiner network problems. ACM Trans. Algorithms, 17(2), April 2021.
  14. Graham Cormode, Minos Garofalakis, Peter J Haas, Chris Jermaine, et al. Synopses for massive data: Samples, histograms, wavelets, sketches. Foundations and Trends in Databases, 4(1-3):1-294, 2011.
  15. Graham Cormode and Ke Yi. Small summaries for big data. Cambridge University Press, 2020.
  16. Irit Dinur and Pasin Manurangsi. ETH-hardness of approximating 2-CSPs and directed Steiner network. In Anna R. Karlin, editor, ITCS, volume 94 of LIPIcs, pages 36:1-36:20, 2018.
  17. Irit Dinur and David Steurer. Analytical approach to parallel repetition. In STOC, pages 624-633, 2014.
  18. Yevgeniy Dodis and Sanjeev Khanna. Design networks with bounded pairwise distance. In STOC, pages 750-759, 1999.
  19. R. Fagin. Degrees of acyclicity for hypergraphs and relational database schemes. JACM, 30(3):514-550, 1983.
  20. Uriel Feige. A threshold of ln n for approximating set cover. JACM, 45(4):634-652, 1998.
  21. Moran Feldman, Guy Kortsarz, and Zeev Nutov. Improved approximation algorithms for directed Steiner forest. JCSS, 78(1):279-292, 2012.
  22. Cibele Freire, Wolfgang Gatterbauer, Neil Immerman, and Alexandra Meliou. The complexity of resilience and responsibility for self-join-free conjunctive queries. PVLDB, 9(3):180-191, 2015.
  23. Cibele Freire, Wolfgang Gatterbauer, Neil Immerman, and Alexandra Meliou. New results for the complexity of resilience for binary conjunctive queries with self-joins. In PODS, pages 271-284, 2020.
  24. Rajiv Gandhi, Samir Khuller, and Aravind Srinivasan. Approximation algorithms for partial covering problems. Journal of Algorithms, 53(1):55-84, 2004.
  25. Andrew V Goldberg. Finding a maximum density subgraph. Technical report, University of California Berkeley, 1984.
  26. Todd J Green, Grigoris Karvounarakis, and Val Tannen. Provenance semirings. In PODS, pages 31-40, 2007.
  27. Shuguang Hu, Xiaowei Wu, and TH Hubert Chan. Maintaining densest subsets efficiently in evolving hypergraphs. In CIKM, pages 929-938, 2017.
  28. Xiao Hu and Stavros Sintos. Finding smallest witnesses for conjunctive queries. arXiv preprint, 2023.
  29. Samir Khuller and Barna Saha. On finding dense subgraphs. In ICALP, pages 597-608. Springer, 2009.
  30. Benny Kimelfeld, Jan Vondrák, and Ryan Williams. Maximizing conjunctive views in deletion propagation. In PODS, pages 187-198, 2011.
  31. Benny Kimelfeld, Jan Vondrák, and Ryan Williams. Maximizing conjunctive views in deletion propagation. TODS, 37(4):1-37, 2012.
  32. Zhengjie Miao, Sudeepa Roy, and Jun Yang. Explaining wrong queries using small examples. In SIGMOD, pages 503-520, 2019.
  33. Dana Moshkovitz. The projection games conjecture and the NP-hardness of ln n-approximating set-cover. In APPROX-RANDOM, pages 276-287. Springer, 2012.
  34. Dan Olteanu and Maximilian Schleich. Factorized databases. ACM SIGMOD Record, 45(2):5-16, 2016.
  35. John Paparrizos, Chunwei Liu, Bruno Barbarioli, Johnny Hwang, Ikraduya Edian, Aaron J Elmore, Michael J Franklin, and Sanjay Krishnan. VergeDB: A database for IoT analytics on edge devices. In CIDR, 2021.
  36. Jeff M Phillips. Coresets and sketches. In Handbook of discrete and computational geometry, pages 1269-1288. Chapman and Hall/CRC, 2017.
  37. Biao Qin, Deying Li, and Chunlai Zhou. The resilience of conjunctive queries with inequalities. Information Sciences, 613:982-1002, 2022.
  38. Moshe Y Vardi. The complexity of relational query languages. In STOC, pages 137-146, 1982.
  39. Vijay V Vazirani. Approximation algorithms, volume 1. Springer, 2001.
  40. Mihalis Yannakakis. Algorithms for acyclic database schemes. In VLDB, volume 81, pages 82-94, 1981.
  41. Zhuoyue Zhao, Robert Christensen, Feifei Li, Xiao Hu, and Ke Yi. Random sampling over joins revisited. In SIGMOD, pages 1525-1539, 2018.