Probabilistic Query Evaluation with Bag Semantics

Authors Martin Grohe , Peter Lindner , Christoph Standke



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2023.20.pdf
  • Filesize: 0.89 MB
  • 19 pages

Document Identifiers

Author Details

Martin Grohe
  • RWTH Aachen University, Germany
Peter Lindner
  • École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Christoph Standke
  • RWTH Aachen University, Germany

Cite As Get BibTex

Martin Grohe, Peter Lindner, and Christoph Standke. Probabilistic Query Evaluation with Bag Semantics. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 20:1-20:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/LIPIcs.ICDT.2023.20

Abstract

We initiate the study of probabilistic query evaluation under bag semantics where tuples are allowed to be present with duplicates. We focus on self-join free conjunctive queries, and probabilistic databases where occurrences of different facts are independent, which is the natural generalization of tuple-independent probabilistic databases to the bag semantics setting. For set semantics, the data complexity of this problem is well understood, even for the more general class of unions of conjunctive queries: it is either in polynomial time, or #P-hard, depending on the query (Dalvi & Suciu, JACM 2012).
Due to potentially unbounded multiplicities, the bag probabilistic databases we discuss are no longer finite objects, which requires a treatment of representation mechanisms. Moreover, the answer to a Boolean query is a probability distribution over non-negative integers, rather than a probability distribution over {true, false}. Therefore, we discuss two flavors of probabilistic query evaluation: computing expectations of answer tuple multiplicities, and computing the probability that a tuple is contained in the answer at most k times for some parameter k. Subject to mild technical assumptions on the representation systems, it turns out that expectations are easy to compute, even for unions of conjunctive queries. For query answer probabilities, we obtain a dichotomy between solvability in polynomial time and #P-hardness for self-join free conjunctive queries.

Subject Classification

ACM Subject Classification
  • Theory of computation → Incomplete, inconsistent, and uncertain databases
Keywords
  • Probabilistic Query Evaluation
  • Probabilistic Databases
  • Bag Semantics

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Antoine Amarilli, Pierre Bourhis, and Pierre Senellart. Tractable Lineages on Treelike Instances: Limits and Extensions. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS 2016), pages 355-370. Association for Computing Machinery, 2016. URL: https://doi.org/10.1145/2902251.2902301.
  2. Antoine Amarilli and Benny Kimelfeld. Uniform Reliability of Self-Join-Free Conjunctive Queries, 2021. Extended version of [Amarilli and Kimelfeld, 2021]. URL: http://arxiv.org/abs/1908.07093v6.
  3. Antoine Amarilli and Benny Kimelfeld. Uniform Reliability of Self-Join-Free Conjunctive Queries. In 24th International Conference on Database Theory (ICDT 2021), volume 186 of Leibniz International Proceedings in Informatics (LIPIcs), pages 17:1-17:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.ICDT.2021.17.
  4. Antoine Amarilli and İsmail İlkan Ceylan. A Dichotomy for Homomorphism-Closed Queries on Probabilistic Graphs. In 23rd International Conference on Database Theory (ICDT 2020), volume 155 of Leibniz International Proceedings in Informatics (LIPIcs), pages 5:1-5:20, Dagstuhl, Germany, 2020. Schloss Dagstuhl-Leibniz-Zentrum für Informatik. URL: https://doi.org/10.4230/LIPIcs.ICDT.2020.5.
  5. Mark Braverman and Stephen Cook. Computing over the Reals: Foundations for Scientific Computing. Notices of the AMS, 53(3):318-329, 2006. Google Scholar
  6. George Casella and Roger L. Berger. Statistical Inference. Thomson Learning, Pacific Grove, CA, USA, 2nd edition, 2002. Google Scholar
  7. Surajit Chaudhuri and Moshe Y. Vardi. Optimization of Real Conjunctive Queries. In Proceedings of the Twelfth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 1993), pages 59-70. ACM Press, 1993. URL: https://doi.org/10.1145/153850.153856.
  8. Sara Cohen. Equivalence of Queries That Are Sensitive to Multiplicities. The VLDB Journal, 18(3):765-785, 2009. URL: https://doi.org/10.1007/s00778-008-0122-1.
  9. Nilesh Dalvi and Dan Suciu. Efficient Query Evaluation on Probabilistic Databases. In Proceedings of the Thirtieth International Conference on Very Large Data Bases (VLDB 2004), volume 30, pages 864-875. VLDB Endowment, 2004. Google Scholar
  10. Nilesh Dalvi and Dan Suciu. Efficient Query Evaluation on Probabilistic Databases. The VLDB Journal, 16(4):523-544, 2007. URL: https://doi.org/10.1007/s00778-006-0004-3.
  11. Nilesh Dalvi and Dan Suciu. The Dichotomy of Conjunctive Queries on Probabilistic Structures. In Proceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2007), pages 293-302, New York, NY, USA, 2007. Association for Computing Machinery. URL: https://doi.org/10.1145/1265530.1265571.
  12. Nilesh Dalvi and Dan Suciu. The Dichotomy of Probabilistic Inference for Unions of Conjunctive Queries. Journal of the ACM, 59(6):1-87, 2012. URL: https://doi.org/10.1145/2395116.2395119.
  13. Su Feng, Boris Glavic, Aaron Huber, Oliver Kennedy, and Atri Rudra. Computing Expected Multiplicities for Bag-TIDBs with Bounded Multiplicities, 2022. URL: http://arxiv.org/abs/2204.02758v3.
  14. Robert Fink and Dan Olteanu. Dichotomies for Queries with Negation in Probabilistic Databases. ACM Transactions on Database Systems, 41(1):4:1-4:47, 2016. URL: https://doi.org/10.1145/2877203.
  15. Erich Grädel, Yuri Gurevich, and Colin Hirsch. The Complexity of Query Reliability. In Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 1998), pages 227-234. ACM Press, 1998. URL: https://doi.org/10.1145/275487.295124.
  16. Todd J. Green, Grigoris Karvounarakis, and Val Tannen. Provenance Semirings. In Proceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2007), pages 31-40. Association for Computing Machinery, 2007. URL: https://doi.org/10.1145/1265530.1265535.
  17. Todd J. Green and Val Tannen. Models for Incomplete and Probabilistic Information. In Current Trends in Database Technology - EDBT 2006, pages 278-296, Berlin, Germany, 2006. Springer-Verlag Berlin Heidelbeg. URL: https://doi.org/10.1007/11896548_24.
  18. Martin Grohe and Peter Lindner. Independence in Infinite Probabilistic Databases. J. ACM, 69(5):37:1-37:42, 2022. URL: https://doi.org/10.1145/3549525.
  19. Martin Grohe and Peter Lindner. Infinite Probabilistic Databases. Logical Methods in Computer Science, Volume 18, Issue 1, 2022. URL: https://doi.org/10.46298/lmcs-18(1:34)2022.
  20. Martin Grohe, Peter Lindner, and Christoph Standke. Probabilistic Query Evaluation with Bag Semantics, 2022. URL: http://arxiv.org/abs/2201.11524.
  21. Erich Grädel and Val Tannen. Semiring Provenance for First-Order Model Checking, 2017. URL: http://arxiv.org/abs/1712.01980v1.
  22. Francis Begnaud Hildebrand. Introduction to Numerical Analysis. Courier Corporation, 1987. Google Scholar
  23. Batya Kenig and Dan Suciu. A Dichotomy for the Generalized Model Counting Problem for Unions of Conjunctive Queries. In Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS 2021), pages 312-324, New York, NY, USA, 2021. Association for Computing Machinery. URL: https://doi.org/10.1145/3452021.3458313.
  24. Achim Klenke. Probability Theory: A Comprehensive Course. Universitext. Springer-Verlag London, London, UK, 2nd edition, 2014. Translation from the German language edition. URL: https://doi.org/10.1007/978-1-4471-5361-0.
  25. Ker-I Ko. Complexity Theory of Real Functions. Progress in Theoretical Computer Science. Birkhäuser Boston, 1991. URL: https://doi.org/10.1007/978-1-4684-6802-1.
  26. Dan Olteanu and Jiewen Huang. Using OBDDs for Efficient Query Evaluation on Probabilistic Databases. In SUM 2008: Scalable Uncertainty Management, Lecture Notes in Computer Science, pages 326-340, Berlin, Heidelberg, 2008. Springer. URL: https://doi.org/10.1007/978-3-540-87993-0_26.
  27. Dan Olteanu and Jiewen Huang. Secondary-Storage Confidence Computation for Conjunctive Queries with Inequalities. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pages 389-402. ACM, 2009. URL: https://doi.org/10.1145/1559845.1559887.
  28. Christopher Ré and Dan Suciu. The Trichotomy Of HAVING Queries On A Probabilistic Database. The VLDB Journal, 18(5):1091-1116, July 2009. URL: https://doi.org/10.1007/s00778-009-0151-4.
  29. Dan Suciu. Probabilistic Databases for All. In Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 19-31. ACM, 2020. URL: https://doi.org/10.1145/3375395.3389129.
  30. Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch. Probabilistic Databases, volume 3.2 of Synthesis Lectures on Data Management. Morgan & Claypool, San Rafael, CA, USA, 2011. Lecture #16. URL: https://doi.org/10.2200/S00362ED1V01Y201105DTM016.
  31. Guy Van den Broeck and Dan Suciu. Query Processing on Probabilistic Data: A Survey. Foundations and Trendsregistered in Databases, 7(3-4):197-341, 2017. URL: https://doi.org/10.1561/1900000052.
  32. Moshe Y. Vardi. The Complexity of Relational Query Languages. In Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing (STOC 1982), pages 137-146, New York, NY, USA, 1982. ACM Press. URL: https://doi.org/10.1145/800070.802186.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail