The Consistency of Probabilistic Databases with Independent Cells

Authors Amir Gilad, Aviram Imber, Benny Kimelfeld



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2023.22.pdf
  • Filesize: 0.78 MB
  • 19 pages

Document Identifiers

Author Details

Amir Gilad
  • Duke University, Durham, NC, USA
Aviram Imber
  • Technion - Israel Institute of Technology, Haifa, Israel
Benny Kimelfeld
  • Technion - Israel Institute of Technology, Haifa, Israel

Cite AsGet BibTex

Amir Gilad, Aviram Imber, and Benny Kimelfeld. The Consistency of Probabilistic Databases with Independent Cells. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 22:1-22:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.ICDT.2023.22

Abstract

A probabilistic database with attribute-level uncertainty consists of relations where cells of some attributes may hold probability distributions rather than deterministic content. Such databases arise, implicitly or explicitly, in the context of noisy operations such as missing data imputation, where we automatically fill in missing values, column prediction, where we predict unknown attributes, and database cleaning (and repairing), where we replace the original values due to detected errors or violation of integrity constraints. We study the computational complexity of problems that regard the selection of cell values in the presence of integrity constraints. More precisely, we focus on functional dependencies and study three problems: (1) deciding whether the constraints can be satisfied by any choice of values, (2) finding a most probable such choice, and (3) calculating the probability of satisfying the constraints. The data complexity of these problems is determined by the combination of the set of functional dependencies and the collection of uncertain attributes. We give full classifications into tractable and intractable complexities for several classes of constraints, including a single dependency, matching constraints, and unary functional dependencies.

Subject Classification

ACM Subject Classification
  • Information systems → Data management systems
Keywords
  • Probabilistic databases
  • attribute-level uncertainty
  • functional dependencies
  • most probable database

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Serge Abiteboul, T.-H. Hubert Chan, Evgeny Kharlamov, Werner Nutt, and Pierre Senellart. Aggregate queries for discrete and continuous probabilistic XML. In ICDT, ACM International Conference Proceeding Series, pages 50-61. ACM, 2010. Google Scholar
  2. Periklis Andritsos, Ariel Fuxman, and Renée J. Miller. Clean answers over dirty databases: A probabilistic approach. In ICDE, page 30, 2006. Google Scholar
  3. Felix Bießmann, Tammo Rukat, Philipp Schmidt, Prathik Naidu, Sebastian Schelter, Andrey Taptunov, Dustin Lange, and David Salinas. Datawig: Missing value imputation for tables. J. Mach. Learn. Res., 20:175:1-175:6, 2019. Google Scholar
  4. Marco Calautti, Ester Livshits, Andreas Pieris, and Markus Schneider. Counting database repairs entailing a query: The case of functional dependencies. In PODS, pages 403-412. ACM, 2022. Google Scholar
  5. Nofar Carmeli, Martin Grohe, Benny Kimelfeld, Ester Livshits, and Muhammad Tibi. Database repairing with soft functional dependencies. In ICDT, volume 186 of LIPIcs, pages 16:1-16:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. Google Scholar
  6. Reynold Cheng, Jinchuan Chen, and Xike Xie. Cleaning uncertain data with quality guarantees. Proc. VLDB Endow., 1(1):722-735, 2008. Google Scholar
  7. Sara Cohen, Benny Kimelfeld, and Yehoshua Sagiv. Incorporating constraints in probabilistic XML. ACM Trans. Database Syst., 34(3):18:1-18:45, 2009. Google Scholar
  8. Nilesh N. Dalvi and Dan Suciu. Efficient query evaluation on probabilistic databases. In VLDB, pages 864-875. Morgan Kaufmann, 2004. Google Scholar
  9. Wenfei Fan and Floris Geerts. Foundations of Data Quality Management. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2012. Google Scholar
  10. Amir Gilad, Aviram Imber, and Benny Kimelfeld. The consistency of probabilistic databases with independent cells, 2022. URL: https://doi.org/10.48550/arXiv.2212.12104.
  11. Erich Grädel, Yuri Gurevich, and Colin Hirsch. The complexity of query reliability. In PODS, pages 227-234. ACM Press, 1998. Google Scholar
  12. Eric Gribkoff, Guy Van den Broeck, and Dan Suciu. The most probable database problem. In BUDA, 2014. URL: http://www.sigmod2014.org/buda.
  13. Venkatesan Guruswami. Inapproximability results for set splitting and satisfiability problems with no mixed clauses. In APPROX, volume 1913 of LNCS, pages 155-166. Springer, 2000. Google Scholar
  14. Miika Hannula and Jef Wijsen. A dichotomy in consistent query answering for primary keys and unary foreign keys. In PODS, pages 437-449. ACM, 2022. Google Scholar
  15. Alireza Heidari, Joshua McGrath, Ihab F. Ilyas, and Theodoros Rekatsinas. HoloDetect: Few-shot learning for error detection. In SIGMOD Conference, pages 829-846. ACM, 2019. Google Scholar
  16. Abhay Kumar Jha, Vibhor Rastogi, and Dan Suciu. Query evaluation with soft-key constraints. In PODS, pages 119-128. ACM, 2008. Google Scholar
  17. Abhay Kumar Jha and Dan Suciu. Probabilistic databases with markoviews. Proc. VLDB Endow., 5(11):1160-1171, 2012. Google Scholar
  18. Solmaz Kolahi and Laks V. S. Lakshmanan. On approximating optimum repairs for functional dependency violations. In ICDT, volume 361 of ACM, pages 53-62. ACM, 2009. Google Scholar
  19. Harold W Kuhn. The hungarian method for the assignment problem. Naval research logistics quarterly, 2(1-2):83-97, 1955. Google Scholar
  20. Ester Livshits, Alireza Heidari, Ihab F. Ilyas, and Benny Kimelfeld. Approximate denial constraints. Proc. VLDB Endow., 13(10):1682-1695, 2020. Google Scholar
  21. Ester Livshits, Benny Kimelfeld, and Sudeepa Roy. Computing optimal repairs for functional dependencies. ACM Trans. Database Syst., 45(1):4:1-4:46, 2020. URL: https://doi.org/10.1145/3360904.
  22. Ester Livshits, Benny Kimelfeld, and Jef Wijsen. Counting subset repairs with functional dependencies. J. Comput. Syst. Sci., 117:154-164, 2021. Google Scholar
  23. Dany Maslowski and Jef Wijsen. A dichotomy in the complexity of counting database repairs. J. Comput. Syst. Sci., 79(6):958-983, 2013. Google Scholar
  24. Chris Mayfield, Jennifer Neville, and Sunil Prabhakar. ERACER: a database approach for statistical inference and data cleaning. In SIGMOD Conference, pages 75-86. ACM, 2010. Google Scholar
  25. Luyi Mo, Reynold Cheng, Xiang Li, David W. Cheung, and Xuan S. Yang. Cleaning uncertain data for top-k queries. In ICDE, pages 134-145, 2013. Google Scholar
  26. Christopher Ré and Dan Suciu. Materialized views in probabilistic databases for information exchange and query optimization. In Proc. VLDB Endow., pages 51-62, 2007. Google Scholar
  27. Theodoros Rekatsinas, Xu Chu, Ihab F. Ilyas, and Christopher Ré. HoloClean: Holistic data repairs with probabilistic inference. PVLDB, 10(11):1190-1201, 2017. Google Scholar
  28. Christopher De Sa, Ihab F. Ilyas, Benny Kimelfeld, Christopher Ré, and Theodoros Rekatsinas. A formal framework for probabilistic unclean databases. In ICDT, volume 127 of LIPIcs, pages 6:1-6:18, 2019. Google Scholar
  29. Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch. Probabilistic Databases. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011. Google Scholar
  30. Seinosuke Toda. PP is as hard as the polynomial-time hierarchy. SIAM J. Comput., 20(5):865-877, 1991. Google Scholar
  31. Frank S.C. Tseng, Wei-Pang Yang, and Arbee L.P. Chen. Finding a complete matching with the maximum product on weighted bipartite graphs. Computers & Mathematics with Applications, 25(5):65-71, 1993. Google Scholar
  32. Leslie G. Valiant. The complexity of computing the permanent. Theor. Comput. Sci., 8:189-201, 1979. Google Scholar
  33. Moshe Y. Vardi. The complexity of relational query languages (extended abstract). In STOC, pages 137-146. ACM, 1982. Google Scholar
  34. Jef Wijsen. Database repairing using updates. ACM Trans. Database Syst., 30(3):722-768, 2005. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail