The Consistency of Probabilistic Databases with Independent Cells

Gilad, Amir; Imber, Aviram; Kimelfeld, Benny

doi:10.4230/LIPIcs.ICDT.2023.22

File

Subject Classification

ACM Subject Classification

Information systems → Data management systems

Keywords

Probabilistic databases
attribute-level uncertainty
functional dependencies
most probable database

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

Document

0

Metadata

Abstract

A probabilistic database with attribute-level uncertainty consists of relations where cells of some attributes may hold probability distributions rather than deterministic content. Such databases arise, implicitly or explicitly, in the context of noisy operations such as missing data imputation, where we automatically fill in missing values, column prediction, where we predict unknown attributes, and database cleaning (and repairing), where we replace the original values due to detected errors or violation of integrity constraints. We study the computational complexity of problems that regard the selection of cell values in the presence of integrity constraints. More precisely, we focus on functional dependencies and study three problems: (1) deciding whether the constraints can be satisfied by any choice of values, (2) finding a most probable such choice, and (3) calculating the probability of satisfying the constraints. The data complexity of these problems is determined by the combination of the set of functional dependencies and the collection of uncertain attributes. We give full classifications into tractable and intractable complexities for several classes of constraints, including a single dependency, matching constraints, and unary functional dependencies.

Cite As Get BibTex

Amir Gilad, Aviram Imber, and Benny Kimelfeld. The Consistency of Probabilistic Databases with Independent Cells. In 26th International Conference on Database Theory (ICDT 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 255, pp. 22:1-22:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/LIPIcs.ICDT.2023.22

Author Details

Amir Gilad

Duke University, Durham, NC, USA

Aviram Imber

Technion - Israel Institute of Technology, Haifa, Israel

Benny Kimelfeld

Technion - Israel Institute of Technology, Haifa, Israel

References

Serge Abiteboul, T.-H. Hubert Chan, Evgeny Kharlamov, Werner Nutt, and Pierre Senellart. Aggregate queries for discrete and continuous probabilistic XML. In ICDT, ACM International Conference Proceeding Series, pages 50-61. ACM, 2010.
Periklis Andritsos, Ariel Fuxman, and Renée J. Miller. Clean answers over dirty databases: A probabilistic approach. In ICDE, page 30, 2006.
Felix Bießmann, Tammo Rukat, Philipp Schmidt, Prathik Naidu, Sebastian Schelter, Andrey Taptunov, Dustin Lange, and David Salinas. Datawig: Missing value imputation for tables. J. Mach. Learn. Res., 20:175:1-175:6, 2019.
Marco Calautti, Ester Livshits, Andreas Pieris, and Markus Schneider. Counting database repairs entailing a query: The case of functional dependencies. In PODS, pages 403-412. ACM, 2022.
Nofar Carmeli, Martin Grohe, Benny Kimelfeld, Ester Livshits, and Muhammad Tibi. Database repairing with soft functional dependencies. In ICDT, volume 186 of LIPIcs, pages 16:1-16:17. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
Reynold Cheng, Jinchuan Chen, and Xike Xie. Cleaning uncertain data with quality guarantees. Proc. VLDB Endow., 1(1):722-735, 2008.
Sara Cohen, Benny Kimelfeld, and Yehoshua Sagiv. Incorporating constraints in probabilistic XML. ACM Trans. Database Syst., 34(3):18:1-18:45, 2009.
Nilesh N. Dalvi and Dan Suciu. Efficient query evaluation on probabilistic databases. In VLDB, pages 864-875. Morgan Kaufmann, 2004.
Wenfei Fan and Floris Geerts. Foundations of Data Quality Management. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2012.
Amir Gilad, Aviram Imber, and Benny Kimelfeld. The consistency of probabilistic databases with independent cells, 2022. URL: https://doi.org/10.48550/arXiv.2212.12104.
Erich Grädel, Yuri Gurevich, and Colin Hirsch. The complexity of query reliability. In PODS, pages 227-234. ACM Press, 1998.
Eric Gribkoff, Guy Van den Broeck, and Dan Suciu. The most probable database problem. In BUDA, 2014. URL: http://www.sigmod2014.org/buda.
Venkatesan Guruswami. Inapproximability results for set splitting and satisfiability problems with no mixed clauses. In APPROX, volume 1913 of LNCS, pages 155-166. Springer, 2000.
Miika Hannula and Jef Wijsen. A dichotomy in consistent query answering for primary keys and unary foreign keys. In PODS, pages 437-449. ACM, 2022.
Alireza Heidari, Joshua McGrath, Ihab F. Ilyas, and Theodoros Rekatsinas. HoloDetect: Few-shot learning for error detection. In SIGMOD Conference, pages 829-846. ACM, 2019.
Abhay Kumar Jha, Vibhor Rastogi, and Dan Suciu. Query evaluation with soft-key constraints. In PODS, pages 119-128. ACM, 2008.
Abhay Kumar Jha and Dan Suciu. Probabilistic databases with markoviews. Proc. VLDB Endow., 5(11):1160-1171, 2012.
Solmaz Kolahi and Laks V. S. Lakshmanan. On approximating optimum repairs for functional dependency violations. In ICDT, volume 361 of ACM, pages 53-62. ACM, 2009.
Harold W Kuhn. The hungarian method for the assignment problem. Naval research logistics quarterly, 2(1-2):83-97, 1955.
Ester Livshits, Alireza Heidari, Ihab F. Ilyas, and Benny Kimelfeld. Approximate denial constraints. Proc. VLDB Endow., 13(10):1682-1695, 2020.
Ester Livshits, Benny Kimelfeld, and Sudeepa Roy. Computing optimal repairs for functional dependencies. ACM Trans. Database Syst., 45(1):4:1-4:46, 2020. URL: https://doi.org/10.1145/3360904.
Ester Livshits, Benny Kimelfeld, and Jef Wijsen. Counting subset repairs with functional dependencies. J. Comput. Syst. Sci., 117:154-164, 2021.
Dany Maslowski and Jef Wijsen. A dichotomy in the complexity of counting database repairs. J. Comput. Syst. Sci., 79(6):958-983, 2013.
Chris Mayfield, Jennifer Neville, and Sunil Prabhakar. ERACER: a database approach for statistical inference and data cleaning. In SIGMOD Conference, pages 75-86. ACM, 2010.
Luyi Mo, Reynold Cheng, Xiang Li, David W. Cheung, and Xuan S. Yang. Cleaning uncertain data for top-k queries. In ICDE, pages 134-145, 2013.
Christopher Ré and Dan Suciu. Materialized views in probabilistic databases for information exchange and query optimization. In Proc. VLDB Endow., pages 51-62, 2007.
Theodoros Rekatsinas, Xu Chu, Ihab F. Ilyas, and Christopher Ré. HoloClean: Holistic data repairs with probabilistic inference. PVLDB, 10(11):1190-1201, 2017.
Christopher De Sa, Ihab F. Ilyas, Benny Kimelfeld, Christopher Ré, and Theodoros Rekatsinas. A formal framework for probabilistic unclean databases. In ICDT, volume 127 of LIPIcs, pages 6:1-6:18, 2019.
Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch. Probabilistic Databases. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011.
Seinosuke Toda. PP is as hard as the polynomial-time hierarchy. SIAM J. Comput., 20(5):865-877, 1991.
Frank S.C. Tseng, Wei-Pang Yang, and Arbee L.P. Chen. Finding a complete matching with the maximum product on weighted bipartite graphs. Computers & Mathematics with Applications, 25(5):65-71, 1993.
Leslie G. Valiant. The complexity of computing the permanent. Theor. Comput. Sci., 8:189-201, 1979.
Moshe Y. Vardi. The complexity of relational query languages (extended abstract). In STOC, pages 137-146. ACM, 1982.
Jef Wijsen. Database repairing using updates. ACM Trans. Database Syst., 30(3):722-768, 2005.

The Consistency of Probabilistic Databases with Independent Cells

Authors Amir Gilad, Aviram Imber, Benny Kimelfeld

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message

The Consistency of Probabilistic Databases with Independent Cells

Authors Amir Gilad, Aviram Imber, Benny Kimelfeld

File

Document Identifiers

Related Versions

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

Funding

References

Thanks for your feedback!

Could not send message