Detecting Ambiguity in Prioritized Database Repairing

Kimelfeld, Benny; Livshits, Ester; Peterfreund, Liat

doi:10.4230/LIPIcs.ICDT.2017.17

File

Author Details

Benny Kimelfeld

Ester Livshits

Liat Peterfreund

Cite As Get BibTex

Benny Kimelfeld, Ester Livshits, and Liat Peterfreund. Detecting Ambiguity in Prioritized Database Repairing. In 20th International Conference on Database Theory (ICDT 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 68, pp. 17:1-17:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017) https://doi.org/10.4230/LIPIcs.ICDT.2017.17

Abstract

In its traditional definition, a repair of an inconsistent database is a consistent database that differs from the inconsistent one in a "minimal way." Often, repairs are not equally legitimate, as it is desired to prefer one over another; for example, one fact is regarded more reliable than another, or a more recent fact should be preferred to an earlier one. Motivated by these considerations, researchers have introduced and investigated the framework of preferred repairs, in the context of denial constraints and subset repairs. There, a priority relation between facts is lifted towards a priority relation between consistent databases, and repairs are restricted to the ones that are optimal in the lifted sense. Three notions of lifting (and optimal repairs) have been proposed: Pareto, global, and completion. In this paper we investigate the complexity of deciding whether the priority relation suffices to clean the database unambiguously, or in other words, whether there is exactly one optimal repair. We show that the different lifting semantics entail highly different complexities. Under Pareto optimality, the problem is coNP-complete, in data complexity, for every set of functional dependencies (FDs), except for the tractable case of (equivalence to) one FD per relation. Under global optimality, one FD per relation is still tractable, but we establish Pi-2-p-completeness for a relation with two FDs. In contrast, under completion optimality the problem is solvable in polynomial time for every set of FDs. In fact, we present a polynomial-time algorithm for arbitrary conflict hypergraphs. We further show that under a general assumption of transitivity, this algorithm solves the problem even for global optimality. The algorithm is extremely simple, but its proof of correctness is quite intricate.

Subject Classification

Keywords

inconsistent databases
preferred repairs
data cleaning
functional dependencies
conflict hypergraph

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

Foto N. Afrati and Phokion G. Kolaitis. Repair checking in inconsistent databases: algorithms and complexity. In ICDT, pages 31-41. ACM, 2009.
Douglas E. Appelt and Boyan Onyshkevych. The common pattern specification language. In TIPSTER Text Program: Phase III, pages 23-30. Association for Computational Linguistics, 1998.
Marcelo Arenas, Leopoldo E. Bertossi, and Jan Chomicki. Consistent query answers in inconsistent databases. In PODS, pages 68-79. ACM, 1999.
Leopoldo E. Bertossi. Database Repairing and Consistent Query Answering. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011.
Philip Bohannon, Wenfei Fan, Floris Geerts, Xibei Jia, and Anastasios Kementsietsidis. Conditional functional dependencies for data cleaning. In ICDE, pages 746-755. IEEE, 2007.
Philip Bohannon, Michael Flaster, Wenfei Fan, and Rajeev Rastogi. A cost-based model and effective heuristic for repairing constraints by value modification. In SIGMOD, pages 143-154. ACM, 2005.
Endre Boros, Khaled M. Elbassioni, Vladimir Gurvich, and Leonid Khachiyan. An efficient incremental algorithm for generating all maximal independent sets in hypergraphs of bounded dimension. Parallel Processing Letters, 10(4):253-266, 2000.
Yang Cao, Wenfei Fan, and Wenyuan Yu. Determining the relative accuracy of attributes. In SIGMOD, pages 565-576. ACM, 2013.
Laura Chiticariu, Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Frederick Reiss, and Shivakumar Vaithyanathan. SystemT: An algebraic approach to declarative information extraction. In ACL, pages 128-137, 2010.
Jan Chomicki and Jerzy Marcinkowski. Minimal-change integrity maintenance using tuple deletions. Inf. Comput., 197(1-2):90-121, 2005.
Michele Dallachiesa, Amr Ebaid, Ahmed Eldawy, Ahmed K. Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, and Nan Tang. NADEEF: a commodity data cleaning system. In SIGMOD, pages 541-552. ACM, 2013.
Ronald Fagin, Benny Kimelfeld, and Phokion G. Kolaitis. Dichotomies in the complexity of preferred repairs. In PODS, pages 3-15. ACM, 2015.
Ronald Fagin, Benny Kimelfeld, Frederick Reiss, and Stijn Vansummeren. Cleaning inconsistencies in information extraction via prioritized repairs. In PODS, pages 164-175. ACM, 2014.
Ronald Fagin, Benny Kimelfeld, Frederick Reiss, and Stijn Vansummeren. Document spanners: A formal approach to information extraction. J. ACM, 62(2):12, 2015.
Wenfei Fan and Floris Geerts. Foundations of Data Quality Management. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2012.
Wenfei Fan, Floris Geerts, and Jef Wijsen. Determining the currency of data. ACM Trans. Database Syst., 37(4):25, 2012.
Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, and Wenyuan Yu. Towards certain fixes with editing rules and master data. VLDB J., 21(2):213-238, 2012.
Wenfei Fan, Shuai Ma, Nan Tang, and Wenyuan Yu. Interaction between record matching and data repairing. J. Data and Information Quality, 4(4):16:1-16:38, 2014.
Terry Gaasterland, Parke Godfrey, and Jack Minker. An overview of cooperative answering. J. Intell. Inf. Syst., 1(2):123-157, 1992.
Georg Gottlob and Enrico Malizia. Achieving new upper bounds for the hypergraph duality problem through logic. In CSL-LICS, pages 43:1-43:10. ACM, 2014.
Sergio Greco, Cristina Sirangelo, Irina Trubitsyna, and Ester Zumpano. Feasibility conditions and preference criteria in querying and repairing inconsistent databases. In DEXA, volume 3180 of LNCS, pages 44-55. Springer, 2004.
Sergio Greco, Cristina Sirangelo, Irina Trubitsyna, and Ester Zumpano. Preferred repairs for inconsistent databases. In Encyclopedia of Database Technologies and Applications, pages 480-485. Idea Group, 2005.
Benny Kimelfeld, Ester Livshits, and Liat Peterfreund. Unambiguous prioritized repairing of databases. CoRR, abs/1603.01820, 2016. URL: http://arxiv.org/abs/1603.01820.
Benny Kimelfeld, Jan Vondrák, and Ryan Williams. Maximizing conjunctive views in deletion propagation. ACM Trans. Database Syst., 37(4):24, 2012.
Solmaz Kolahi and Laks V. S. Lakshmanan. On approximating optimum repairs for functional dependency violations. In ICDT, volume 361 of ACM International Conference Proceeding Series, pages 53-62. ACM, 2009.
Paraschos Koutris and Dan Suciu. A dichotomy on the complexity of consistent query answering for atoms with simple keys. In ICDT, pages 165-176. OpenProceedings.org, 2014.
Paraschos Koutris and Jef Wijsen. The data complexity of consistent query answering for self-join-free conjunctive queries under primary key constraints. In PODS, pages 17-29. ACM, 2015.
Maria Vanina Martinez, Francesco Parisi, Andrea Pugliese, Gerardo I. Simari, and V. S. Subrahmanian. Inconsistency management policies. In KR, pages 367-377. AAAI Press, 2008.
Dany Maslowski and Jef Wijsen. A dichotomy in the complexity of counting database repairs. J. Comput. Syst. Sci., 79(6):958-983, 2013.
Dany Maslowski and Jef Wijsen. Counting database repairs that satisfy conjunctive queries with self-joins. In ICDT, pages 155-164. OpenProceedings.org, 2014.
Amihai Motro, Philipp Anokhin, and Aybar C. Acar. Utility-based resolution of data inconsistencies. In IQIS, pages 35-43. ACM, 2004.
Davy Van Nieuwenborgh and Dirk Vermeir. Preferred answer sets for ordered logic programs. TPLP, 6(1-2):107-167, 2006.
Slawek Staworko, Jan Chomicki, and Jerzy Marcinkowski. Prioritized repairing and consistent query answering in relational databases. Ann. Math. Artif. Intell., 64(2-3):209-246, 2012.
Slawomir Staworko, Jan Chomicki, and Jerzy Marcinkowski. Preference-driven querying of inconsistent relational databases. In EDBT Workshops, volume 4254 of LNCS, pages 318-335. Springer, 2006.
Jef Wijsen. Database repairing using updates. ACM Trans. Database Syst., 30(3):722-768, 2005.
Jef Wijsen. Charting the tractability frontier of certain conjunctive query answering. In PODS, pages 189-200. ACM, 2013.

Detecting Ambiguity in Prioritized Database Repairing

Authors Benny Kimelfeld, Ester Livshits, Liat Peterfreund

File

Document Identifiers

Author Details

Cite As Get BibTex

Abstract

Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message