Detecting Ambiguity in Prioritized Database Repairing

Authors Benny Kimelfeld, Ester Livshits, Liat Peterfreund



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2017.17.pdf
  • Filesize: 0.61 MB
  • 20 pages

Document Identifiers

Author Details

Benny Kimelfeld
Ester Livshits
Liat Peterfreund

Cite AsGet BibTex

Benny Kimelfeld, Ester Livshits, and Liat Peterfreund. Detecting Ambiguity in Prioritized Database Repairing. In 20th International Conference on Database Theory (ICDT 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 68, pp. 17:1-17:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.ICDT.2017.17

Abstract

In its traditional definition, a repair of an inconsistent database is a consistent database that differs from the inconsistent one in a "minimal way." Often, repairs are not equally legitimate, as it is desired to prefer one over another; for example, one fact is regarded more reliable than another, or a more recent fact should be preferred to an earlier one. Motivated by these considerations, researchers have introduced and investigated the framework of preferred repairs, in the context of denial constraints and subset repairs. There, a priority relation between facts is lifted towards a priority relation between consistent databases, and repairs are restricted to the ones that are optimal in the lifted sense. Three notions of lifting (and optimal repairs) have been proposed: Pareto, global, and completion. In this paper we investigate the complexity of deciding whether the priority relation suffices to clean the database unambiguously, or in other words, whether there is exactly one optimal repair. We show that the different lifting semantics entail highly different complexities. Under Pareto optimality, the problem is coNP-complete, in data complexity, for every set of functional dependencies (FDs), except for the tractable case of (equivalence to) one FD per relation. Under global optimality, one FD per relation is still tractable, but we establish Pi-2-p-completeness for a relation with two FDs. In contrast, under completion optimality the problem is solvable in polynomial time for every set of FDs. In fact, we present a polynomial-time algorithm for arbitrary conflict hypergraphs. We further show that under a general assumption of transitivity, this algorithm solves the problem even for global optimality. The algorithm is extremely simple, but its proof of correctness is quite intricate.
Keywords
  • inconsistent databases
  • preferred repairs
  • data cleaning
  • functional dependencies
  • conflict hypergraph

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Foto N. Afrati and Phokion G. Kolaitis. Repair checking in inconsistent databases: algorithms and complexity. In ICDT, pages 31-41. ACM, 2009. Google Scholar
  2. Douglas E. Appelt and Boyan Onyshkevych. The common pattern specification language. In TIPSTER Text Program: Phase III, pages 23-30. Association for Computational Linguistics, 1998. Google Scholar
  3. Marcelo Arenas, Leopoldo E. Bertossi, and Jan Chomicki. Consistent query answers in inconsistent databases. In PODS, pages 68-79. ACM, 1999. Google Scholar
  4. Leopoldo E. Bertossi. Database Repairing and Consistent Query Answering. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011. Google Scholar
  5. Philip Bohannon, Wenfei Fan, Floris Geerts, Xibei Jia, and Anastasios Kementsietsidis. Conditional functional dependencies for data cleaning. In ICDE, pages 746-755. IEEE, 2007. Google Scholar
  6. Philip Bohannon, Michael Flaster, Wenfei Fan, and Rajeev Rastogi. A cost-based model and effective heuristic for repairing constraints by value modification. In SIGMOD, pages 143-154. ACM, 2005. Google Scholar
  7. Endre Boros, Khaled M. Elbassioni, Vladimir Gurvich, and Leonid Khachiyan. An efficient incremental algorithm for generating all maximal independent sets in hypergraphs of bounded dimension. Parallel Processing Letters, 10(4):253-266, 2000. Google Scholar
  8. Yang Cao, Wenfei Fan, and Wenyuan Yu. Determining the relative accuracy of attributes. In SIGMOD, pages 565-576. ACM, 2013. Google Scholar
  9. Laura Chiticariu, Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Frederick Reiss, and Shivakumar Vaithyanathan. SystemT: An algebraic approach to declarative information extraction. In ACL, pages 128-137, 2010. Google Scholar
  10. Jan Chomicki and Jerzy Marcinkowski. Minimal-change integrity maintenance using tuple deletions. Inf. Comput., 197(1-2):90-121, 2005. Google Scholar
  11. Michele Dallachiesa, Amr Ebaid, Ahmed Eldawy, Ahmed K. Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, and Nan Tang. NADEEF: a commodity data cleaning system. In SIGMOD, pages 541-552. ACM, 2013. Google Scholar
  12. Ronald Fagin, Benny Kimelfeld, and Phokion G. Kolaitis. Dichotomies in the complexity of preferred repairs. In PODS, pages 3-15. ACM, 2015. Google Scholar
  13. Ronald Fagin, Benny Kimelfeld, Frederick Reiss, and Stijn Vansummeren. Cleaning inconsistencies in information extraction via prioritized repairs. In PODS, pages 164-175. ACM, 2014. Google Scholar
  14. Ronald Fagin, Benny Kimelfeld, Frederick Reiss, and Stijn Vansummeren. Document spanners: A formal approach to information extraction. J. ACM, 62(2):12, 2015. Google Scholar
  15. Wenfei Fan and Floris Geerts. Foundations of Data Quality Management. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2012. Google Scholar
  16. Wenfei Fan, Floris Geerts, and Jef Wijsen. Determining the currency of data. ACM Trans. Database Syst., 37(4):25, 2012. Google Scholar
  17. Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, and Wenyuan Yu. Towards certain fixes with editing rules and master data. VLDB J., 21(2):213-238, 2012. Google Scholar
  18. Wenfei Fan, Shuai Ma, Nan Tang, and Wenyuan Yu. Interaction between record matching and data repairing. J. Data and Information Quality, 4(4):16:1-16:38, 2014. Google Scholar
  19. Terry Gaasterland, Parke Godfrey, and Jack Minker. An overview of cooperative answering. J. Intell. Inf. Syst., 1(2):123-157, 1992. Google Scholar
  20. Georg Gottlob and Enrico Malizia. Achieving new upper bounds for the hypergraph duality problem through logic. In CSL-LICS, pages 43:1-43:10. ACM, 2014. Google Scholar
  21. Sergio Greco, Cristina Sirangelo, Irina Trubitsyna, and Ester Zumpano. Feasibility conditions and preference criteria in querying and repairing inconsistent databases. In DEXA, volume 3180 of LNCS, pages 44-55. Springer, 2004. Google Scholar
  22. Sergio Greco, Cristina Sirangelo, Irina Trubitsyna, and Ester Zumpano. Preferred repairs for inconsistent databases. In Encyclopedia of Database Technologies and Applications, pages 480-485. Idea Group, 2005. Google Scholar
  23. Benny Kimelfeld, Ester Livshits, and Liat Peterfreund. Unambiguous prioritized repairing of databases. CoRR, abs/1603.01820, 2016. URL: http://arxiv.org/abs/1603.01820.
  24. Benny Kimelfeld, Jan Vondrák, and Ryan Williams. Maximizing conjunctive views in deletion propagation. ACM Trans. Database Syst., 37(4):24, 2012. Google Scholar
  25. Solmaz Kolahi and Laks V. S. Lakshmanan. On approximating optimum repairs for functional dependency violations. In ICDT, volume 361 of ACM International Conference Proceeding Series, pages 53-62. ACM, 2009. Google Scholar
  26. Paraschos Koutris and Dan Suciu. A dichotomy on the complexity of consistent query answering for atoms with simple keys. In ICDT, pages 165-176. OpenProceedings.org, 2014. Google Scholar
  27. Paraschos Koutris and Jef Wijsen. The data complexity of consistent query answering for self-join-free conjunctive queries under primary key constraints. In PODS, pages 17-29. ACM, 2015. Google Scholar
  28. Maria Vanina Martinez, Francesco Parisi, Andrea Pugliese, Gerardo I. Simari, and V. S. Subrahmanian. Inconsistency management policies. In KR, pages 367-377. AAAI Press, 2008. Google Scholar
  29. Dany Maslowski and Jef Wijsen. A dichotomy in the complexity of counting database repairs. J. Comput. Syst. Sci., 79(6):958-983, 2013. Google Scholar
  30. Dany Maslowski and Jef Wijsen. Counting database repairs that satisfy conjunctive queries with self-joins. In ICDT, pages 155-164. OpenProceedings.org, 2014. Google Scholar
  31. Amihai Motro, Philipp Anokhin, and Aybar C. Acar. Utility-based resolution of data inconsistencies. In IQIS, pages 35-43. ACM, 2004. Google Scholar
  32. Davy Van Nieuwenborgh and Dirk Vermeir. Preferred answer sets for ordered logic programs. TPLP, 6(1-2):107-167, 2006. Google Scholar
  33. Slawek Staworko, Jan Chomicki, and Jerzy Marcinkowski. Prioritized repairing and consistent query answering in relational databases. Ann. Math. Artif. Intell., 64(2-3):209-246, 2012. Google Scholar
  34. Slawomir Staworko, Jan Chomicki, and Jerzy Marcinkowski. Preference-driven querying of inconsistent relational databases. In EDBT Workshops, volume 4254 of LNCS, pages 318-335. Springer, 2006. Google Scholar
  35. Jef Wijsen. Database repairing using updates. ACM Trans. Database Syst., 30(3):722-768, 2005. Google Scholar
  36. Jef Wijsen. Charting the tractability frontier of certain conjunctive query answering. In PODS, pages 189-200. ACM, 2013. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail