Database Repairing with Soft Functional Dependencies

Authors Nofar Carmeli, Martin Grohe, Benny Kimelfeld, Ester Livshits, Muhammad Tibi



PDF
Thumbnail PDF

File

LIPIcs.ICDT.2021.16.pdf
  • Filesize: 0.77 MB
  • 17 pages

Document Identifiers

Author Details

Nofar Carmeli
  • Technion - Israel Institute of Technology, Haifa, Israel
Martin Grohe
  • RWTH Aachen University, Germany
Benny Kimelfeld
  • Technion - Israel Institute of Technology, Haifa, Israel
Ester Livshits
  • Technion - Israel Institute of Technology, Haifa, Israel
Muhammad Tibi
  • Technion - Israel Institute of Technology, Haifa, Israel

Acknowledgements

The authors are very grateful to Alessio Conte and Peter Lindner for insightful discussions about the work described in this paper.

Cite As Get BibTex

Nofar Carmeli, Martin Grohe, Benny Kimelfeld, Ester Livshits, and Muhammad Tibi. Database Repairing with Soft Functional Dependencies. In 24th International Conference on Database Theory (ICDT 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 186, pp. 16:1-16:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021) https://doi.org/10.4230/LIPIcs.ICDT.2021.16

Abstract

A common interpretation of soft constraints penalizes the database for every violation of every constraint, where the penalty is the cost (weight) of the constraint. A computational challenge is that of finding an optimal subset: a collection of database tuples that minimizes the total penalty when each tuple has a cost of being excluded. When the constraints are strict (i.e., have an infinite cost), this subset is a "cardinality repair" of an inconsistent database; in soft interpretations, this subset corresponds to a "most probable world" of a probabilistic database, a "most likely intention" of a probabilistic unclean database, and so on. Within the class of functional dependencies, the complexity of finding a cardinality repair is thoroughly understood. Yet, very little is known about the complexity of finding an optimal subset for the more general soft semantics. This paper makes a significant progress in this direction. In addition to general insights about the hardness and approximability of the problem, we present algorithms for two special cases: a single functional dependency, and a bipartite matching. The latter is the problem of finding an optimal "almost matching" of a bipartite graph where a penalty is paid for every lost edge and every violation of monogamy.

Subject Classification

ACM Subject Classification
  • Theory of computation → Incomplete, inconsistent, and uncertain databases
  • Information systems → Data cleaning
Keywords
  • Database inconsistency
  • database repairs
  • integrity constraints
  • soft constraints
  • functional dependencies

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin. Network flows - theory, algorithms and applications. Prentice Hall, 1993. Google Scholar
  2. Xu Chu, Ihab F. Ilyas, and Paolo Papotti. Discovering denial constraints. PVLDB, 6(13):1498-1509, 2013. URL: http://www.vldb.org/pvldb/vol6/p1498-papotti.pdf, URL: https://doi.org/10.14778/2536258.2536262.
  3. Carlo Combi, Matteo Mantovani, Alberto Sabaini, Pietro Sala, Francesco Amaddeo, Ugo Moretti, and Giuseppe Pozzi. Mining approximate temporal functional dependencies with pure temporal grouping in clinical databases. Comp. in Bio. and Med., 62:306-324, 2015. URL: https://doi.org/10.1016/j.compbiomed.2014.08.004.
  4. Andrew V. Goldberg and Robert E. Tarjan. Finding minimum-cost circulations by successive approximation. Math. Oper. Res., 15(3):430–466, August 1990. URL: https://doi.org/10.1287/moor.15.3.430.
  5. Teofilo F. Gonzalez, editor. Handbook of Approximation Algorithms and Metaheuristics. Chapman and Hall/CRC, 2007. URL: https://doi.org/10.1201/9781420010749.
  6. Eric Gribkoff, Guy Van den Broeck, and Dan Suciu. The most probable database problem. In BUDA, 2014. URL: http://www.sigmod2014.org/buda.
  7. Dorit S Hochbaum. Approximation algorithms for the set covering and vertex cover problems. SIAM Journal on computing, 11(3):555-556, 1982. Google Scholar
  8. Ykä Huhtala, Juha Kärkkäinen, Pasi Porkka, and Hannu Toivonen. TANE: an efficient algorithm for discovering functional and approximate dependencies. Comput. J., 42(2):100-111, 1999. URL: https://doi.org/10.1093/comjnl/42.2.100.
  9. Abhay Kumar Jha, Vibhor Rastogi, and Dan Suciu. Query evaluation with soft-key constraints. In PODS, pages 119-128, 2008. Google Scholar
  10. Solmaz Kolahi and Laks V. S. Lakshmanan. On approximating optimum repairs for functional dependency violations. In ICDT, volume 361 of ACM International Conference Proceeding Series, pages 53-62. ACM, 2009. Google Scholar
  11. Weibang Li, Zhanhuai Li, Qun Chen, Tao Jiang, and Zhilei Yin. Discovering approximate functional dependencies from distributed big data. In APWeb, pages 289-301, 2016. URL: https://doi.org/10.1007/978-3-319-45817-5_23.
  12. Ester Livshits, Alireza Heidari, Ihab F. Ilyas, and Benny Kimelfeld. Approximate denial constraints. Proc. VLDB Endow., 13(10):1682-1695, 2020. URL: http://www.vldb.org/pvldb/vol13/p1682-livshits.pdf.
  13. Ester Livshits, Ihab F. Ilyas, Benny Kimelfeld, and Sudeepa Roy. Principles of progress indicators for database repairing. CoRR, abs/1904.06492, 2019. Google Scholar
  14. Ester Livshits, Benny Kimelfeld, and Sudeepa Roy. Computing optimal repairs for functional dependencies. ACM Trans. Database Syst., 45(1):4:1-4:46, 2020. URL: https://doi.org/10.1145/3360904.
  15. Andrei Lopatenko and Leopoldo E. Bertossi. Complexity of consistent query answering in databases under cardinality-based and incremental repair semantics. In ICDT, volume 4353 of Lecture Notes in Computer Science, pages 179-193. Springer, 2007. Google Scholar
  16. Eduardo H. M. Pena, Eduardo Cunha de Almeida, and Felix Naumann. Discovery of approximate (and exact) denial constraints. Proc. VLDB Endow., 13(3):266-278, 2019. URL: https://doi.org/10.14778/3368289.3368293.
  17. Matthew Richardson and Pedro Domingos. Markov logic networks. Mach. Learn., 62(1-2):107-136, February 2006. Google Scholar
  18. Christopher De Sa, Ihab F. Ilyas, Benny Kimelfeld, Christopher Ré, and Theodoros Rekatsinas. A formal framework for probabilistic unclean databases. In ICDT, volume 127 of LIPIcs, pages 6:1-6:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019. Google Scholar
  19. Prithviraj Sen, Amol Deshpande, and Lise Getoor. PrDB: managing and exploiting rich correlations in probabilistic databases. VLDB J., 18(5):1065-1090, 2009. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail