Targeted Least Cardinality Candidate Key for Relational Databases

Authors: Vasileios Nakos, Hung Q. Ngo, Charalampos E. Tsourakakis




File

LIPIcs.ICDT.2025.21.pdf
  • Filesize: 0.87 MB
  • 18 pages

Author Details

Vasileios Nakos
  • National and Kapodistrian University of Athens, Greece
  • Archimedes Athena RC, Marousi, Greece
Hung Q. Ngo
  • RelationalAI, Berkeley, CA, USA
Charalampos E. Tsourakakis
  • RelationalAI, Berkeley, CA, USA

Cite As

Vasileios Nakos, Hung Q. Ngo, and Charalampos E. Tsourakakis. Targeted Least Cardinality Candidate Key for Relational Databases. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 21:1-21:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/LIPIcs.ICDT.2025.21

Abstract

Functional dependencies (FDs) are a central theme in databases, playing a major role in the design of database schemas and the optimization of queries [Ramakrishnan and Gehrke, 2003]. In this work, we introduce the targeted least cardinality candidate key problem (TCAND). This problem is defined over a set of functional dependencies ℱ and a target variable set T ⊆ V, and it aims to find the smallest set X ⊆ V such that the FD X → T can be derived from ℱ. The TCAND problem generalizes the well-known NP-hard problem of finding the least cardinality candidate key [Lucchesi and Osborn, 1978], which has been previously demonstrated to be at least as difficult as the set cover problem.
We present an integer programming (IP) formulation for the TCAND problem, analogous to a layered set cover problem. We analyze its linear programming (LP) relaxation from two perspectives: we propose two approximation algorithms and investigate the integrality gap. Our findings indicate that the approximation upper bounds for our algorithms are not significantly improvable through LP rounding, a notable distinction from the standard Set Cover problem. Additionally, we discover that a generalization of the TCAND problem is equivalent to a variant of the Set Cover problem, named Red Blue Set Cover [Carr et al., 2000], which cannot be approximated within a sub-polynomial factor in polynomial time under plausible conjectures [Chlamtáč et al., 2023]. Despite the long history of the least cardinality candidate key problem, our research contributes new theoretical insights and novel algorithms, and demonstrates that the general TCAND problem poses complexities beyond those encountered in the Set Cover problem.
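
To make the problem statement concrete, the following is a minimal brute-force sketch of TCAND, not one of the approximation algorithms from the paper. It assumes that "X → T can be derived from ℱ" is tested with the standard attribute-closure computation, and it enumerates candidate sets by increasing size, so it runs in exponential time. The function names `closure` and `brute_force_tcand` and the toy instance are hypothetical and serve only as illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs of frozensets, each read as lhs -> rhs.
    """
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD whose left side is already derived and that adds something new.
            if lhs <= closed and not rhs <= closed:
                closed |= rhs
                changed = True
    return closed


def brute_force_tcand(variables, fds, target):
    """Return a smallest X within `variables` whose closure contains `target`.

    Exhaustive search by increasing cardinality (exponential time).
    """
    ordered = sorted(variables)
    for size in range(len(ordered) + 1):
        for candidate in combinations(ordered, size):
            if set(target) <= closure(candidate, fds):
                return set(candidate)
    return None  # not reached when target is a subset of variables


# Hypothetical toy instance: A -> B, BC -> D, D -> E.
variables = {"A", "B", "C", "D", "E"}
fds = [
    (frozenset({"A"}), frozenset({"B"})),
    (frozenset({"B", "C"}), frozenset({"D"})),
    (frozenset({"D"}), frozenset({"E"})),
]

print(brute_force_tcand(variables, fds, {"D", "E"}))  # targeted: T = {D, E}
print(brute_force_tcand(variables, fds, variables))   # T = V: least cardinality key
```

Setting the target to the full variable set recovers the classical least cardinality candidate key problem, which is the special case that TCAND generalizes.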

Subject Classification

ACM Subject Classification
  • Information systems → Database design and models
Keywords
  • functional dependencies
  • candidate key
  • approximation algorithms
  • hardness

References

  1. Tatsuya Akutsu and Feng Bao. Approximating minimum keys and optimal substructure screens. In International Computing and Combinatorics Conference, pages 290-299. Springer, 1996. URL: https://doi.org/10.1007/3-540-61332-3_163.
  2. Michael Alekhnovich, Sam Buss, Shlomo Moran, and Toniann Pitassi. Minimum propositional proof length is NP-hard to linearly approximate. The Journal of Symbolic Logic, 66(1):171-191, 2001. URL: https://doi.org/10.2307/2694916.
  3. Giorgio Ausiello, Alessandro D'Atri, and Domenico Saccà. Minimal representations of directed hypergraphs and their application to database design. In Algorithm design for computer system design, pages 125-157. Springer, 1984.
  4. Giorgio Ausiello, Alessandro D'Atri, and Domenico Saccà. Graph algorithms for functional dependency manipulation. Journal of the ACM (JACM), 30(4):752-766, 1983. URL: https://doi.org/10.1145/2157.322404.
  5. Gautam Bhargava, Piyush Goel, and Bala Iyer. Efficient processing of outer joins and aggregate functions. In Proceedings of the Twelfth International Conference on Data Engineering, pages 441-449. IEEE, 1996.
  6. Gautam Bhargava, Piyush Goel, and Balakrishna R Iyer. Simplification of outer joins. In Proceedings of the 1995 conference of the Centre for Advanced Studies on Collaborative research, page 7, 1995. URL: https://dl.acm.org/citation.cfm?id=781922.
  7. Aditya Bhaskara, Moses Charikar, Eden Chlamtáč, Uriel Feige, and Aravindan Vijayaraghavan. Detecting high log-densities: an O(n^{1/4}) approximation for densest k-subgraph. In Proceedings of the forty-second ACM symposium on Theory of computing, pages 201-210, 2010. URL: https://doi.org/10.48550/arXiv.1001.2891.
  8. Robert D. Carr, Srinivas Doddi, Goran Konjevod, and Madhav Marathe. On the red-blue set cover problem. In Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '00, pages 345-353, USA, 2000. Society for Industrial and Applied Mathematics. URL: http://dl.acm.org/citation.cfm?id=338219.338271.
  9. Eden Chlamtáč, Michael Dinitz, and Yury Makarychev. Minimizing the union: Tight approximations for small set bipartite vertex expansion. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 881-899. SIAM, 2017. URL: https://doi.org/10.1137/1.9781611974782.56.
  10. Eden Chlamtáč, Michael Dinitz, and Robert Krauthgamer. Everywhere-sparse spanners via dense subgraphs. In 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, pages 758-767. IEEE, 2012. URL: https://doi.org/10.48550/arXiv.1205.0144.
  11. Eden Chlamtáč, Yury Makarychev, and Ali Vakilian. Approximating red-blue set cover and minimum monotone satisfying assignment. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2023). Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023.
  12. Xu Chu, Ihab F Ilyas, Paolo Papotti, and Yin Ye. Ruleminer: Data quality rules discovery. In 2014 IEEE 30th International Conference on Data Engineering, pages 1222-1225. IEEE, 2014. URL: https://doi.org/10.1109/ICDE.2014.6816746.
  13. Graham Cormode, Howard Karloff, and Anthony Wirth. Set cover algorithms for very large datasets. In Proceedings of the 19th ACM international conference on Information and knowledge management, pages 479-488, 2010. URL: https://doi.org/10.1145/1871437.1871501.
  14. Irit Dinur and David Steurer. Analytical approach to parallel repetition. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 624-633, 2014. URL: https://doi.org/10.1145/2591796.2591884.
  15. Paul Erdős, Alfréd Rényi, et al. On the evolution of random graphs. Publ. math. inst. hung. acad. sci, 5(1):17-60, 1960.
  16. Ronald Fagin. Functional dependencies in a relational database and propositional logic. IBM Journal of research and development, 21(6):534-544, 1977.
  17. Ronald Fagin and Moshe Vardi. The theory of data dependencies - an overview. In International Colloquium on Automata, Languages, and Programming, pages 1-22. Springer, 1984.
  18. Uriel Feige. A threshold of ln n for approximating set cover. Journal of the ACM (JACM), 45(4):634-652, 1998. URL: https://doi.org/10.1145/285055.285059.
  19. Alan Frieze and Michał Karoński. Random graphs and networks: a first course. Cambridge University Press, 2023.
  20. Aristides Gionis and Charalampos E Tsourakakis. Dense subgraph discovery: KDD 2015 tutorial. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 2313-2314, 2015. URL: https://doi.org/10.1145/2783258.2789987.
  21. Michael Goldwasser and Rajeev Motwani. Intractability of assembly sequencing: Unit disks in the plane. In Algorithms and Data Structures: 5th International Workshop, WADS'97 Halifax, Nova Scotia, Canada August 6-8, 1997 Proceedings 5, pages 307-320. Springer, 1997. URL: https://doi.org/10.1007/3-540-63307-3_70.
  22. András Hajnal and Endre Szemerédi. Proof of a conjecture of P. Erdős. Combinatorial theory and its applications, 2:601-623, 1970.
  23. Ihab F Ilyas, Xu Chu, et al. Trends in cleaning relational data: Consistency and deduplication. Foundations and Trends® in Databases, 5(4):281-393, 2015. URL: https://doi.org/10.1561/1900000045.
  24. Richard M Karp. Reducibility among combinatorial problems. Springer, 2010. URL: https://doi.org/10.1007/978-3-540-68279-0_8.
  25. Subhash Khot and Oded Regev. Vertex cover might be hard to approximate to within 2-ε. Journal of Computer and System Sciences, 74(3):335-349, 2008. URL: https://doi.org/10.1109/CCC.2003.1214437.
  26. Subhash Khot and Nisheeth K Vishnoi. On the unique games conjecture. In FOCS, volume 5, page 3, 2005. URL: https://doi.org/10.1109/SFCS.2005.61.
  27. Hal A Kierstead and Alexandr V Kostochka. A short proof of the Hajnal-Szemerédi theorem on equitable colouring. Combinatorics, Probability and Computing, 17(2):265-270, 2008. URL: https://doi.org/10.1017/S0963548307008619.
  28. Henry A Kierstead, Alexandr V Kostochka, Marcelo Mydlarz, and Endre Szemerédi. A fast algorithm for equitable coloring. Combinatorica, 30(2):217-224, 2010. URL: https://doi.org/10.1007/s00493-010-2483-5.
  29. Mihail N Kolountzakis, Gary L Miller, Richard Peng, and Charalampos E Tsourakakis. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Mathematics, 8(1-2):161-185, 2012. URL: https://doi.org/10.1080/15427951.2012.625260.
  30. Witold Lipski. Two NP-complete problems related to information retrieval. In International Conference on Fundamentals of Computation Theory, pages 452-458. Springer, 1977. URL: https://doi.org/10.1007/978-3-662-40153-8_52.
  31. László Lovász. On the ratio of optimal integral and fractional covers. Discrete mathematics, 13(4):383-390, 1975. URL: https://doi.org/10.1016/0012-365X(75)90058-8.
  32. Claudio L Lucchesi and Sylvia L Osborn. Candidate keys for relations. Journal of Computer and System Sciences, 17(2):270-279, 1978. URL: https://doi.org/10.1016/0022-0000(78)90009-0.
  33. Pasin Manurangsi. Almost-polynomial ratio ETH-hardness of approximating densest k-subgraph. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 954-961, 2017. URL: https://doi.org/10.1145/3055399.3055412.
  34. Glenn N Paulley and Per-Åke Larson. Exploiting uniqueness in query optimization. In CASCON First Decade High Impact Papers, CASCON '10, pages 127-145, USA, 2010. IBM Corp. URL: https://doi.org/10.1145/1925805.1925812.
  35. Glenn Norman Paulley. Exploiting Functional Dependence in Query Optimization. PhD thesis, University of Waterloo, 2001.
  36. Sriram Pemmaraju and Aravind Srinivasan. The randomized coloring procedure with symmetry-breaking. In Automata, Languages and Programming: 35th International Colloquium, ICALP 2008, Reykjavik, Iceland, July 7-11, 2008, Proceedings, Part I 35, pages 306-319. Springer, 2008. URL: https://doi.org/10.1007/978-3-540-70575-8_26.
  37. Raghu Ramakrishnan and Johannes Gehrke. Database management systems, volume 3. McGraw-Hill New York, 2003.
  38. Hossein Saiedian and Thomas Spencer. An efficient algorithm to compute the candidate keys of a relational database schema. The Computer Journal, 39(2):124-132, 1996. URL: https://doi.org/10.1093/COMJNL/39.2.124.
  39. Petr Slavík. A tight analysis of the greedy algorithm for set cover. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pages 435-441, 1996. URL: https://doi.org/10.1145/237814.237991.
  40. Robert Tarjan. Depth-first search and linear graph algorithms. SIAM journal on computing, 1(2):146-160, 1972. URL: https://doi.org/10.1137/0201010.
  41. Vijay V Vazirani. Approximation algorithms, volume 1. Springer, 2001.
  42. David P Williamson and David B Shmoys. The design of approximation algorithms. Cambridge university press, 2011.
  43. Laurence A Wolsey. Integer programming. John Wiley & Sons, 2020.