Targeted Least Cardinality Candidate Key for Relational Databases

Authors Vasileios Nakos, Hung Q. Ngo, Charalampos E. Tsourakakis

Author Details

Vasileios Nakos
  • National and Kapodistrian University of Athens, Greece
  • Archimedes Athena RC, Marousi, Greece
Hung Q. Ngo
  • RelationalAI, Berkeley, CA, USA
Charalampos E. Tsourakakis
  • RelationalAI, Berkeley, CA, USA

Vasileios Nakos, Hung Q. Ngo, and Charalampos E. Tsourakakis. Targeted Least Cardinality Candidate Key for Relational Databases. In 28th International Conference on Database Theory (ICDT 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 328, pp. 21:1-21:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Functional dependencies (FDs) are a central theme in databases, playing a major role in the design of database schemas and the optimization of queries [Ramakrishnan and Gehrke, 2003]. In this work, we introduce the targeted least cardinality candidate key problem (TCAND). This problem is defined over a set of functional dependencies ℱ and a target variable set T ⊆ V, and it aims to find the smallest set X ⊆ V such that the FD X → T can be derived from ℱ. The TCAND problem generalizes the well-known NP-hard problem of finding the least cardinality candidate key [Lucchesi and Osborn, 1978], which has been previously demonstrated to be at least as difficult as the set cover problem.
We present an integer programming (IP) formulation for the TCAND problem, analogous to a layered set cover problem. We analyze its linear programming (LP) relaxation from two perspectives: we propose two approximation algorithms and investigate the integrality gap. Our findings indicate that the approximation upper bounds for our algorithms are not significantly improvable through LP rounding, a notable distinction from the standard Set Cover problem. Additionally, we discover that a generalization of the TCAND problem is equivalent to a variant of the Set Cover problem, named Red Blue Set Cover [Carr et al., 2000], which cannot be approximated within a sub-polynomial factor in polynomial time under plausible conjectures [Chlamtáč et al., 2023]. Despite the long history of the least cardinality candidate key problem, our work contributes new theoretical insights and novel algorithms, and demonstrates that the general TCAND problem poses complexities beyond those encountered in the Set Cover problem.
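To make the TCAND definition concrete, the following is a minimal Python sketch of an exhaustive baseline, not the IP formulation or the approximation algorithms of the paper. It tests whether X → T is derivable from ℱ using the standard attribute-closure procedure and tries candidate sets in order of increasing size. The function names `closure` and `tcand_brute_force`, the representation of FDs as pairs of frozensets, and the toy instance are illustrative assumptions; the search is exponential in |V| and only suitable for tiny examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the attribute closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs of frozensets, each meaning lhs -> rhs.
    """
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD if its left side is covered and it adds something new.
            if lhs <= closed and not rhs <= closed:
                closed |= rhs
                changed = True
    return closed


def tcand_brute_force(variables, fds, target):
    """Return a smallest X within `variables` whose closure contains `target`.

    Assumes target is a subset of variables, so X = target is always feasible.
    """
    target = set(target)
    for size in range(len(variables) + 1):
        for cand in combinations(sorted(variables), size):
            if target <= closure(cand, fds):
                return set(cand)
    return None  # not reached when target is a subset of variables


# Hypothetical toy instance: A -> B, BC -> D, D -> E, with target T = {B, E}.
V = {"A", "B", "C", "D", "E"}
F = [
    (frozenset("A"), frozenset("B")),
    (frozenset("BC"), frozenset("D")),
    (frozenset("D"), frozenset("E")),
]
T = {"B", "E"}
X = tcand_brute_force(V, F, T)
```

On this toy instance no single attribute determines both B and E, so the optimum has size 2; the sketch returns {A, C}, although {B, D} is an equally small solution. Setting T = V recovers the classical least cardinality candidate key problem.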

Subject Classification

ACM Subject Classification
  • Information systems → Database design and models
Keywords
  • functional dependencies
  • candidate key
  • approximation algorithms
  • hardness


