Finding a Cluster in Incomplete Data

Authors: Eduard Eiben, Robert Ganian, Iyad Kanj, Sebastian Ordyniak, Stefan Szeider

Author Details

Eduard Eiben
  • Department of Computer Science, Royal Holloway, University of London, Egham, UK
Robert Ganian
  • Algorithms and Complexity Group, TU Wien, Austria
Iyad Kanj
  • School of Computing, DePaul University, Chicago, IL, USA
Sebastian Ordyniak
  • University of Leeds, School of Computing, Leeds, UK
Stefan Szeider
  • Algorithms and Complexity Group, TU Wien, Austria

Cite As

Eduard Eiben, Robert Ganian, Iyad Kanj, Sebastian Ordyniak, and Stefan Szeider. Finding a Cluster in Incomplete Data. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 47:1-47:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


We study two variants of the fundamental problem of finding a cluster in incomplete data. In the problems under consideration, we are given a multiset of incomplete d-dimensional vectors over the binary domain and integers k and r, and the goal is to complete the missing vector entries so that the multiset of complete vectors either contains (i) a cluster of k vectors of radius at most r, or (ii) a cluster of k vectors of diameter at most r. We give tight characterizations of the parameterized complexity of the problems under consideration with respect to the parameters k, r, and a third parameter that captures the missing vector entries.
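The two problem variants can be made concrete with a brute-force sketch for tiny instances. This is only an illustration of the definitions, not the authors' algorithms: the names `MISSING`, `has_radius_cluster`, and `has_diameter_cluster` are hypothetical, distances are assumed to be Hamming distances, and the radius variant assumes the cluster center may be any binary vector rather than one of the input vectors. Both functions take exponential time.

```python
from itertools import combinations, product

MISSING = None  # hypothetical marker for a missing vector entry


def hamming(u, v):
    """Number of coordinates in which two complete vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def completions(vec):
    """All complete binary vectors consistent with an incomplete vector."""
    slots = [(0, 1) if x is MISSING else (x,) for x in vec]
    return list(product(*slots))


def has_radius_cluster(vectors, k, r):
    """Variant (i): some center c has at least k vectors completable to within distance r.

    Assumes a non-empty input and that c ranges over all of {0,1}^d.
    For a fixed c, copying c into the missing entries is the best
    completion, so only the known entries can contribute to the distance.
    """
    d = len(vectors[0])
    for center in product((0, 1), repeat=d):
        close = sum(
            1
            for v in vectors
            if sum(x is not MISSING and x != c for x, c in zip(v, center)) <= r
        )
        if close >= k:
            return True
    return False


def has_diameter_cluster(vectors, k, r):
    """Variant (ii): some k vectors can be completed to pairwise distance at most r."""
    for subset in combinations(vectors, k):
        for completed in product(*(completions(v) for v in subset)):
            if all(hamming(a, b) <= r for a, b in combinations(completed, 2)):
                return True
    return False


# A small multiset of incomplete 3-dimensional binary vectors.
vectors = [(0, 0, MISSING), (0, MISSING, 1), (1, 1, 1), (MISSING, 1, 0)]
print(has_radius_cluster(vectors, 4, 1))    # True: center (0, 1, 1) works
print(has_diameter_cluster(vectors, 3, 1))  # False
```

On this instance the two variants disagree for r = 1: all four vectors fit in a ball of radius 1, but no three of them can be completed to pairwise distance at most 1.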

Subject Classification

ACM Subject Classification
  • Theory of computation → Parameterized complexity and exact algorithms
Keywords
  • Parameterized complexity
  • incomplete data
  • clustering



