Matrix Completion: Approximating the Minimum Diameter

Authors: Diptarka Chakraborty, Sanjana Dey

Author Details

Diptarka Chakraborty
  • National University of Singapore, Singapore
Sanjana Dey
  • National University of Singapore, Singapore

Cite As

Diptarka Chakraborty and Sanjana Dey. Matrix Completion: Approximating the Minimum Diameter. In 34th International Symposium on Algorithms and Computation (ISAAC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 283, pp. 17:1-17:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


In this paper, we focus on the matrix completion problem and aim to minimize the diameter over an arbitrary alphabet. Given a matrix M with missing entries, our objective is to complete the matrix by filling in the missing entries in a way that minimizes the maximum (Hamming) distance between any pair of rows in the completed matrix (also known as the diameter of the matrix). This problem is already known to be NP-hard. Currently, the best-known upper bound is a 4-approximation algorithm derived by applying the triangle inequality together with a well-known 2-approximation algorithm for the radius minimization variant. In this work, we make the following contributions:
  • We present a novel 3-approximation algorithm for the diameter minimization variant of the matrix completion problem. To the best of our knowledge, this is the first approximation result that breaks below the straightforward 4-factor bound.
  • Furthermore, we establish that the diameter minimization variant of the matrix completion problem is (2-ε)-inapproximable, for any ε > 0, even when considering a binary alphabet, under the assumption that P ≠ NP. This is the first result that demonstrates a hardness of approximation for this problem.
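To make the objective concrete, the following is a minimal sketch of the diameter of a completed matrix, together with an exhaustive search over all completions of a tiny instance. It is not the paper's 3-approximation algorithm; the names `MISSING`, `hamming`, `diameter`, and `min_diameter_bruteforce` are hypothetical, and the brute force runs in time exponential in the number of missing entries, so it only serves as a reference for very small inputs.

```python
from itertools import combinations, product

MISSING = None  # hypothetical marker for a missing entry (an assumption of this sketch)


def hamming(u, v):
    """Number of positions at which two equal-length rows differ."""
    return sum(a != b for a, b in zip(u, v))


def diameter(rows):
    """Maximum Hamming distance over all pairs of rows (0 if fewer than two rows)."""
    return max((hamming(u, v) for u, v in combinations(rows, 2)), default=0)


def min_diameter_bruteforce(matrix, alphabet):
    """Try every way of filling the missing entries with symbols from `alphabet`
    and return (smallest diameter found, one completion achieving it)."""
    holes = [(i, j)
             for i, row in enumerate(matrix)
             for j, x in enumerate(row)
             if x is MISSING]
    best_d, best_m = None, None
    for fill in product(alphabet, repeat=len(holes)):
        completed = [list(row) for row in matrix]
        for (i, j), symbol in zip(holes, fill):
            completed[i][j] = symbol
        d = diameter(completed)
        if best_d is None or d < best_d:
            best_d, best_m = d, completed
    return best_d, best_m


# A 3 x 3 binary instance with one missing entry per row.
M = [
    [0, MISSING, 1],
    [0, 1, MISSING],
    [MISSING, 0, 1],
]
best_d, best_m = min_diameter_bruteforce(M, alphabet=(0, 1))
```

In this instance the second and third rows disagree in the middle column regardless of how the gaps are filled, so no completion can reach diameter 0, and the search finds a completion of diameter 1.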

Subject Classification

ACM Subject Classification
  • Theory of computation → Approximation algorithms analysis

Keywords
  • Incomplete Data
  • Matrix Completion
  • Hamming Distance
  • Diameter Minimization
  • Approximation Algorithms
  • Hardness of Approximation



