On Computing Centroids According to the p-Norms of Hamming Distance Vectors

Authors Jiehua Chen, Danny Hermelin, Manuel Sorge

Thumbnail PDF


  • Filesize: 0.76 MB
  • 16 pages

Document Identifiers

Author Details

Jiehua Chen
  • Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
Danny Hermelin
  • Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, Beer Sheva, Israel
Manuel Sorge
  • Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland

Cite AsGet BibTex

Jiehua Chen, Danny Hermelin, and Manuel Sorge. On Computing Centroids According to the p-Norms of Hamming Distance Vectors. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 28:1-28:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


In this paper we consider the p-Norm Hamming Centroid problem which asks to determine whether some given strings have a centroid with a bound on the p-norm of its Hamming distances to the strings. Specifically, given a set S of strings and a real k, we consider the problem of determining whether there exists a string s^* with (sum_{s in S} d^{p}(s^*,s))^(1/p) <=k, where d(,) denotes the Hamming distance metric. This problem has important applications in data clustering and multi-winner committee elections, and is a generalization of the well-known polynomial-time solvable Consensus String (p=1) problem, as well as the NP-hard Closest String (p=infty) problem. Our main result shows that the problem is NP-hard for all fixed rational p > 1, closing the gap for all rational values of p between 1 and infty. Under standard complexity assumptions the reduction also implies that the problem has no 2^o(n+m)-time or 2^o(k^(p/(p+1)))-time algorithm, where m denotes the number of input strings and n denotes the length of each string, for any fixed p > 1. The first bound matches a straightforward brute-force algorithm. The second bound is tight in the sense that for each fixed epsilon > 0, we provide a 2^(k^(p/((p+1))+epsilon))-time algorithm. In the last part of the paper, we complement our hardness result by presenting a fixed-parameter algorithm and a factor-2 approximation algorithm for the problem.

Subject Classification

ACM Subject Classification
  • Theory of computation → Computational complexity and cryptography
  • Theory of computation → Parameterized complexity and exact algorithms
  • Theory of computation → Approximation algorithms analysis
  • Theory of computation → Facility location and clustering
  • Theory of computation → Lower bounds and information complexity
  • Theory of computation → Design and analysis of algorithms
  • Mathematics of computing → Combinatorial optimization
  • Strings
  • Clustering
  • Multiwinner Election
  • Hamming Distance


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Georgios Amanatidis, Nathanaël Barrot, Jérôme Lang, Evangelos Markakis, and Bernard Ries. Multiple referenda and multiwinner elections using hamming distances: Complexity and manipulability. In Proceedings of the 14th International Conference on Autonomous Agents and Multiagent Systems (AAMAS '15), pages 715-723, 2015. Google Scholar
  2. Yossi Azar, Leah Epstein, Yossi Richter, and Gerhard J. Woeginger. All-norm approximation algorithms. Journal of Algorithms, 52(2):120-133, 2004. URL: https://doi.org/10.1016/j.jalgor.2004.02.003.
  3. Gleb Beliakov, Humberto Bustince Sola, and Tomasa Calvo. A Practical Guide to Averaging Functions, volume 329 of Studies in Fuzziness and Soft Computing. Springer, 2016. Google Scholar
  4. Paul S. Bradley, Olvi L. Mangasarian, and W. Nick Street. Clustering via concave minimization. In Proceedings of Advances in Neural Information Processing Systems 9 (NIPS '96), pages 368-374, 1996. Google Scholar
  5. Steven J. Brams, D. Marc Kilgour, and M. Remzi Sanver. A minimax procedure for negotiating multilateral treaties. In Rudolf Avenhaus and I. William Zartman, editors, Diplomacy Games: Formal Models and International Negotiations, pages 265-282. Springer, 2007. URL: https://doi.org/10.1007/978-3-540-68304-9_14.
  6. Margaret L. Brandeau and Samuel S. Chiu. Parametric facility location on a tree network with an Lsubscriptp-norm cost function. Transportation Science, 22(1):59-69, 1988. Google Scholar
  7. Jiehua Chen, Danny Hermelin, and Manuel Sorge. A note on clustering aggregation. Technical report, arXiv:1807.08949, 2018. URL: http://arxiv.org/abs/1807.08949.
  8. Jiehua Chen, Danny Hermelin, and Manuel Sorge. On computing centroids according to the p-norms of hamming distance vectors. Technical report, arXiv:1807.06469, 2018. URL: http://arxiv.org/abs/1807.06469.
  9. Zhi-Zhong Chen, Bin Ma, and Lusheng Wang. A three-string approach to the closest string problem. Journal of Computer and System Sciences, 78(1):164-178, 2012. URL: https://doi.org/10.1016/j.jcss.2011.01.003.
  10. Sergei Chubanov. A polynomial-time descent method for separable convex optimization problems with linear constraints. SIAM Journal on Optimization, 26(1):856-889, 2016. URL: https://doi.org/10.1137/14098524X.
  11. Gérard D. Cohen, Iiro S. Honkala, Simon Litsyn, and Antoine Lobstein. Covering Codes, volume 54. North-Holland, 1997. Google Scholar
  12. Marek Cygan, Fedor V. Fomin, Lukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michal Pilipczuk, and Saket Saurabh. Parameterized Algorithms. Springer, 2015. URL: https://doi.org/10.1007/978-3-319-21275-3.
  13. Marek Cygan, Daniel Lokshtanov, Marcin Pilipczuk, Michal Pilipczuk, and Saket Saurabh. Lower bounds for approximation schemes for closest string. In Proceedings of the 15th Scandinavian Workshop on Algorithm Theory (SWAT '16), volume 53 of LIPICS, pages 12:1-12:10. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016. Google Scholar
  14. Amit Deshpande, Madhur Tulsiani, and Nisheeth K. Vishnoi. Algorithms and hardness for subspace approximation. In Proceedings of the 22nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '11), pages 482-496, 2011. Google Scholar
  15. Martin Dörnfelder, Jiong Guo, Christian Komusiewicz, and Mathias Weller. On the parameterized complexity of consensus clustering. Theoretical Computer Science, 542:71-82, 2014. URL: https://doi.org/10.1016/j.tcs.2014.05.002.
  16. Piotr Faliszewski, Piotr Skowron, Akardii Slinko, and Nimrod Talmon. Committee scoring rules: Axiomatic characterization and hierarchy. ACM Transactions on Economics and Computation, 7(1):3:1-3:39, 2019. Google Scholar
  17. Piotr Faliszewski, Piotr Skowron, Arkadii Slinko, and Nimrod Talmon. Multiwinner rules on paths from k-Borda to Chamberlin-Courant. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI '17), pages 192-198. AAAI Press, 2017. Google Scholar
  18. Moti Frances and Ami Litman. On covering problems of codes. Theory of Computing Systems, 30:113-119, 1997. Google Scholar
  19. Michael R. Garey and David S. Johnson. Computers and Intractability - A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979. Google Scholar
  20. Dennis C. Ghiglia and Louis A. Romero. Minimum l_p-norm two-dimensional phase unwrapping. Journal of the Optical Society of America A, 13(10):1999-2013, 1996. Google Scholar
  21. René Gonin and Arthur H. Money. Nonlinear L_p-norm Estimation. Marcel Dekker, Inc., 1989. Google Scholar
  22. Jens Gramm, Rolf Niedermeier, and Peter Rossmanith. Fixed-parameter algorithms for Closest String and related problems. Algorithmica, 37(1):25-42, 2003. Google Scholar
  23. Venkatesan Guruswami, Prasad Raghavendra, Rishi Saket, and Yi Wu. Bypassing UGC from some optimal geometric inapproximability results. ACM Transactions on Algorithms, 12(1):6:1-6:25, 2016. Google Scholar
  24. Richard W. Hamming. Error detecting and error correcting codes. The Bell System Technical Journal, 29(2), 1950. Google Scholar
  25. Anil K. Jain and Richard C. Dubes. Algorithms for Clustering Data. Prentice-Hall, 1988. Google Scholar
  26. D. Marc Kilgour. Approval balloting for multi-winner elections. In J.-F. Laslier and M.R. Sanver, editors, Handbook on Approval Voting, Studies in Choice and Welfare, chapter 6, pages 105-124. Springer, 2010. Google Scholar
  27. Marius Kloft, Ulf Brefeld, Sören Sonnenburg, Pavel Laskov, Klaus-Robert Müller, and Alexander Zien. Efficient and accurate l_p-norm multiple kernel learning. In Proceedings of Advances in Neural Information Processing Systems 22 (NIPS '09), pages 997-1005, 2009. Google Scholar
  28. Dušan Knop, Martin Koutecký, and Matthias Mnich. Combinatorial n-fold integer programming and applications. Technical report, arXiv:1705.08657, 2017. URL: http://arxiv.org/abs/1705.08657.
  29. J. Kevin Lanctôt, Ming Li, Bin Ma, Shaojiu Wang, and Louxin Zhang. Distinguishing string selection problems. Information and Computation, 185(1):41-55, 2003. Google Scholar
  30. Robert F. Love, J. James G. Morris, and George O. Wesolowsky. Facilities Location: Models & Methods. North-Holland, 1988. Google Scholar
  31. Bin Ma and Xiaoming Sun. More efficient algorithms for closest string and substring problems. SIAM Journal on Computing, 39(4):1432-1443, 2009. Google Scholar
  32. J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of 5^th Berkeley Symposium on Mathematical Statistics and Probability, pages 281-297, 1967. Google Scholar
  33. A. H. Money, J. F. Affleck-Graves, M. L. Hart, and G. D. I. Barr. The linear regression model: l_p norm estimation and the choice of p. Journal of Communications in Statistics-Simulation and Computation, 11(1):89-109, 1982. Google Scholar
  34. Yurii Nesterov and Arkadii Nemirovskii. Interior-Point Polynomial Algorithms in Convex Programming. Series: Studies in Applied and Numerical Mathematics. Society for Industrial and Applied Mathematics, 1994. Google Scholar
  35. Fanny Pascual, Krzysztof Rzadca, and Piotr Skowron. Collective schedules: Scheduling meets computational social choice. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 18), pages 667-675, 2018. URL: http://dl.acm.org/citation.cfm?id=3237482.
  36. Pavel A. Pevzner. Computational Molecular Biology-An Algorithmic Approach. MIT Press, 2000. Google Scholar
  37. Ron M. Roth. Introduction to Coding Theory. Cambridge University Press, 2006. Google Scholar
  38. Douglas R. Shier and Perino M. Dearing. Optimal locations for a class of nonlinear, single-facility location problems on a network. Operations Research, 31(2):292-303, 1983. Google Scholar
  39. Shankar Sivarajan. A generalization of the minisum and minimax voting methods. SIAM Undergraduate Research Online, 11, 2018. URL: https://doi.org/10.1137/16S014870.
  40. Wen-Jun Zeng, Hing-Cheung So, and Abdelhak M. Zoubir. An 𝓁_p-norm minimization approach to time delay estimation in impulsive noise. Digital Signal Processing, 23(4):1247-1254, 2013. Google Scholar