On Computing Centroids According to the p-Norms of Hamming Distance Vectors

Chen, Jiehua; Hermelin, Danny; Sorge, Manuel

doi:10.4230/LIPIcs.ESA.2019.28

File

LIPIcs.ESA.2019.28.pdf

Filesize: 0.76 MB
16 pages

Document Identifiers

DOI: 10.4230/LIPIcs.ESA.2019.28
URN: urn:nbn:de:0030-drops-111495

Author Details

Jiehua Chen

Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland

Danny Hermelin

Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, Beer Sheva, Israel

Manuel Sorge

Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland

Cite AsGet BibTex

Jiehua Chen, Danny Hermelin, and Manuel Sorge. On Computing Centroids According to the p-Norms of Hamming Distance Vectors. In 27th Annual European Symposium on Algorithms (ESA 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 144, pp. 28:1-28:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.ESA.2019.28

Abstract

In this paper we consider the p-Norm Hamming Centroid problem which asks to determine whether some given strings have a centroid with a bound on the p-norm of its Hamming distances to the strings. Specifically, given a set S of strings and a real k, we consider the problem of determining whether there exists a string s^* with (sum_{s in S} d^{p}(s^*,s))^(1/p) <=k, where d(,) denotes the Hamming distance metric. This problem has important applications in data clustering and multi-winner committee elections, and is a generalization of the well-known polynomial-time solvable Consensus String (p=1) problem, as well as the NP-hard Closest String (p=infty) problem. Our main result shows that the problem is NP-hard for all fixed rational p > 1, closing the gap for all rational values of p between 1 and infty. Under standard complexity assumptions the reduction also implies that the problem has no 2^o(n+m)-time or 2^o(k^(p/(p+1)))-time algorithm, where m denotes the number of input strings and n denotes the length of each string, for any fixed p > 1. The first bound matches a straightforward brute-force algorithm. The second bound is tight in the sense that for each fixed epsilon > 0, we provide a 2^(k^(p/((p+1))+epsilon))-time algorithm. In the last part of the paper, we complement our hardness result by presenting a fixed-parameter algorithm and a factor-2 approximation algorithm for the problem.

Subject Classification

ACM Subject Classification

Theory of computation → Computational complexity and cryptography
Theory of computation → Parameterized complexity and exact algorithms
Theory of computation → Approximation algorithms analysis
Theory of computation → Facility location and clustering
Theory of computation → Lower bounds and information complexity
Theory of computation → Design and analysis of algorithms
Mathematics of computing → Combinatorial optimization

Keywords

Strings
Clustering
Multiwinner Election
Hamming Distance

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

Georgios Amanatidis, Nathanaël Barrot, Jérôme Lang, Evangelos Markakis, and Bernard Ries. Multiple referenda and multiwinner elections using hamming distances: Complexity and manipulability. In Proceedings of the 14th International Conference on Autonomous Agents and Multiagent Systems (AAMAS '15), pages 715-723, 2015.
Yossi Azar, Leah Epstein, Yossi Richter, and Gerhard J. Woeginger. All-norm approximation algorithms. Journal of Algorithms, 52(2):120-133, 2004. URL: https://doi.org/10.1016/j.jalgor.2004.02.003.
Gleb Beliakov, Humberto Bustince Sola, and Tomasa Calvo. A Practical Guide to Averaging Functions, volume 329 of Studies in Fuzziness and Soft Computing. Springer, 2016.
Paul S. Bradley, Olvi L. Mangasarian, and W. Nick Street. Clustering via concave minimization. In Proceedings of Advances in Neural Information Processing Systems 9 (NIPS '96), pages 368-374, 1996.
Steven J. Brams, D. Marc Kilgour, and M. Remzi Sanver. A minimax procedure for negotiating multilateral treaties. In Rudolf Avenhaus and I. William Zartman, editors, Diplomacy Games: Formal Models and International Negotiations, pages 265-282. Springer, 2007. URL: https://doi.org/10.1007/978-3-540-68304-9_14.
Margaret L. Brandeau and Samuel S. Chiu. Parametric facility location on a tree network with an Lsubscriptp-norm cost function. Transportation Science, 22(1):59-69, 1988.
Jiehua Chen, Danny Hermelin, and Manuel Sorge. A note on clustering aggregation. Technical report, arXiv:1807.08949, 2018. URL: http://arxiv.org/abs/1807.08949.
Jiehua Chen, Danny Hermelin, and Manuel Sorge. On computing centroids according to the p-norms of hamming distance vectors. Technical report, arXiv:1807.06469, 2018. URL: http://arxiv.org/abs/1807.06469.
Zhi-Zhong Chen, Bin Ma, and Lusheng Wang. A three-string approach to the closest string problem. Journal of Computer and System Sciences, 78(1):164-178, 2012. URL: https://doi.org/10.1016/j.jcss.2011.01.003.
Sergei Chubanov. A polynomial-time descent method for separable convex optimization problems with linear constraints. SIAM Journal on Optimization, 26(1):856-889, 2016. URL: https://doi.org/10.1137/14098524X.
Gérard D. Cohen, Iiro S. Honkala, Simon Litsyn, and Antoine Lobstein. Covering Codes, volume 54. North-Holland, 1997.
Marek Cygan, Fedor V. Fomin, Lukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michal Pilipczuk, and Saket Saurabh. Parameterized Algorithms. Springer, 2015. URL: https://doi.org/10.1007/978-3-319-21275-3.
Marek Cygan, Daniel Lokshtanov, Marcin Pilipczuk, Michal Pilipczuk, and Saket Saurabh. Lower bounds for approximation schemes for closest string. In Proceedings of the 15th Scandinavian Workshop on Algorithm Theory (SWAT '16), volume 53 of LIPICS, pages 12:1-12:10. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016.
Amit Deshpande, Madhur Tulsiani, and Nisheeth K. Vishnoi. Algorithms and hardness for subspace approximation. In Proceedings of the 22nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '11), pages 482-496, 2011.
Martin Dörnfelder, Jiong Guo, Christian Komusiewicz, and Mathias Weller. On the parameterized complexity of consensus clustering. Theoretical Computer Science, 542:71-82, 2014. URL: https://doi.org/10.1016/j.tcs.2014.05.002.
Piotr Faliszewski, Piotr Skowron, Akardii Slinko, and Nimrod Talmon. Committee scoring rules: Axiomatic characterization and hierarchy. ACM Transactions on Economics and Computation, 7(1):3:1-3:39, 2019.
Piotr Faliszewski, Piotr Skowron, Arkadii Slinko, and Nimrod Talmon. Multiwinner rules on paths from k-Borda to Chamberlin-Courant. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI '17), pages 192-198. AAAI Press, 2017.
Moti Frances and Ami Litman. On covering problems of codes. Theory of Computing Systems, 30:113-119, 1997.
Michael R. Garey and David S. Johnson. Computers and Intractability - A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979.
Dennis C. Ghiglia and Louis A. Romero. Minimum l_p-norm two-dimensional phase unwrapping. Journal of the Optical Society of America A, 13(10):1999-2013, 1996.
René Gonin and Arthur H. Money. Nonlinear L_p-norm Estimation. Marcel Dekker, Inc., 1989.
Jens Gramm, Rolf Niedermeier, and Peter Rossmanith. Fixed-parameter algorithms for Closest String and related problems. Algorithmica, 37(1):25-42, 2003.
Venkatesan Guruswami, Prasad Raghavendra, Rishi Saket, and Yi Wu. Bypassing UGC from some optimal geometric inapproximability results. ACM Transactions on Algorithms, 12(1):6:1-6:25, 2016.
Richard W. Hamming. Error detecting and error correcting codes. The Bell System Technical Journal, 29(2), 1950.
Anil K. Jain and Richard C. Dubes. Algorithms for Clustering Data. Prentice-Hall, 1988.
D. Marc Kilgour. Approval balloting for multi-winner elections. In J.-F. Laslier and M.R. Sanver, editors, Handbook on Approval Voting, Studies in Choice and Welfare, chapter 6, pages 105-124. Springer, 2010.
Marius Kloft, Ulf Brefeld, Sören Sonnenburg, Pavel Laskov, Klaus-Robert Müller, and Alexander Zien. Efficient and accurate l_p-norm multiple kernel learning. In Proceedings of Advances in Neural Information Processing Systems 22 (NIPS '09), pages 997-1005, 2009.
Dušan Knop, Martin Koutecký, and Matthias Mnich. Combinatorial n-fold integer programming and applications. Technical report, arXiv:1705.08657, 2017. URL: http://arxiv.org/abs/1705.08657.
J. Kevin Lanctôt, Ming Li, Bin Ma, Shaojiu Wang, and Louxin Zhang. Distinguishing string selection problems. Information and Computation, 185(1):41-55, 2003.
Robert F. Love, J. James G. Morris, and George O. Wesolowsky. Facilities Location: Models & Methods. North-Holland, 1988.
Bin Ma and Xiaoming Sun. More efficient algorithms for closest string and substring problems. SIAM Journal on Computing, 39(4):1432-1443, 2009.
J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of 5^th Berkeley Symposium on Mathematical Statistics and Probability, pages 281-297, 1967.
A. H. Money, J. F. Affleck-Graves, M. L. Hart, and G. D. I. Barr. The linear regression model: l_p norm estimation and the choice of p. Journal of Communications in Statistics-Simulation and Computation, 11(1):89-109, 1982.
Yurii Nesterov and Arkadii Nemirovskii. Interior-Point Polynomial Algorithms in Convex Programming. Series: Studies in Applied and Numerical Mathematics. Society for Industrial and Applied Mathematics, 1994.
Fanny Pascual, Krzysztof Rzadca, and Piotr Skowron. Collective schedules: Scheduling meets computational social choice. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 18), pages 667-675, 2018. URL: http://dl.acm.org/citation.cfm?id=3237482.
Pavel A. Pevzner. Computational Molecular Biology-An Algorithmic Approach. MIT Press, 2000.
Ron M. Roth. Introduction to Coding Theory. Cambridge University Press, 2006.
Douglas R. Shier and Perino M. Dearing. Optimal locations for a class of nonlinear, single-facility location problems on a network. Operations Research, 31(2):292-303, 1983.
Shankar Sivarajan. A generalization of the minisum and minimax voting methods. SIAM Undergraduate Research Online, 11, 2018. URL: https://doi.org/10.1137/16S014870.
Wen-Jun Zeng, Hing-Cheung So, and Abdelhak M. Zoubir. An 𝓁_p-norm minimization approach to time delay estimation in impulsive noise. Digital Signal Processing, 23(4):1247-1254, 2013.

On Computing Centroids According to the p-Norms of Hamming Distance Vectors

Authors Jiehua Chen, Danny Hermelin, Manuel Sorge

File

Document Identifiers

Author Details

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message

On Computing Centroids According to the p-Norms of Hamming Distance Vectors

Authors Jiehua Chen, Danny Hermelin, Manuel Sorge

File

Document Identifiers

Author Details

Funding

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

Related Versions

References

Thanks for your feedback!

Could not send message