eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2023-02-01
31:1
31:24
10.4230/LIPIcs.ITCS.2023.31
article
Clustering Permutations: New Techniques with Streaming Applications
Chakraborty, Diptarka
1
Das, Debarati
2
Krauthgamer, Robert
3
National University of Singapore, Singapore
Pennsylvania State University, University Park, PA, USA
Weizmann Institute of Science, Rehovot, Israel
We study the classical metric k-median clustering problem over a set of input rankings (i.e., permutations), which has myriad applications, from social-choice theory to web search and databases. A folklore algorithm provides a 2-approximate solution in polynomial time for all k = O(1), and works irrespective of the underlying distance measure, so long as it is a metric; however, going below the factor of 2 is a notorious challenge. We consider the Ulam distance, a variant of the well-known edit-distance metric in which strings are restricted to be permutations. For this metric, Chakraborty, Das, and Krauthgamer [SODA, 2021] provided a (2-δ)-approximation algorithm for k = 1, where δ ≈ 2^{-40}.
Our primary contribution is a new algorithmic framework for clustering a set of permutations. Our first result is a 1.999-approximation algorithm for the metric k-median problem under the Ulam metric, which runs in time (k log (nd))^{O(k)} nd³ for an input consisting of n permutations over [d]. In fact, our framework is powerful enough to extend this result to the streaming model (where the n input permutations arrive one by one) using only polylogarithmic (in n) space. Additionally, we show that similar results can be obtained even in the presence of outliers, which is presumably a more difficult problem.
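For readers unfamiliar with the objects in the abstract, the following is a minimal illustrative sketch, not the paper's algorithm. It assumes the common definition of the Ulam distance as the minimum number of character moves between two permutations, which equals d minus the length of their longest common subsequence. It also shows the k = 1 case of the folklore 2-approximation, which simply returns the input permutation with the smallest total distance to all inputs. The function names `ulam_distance` and `best_input_median` are hypothetical.

```python
from bisect import bisect_left


def ulam_distance(x, y):
    """Ulam distance between two permutations of the same d symbols.

    Assumed definition: d - LCS(x, y). For permutations, the LCS reduces to
    a longest increasing subsequence after relabeling y by positions in x.
    """
    pos = {symbol: i for i, symbol in enumerate(x)}
    seq = [pos[symbol] for symbol in y]
    tails = []  # tails[i] = smallest tail of an increasing subsequence of length i + 1
    for p in seq:
        i = bisect_left(tails, p)
        if i == len(tails):
            tails.append(p)
        else:
            tails[i] = p
    return len(x) - len(tails)


def best_input_median(perms):
    """Folklore 2-approximation for k = 1: pick the best input permutation.

    By the triangle inequality, some input is within a factor 2 of the
    optimal median cost in any metric space.
    """
    return min(perms, key=lambda c: sum(ulam_distance(c, p) for p in perms))


perms = [[1, 2, 3, 4], [2, 3, 4, 1], [1, 2, 3, 4]]
median = best_input_median(perms)
```

This brute-force selection evaluates all pairwise distances, so it is meant only to make the 2-approximation baseline concrete; the results described above concern beating that factor.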
https://drops.dagstuhl.de/storage/00lipics/lipics-vol251-itcs2023/LIPIcs.ITCS.2023.31/LIPIcs.ITCS.2023.31.pdf
Clustering
Approximation Algorithms
Ulam Distance
Rank Aggregation
Streaming