Near Linear Time Approximation Schemes for Clustering of Partially Doubling Metrics

Driemel, Anne; Höckendorff, Jan; Psarros, Ioannis; Sohler, Christian; Yue, Di

doi:10.4230/LIPIcs.ICALP.2026.80

Abstract

In the metric k-median problem we are given a finite metric space (X∪ Y, 𝐝) and the objective is to compute a set of k centers C ⊆ Y that minimizes ∑_{p ∈ X} min_{c ∈ C} 𝐝(p,c). In general metric spaces, the best polynomial time algorithm, which is due to Cohen-Addad, Grandoni, Lee, Schwiegelshohn, and Svensson [Vincent Cohen-Addad et al., 2025], computes a (2+ε)-approximation for arbitrary constant ε > 0. However, if the metric space has bounded doubling dimension, a near linear time (1+ε)-approximation algorithm is known due to the work of Cohen-Addad, Feldmann, and Saulpic [Vincent Cohen-Addad et al., 2021].
In this paper, we show that the (1+ε)-approximation algorithm can be generalized to the case when either X or Y has bounded doubling dimension (but the other set does not). The case when X has bounded doubling dimension is motivated by the assumption that, even though X is part of a high-dimensional space, it may lie close to a low-dimensional structure. The case when Y has bounded doubling dimension is perhaps more natural. It is motivated by specific clustering problems where the centers are low-dimensional. Specifically, our work in this setting implies the first near linear time approximation algorithm for the (k,𝓁)-median problem under discrete Fréchet distance when 𝓁 is constant. The latter problem is a version of the k-median problem under Fréchet distance where the input consists of time series of z reals and the centers are time series of 𝓁 reals [Anne Driemel et al., 2016]. Previously, no (1+ε)-approximation algorithm with running time polynomial in k was known for this problem. We also introduce a novel complexity reduction for time series of real values that leads to a similar result for the case of discrete Fréchet distance.
To solve the case when Y has bounded doubling dimension, we introduce a form of dimension reduction that replaces points from X by sets of points in Y. To solve the case when X has bounded doubling dimension, we generalize Talwar’s decomposition [Kunal Talwar, 2004] of doubling metrics to our setting. The running time of our algorithms is 2^{2^t} Õ(n+m) where t = O(ddim log ddim/ε) and where ddim is the doubling dimension of X (resp. Y). The results also extend to the metric (uncapacitated) facility location problem. We believe that our techniques are applicable to other problems.
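
As a concrete illustration of the k-median objective stated above, the following minimal sketch evaluates ∑_{p ∈ X} min_{c ∈ C} 𝐝(p,c) for a candidate center set and finds an optimal C ⊆ Y by exhaustive search on a toy instance. It is not the algorithm of this paper: the brute-force search takes time exponential in k and is only meant to make the objective tangible. The function names `kmedian_cost` and `brute_force_kmedian`, and the toy points on the real line, are assumptions made for illustration.

```python
from itertools import combinations


def kmedian_cost(clients, centers, dist):
    """Sum over all clients p of the distance to the nearest chosen center."""
    return sum(min(dist(p, c) for c in centers) for p in clients)


def brute_force_kmedian(clients, candidates, k, dist):
    """Try every k-subset of the candidate centers (toy sizes only)."""
    return min(
        combinations(candidates, k),
        key=lambda chosen: kmedian_cost(clients, chosen, dist),
    )


# Hypothetical toy instance: clients X and candidate centers Y on the real line.
X = [0.0, 1.0, 2.0, 10.0, 11.0]
Y = [1.0, 5.0, 10.5]


def d(p, c):
    """Absolute difference, a metric on the real line."""
    return abs(p - c)


best = brute_force_kmedian(X, Y, 2, d)
print(best, kmedian_cost(X, best, d))  # (1.0, 10.5) 3.0
```

Note that the centers are restricted to Y while the cost is summed over X only, which is why the two sets can have different structure, as in the partially doubling setting considered here.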

References

Ittai Abraham, Yair Bartal, and Ofer Neiman. Advances in metric embedding theory. In Proceedings of the 38th Annual ACM Symposium on Theory of Computing, Seattle, WA, USA, May 21-23, 2006, pages 271-286. ACM, 2006. URL: https://doi.org/10.1145/1132516.1132557.
Sanjeev Arora. Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems. J. ACM, 45(5):753-782, 1998. URL: https://doi.org/10.1145/290179.290180.
Sanjeev Arora, Prabhakar Raghavan, and Satish Rao. Approximation Schemes for Euclidean k-Medians and Related Problems. In 30th Annual ACM Symposium on the Theory of Computing, pages 106-113, 1998. URL: https://doi.org/10.1145/276698.276718.
Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local Search Heuristics for k-Median and Facility Location Problems. In SIAM Journal on Computing, volume 33, pages 544-562, 2004. URL: https://doi.org/10.1137/S0097539702416402.
Kevin Buchin, Anne Driemel, Joachim Gudmundsson, Michael Horton, Irina Kostitsyna, Maarten Löffler, and Martijn Struijs. Approximating (k, 𝓁)-center clustering for curves. In 30th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2922-2938, 2019. URL: https://doi.org/10.1137/1.9781611975482.181.
Kevin Buchin, Anne Driemel, and Martijn Struijs. On the Hardness of Computing an Average Curve. In 17th Scandinavian Symposium and Workshops on Algorithm Theory, volume 162, pages 19:1-19:19, 2020. URL: https://doi.org/10.4230/LIPIcs.SWAT.2020.19.
Maike Buchin, Anne Driemel, and Dennis Rohde. Approximating (k,𝓁)-Median Clustering for Polygonal Curves. In ACM Transactions on Algorithms, volume 19, pages 4:1-4:32, 2023. URL: https://doi.org/10.1145/3559764.
Maike Buchin and Dennis Rohde. Coresets for (k,𝓁)-Median Clustering Under the Fréchet Distance. In Algorithms and Discrete Applied Mathematics, pages 167-180, 2022. URL: https://doi.org/10.1007/978-3-030-95018-7_14.
Jaroslaw Byrka and Karen Aardal. An optimal bifactor approximation algorithm for the metric uncapacitated facility location problem. SIAM Journal on Computing, 39(6):2212-2231, 2010. URL: https://doi.org/10.1137/070708901.
T.-H. Hubert Chan, Shuguang Hu, and Shaofeng H.-C. Jiang. A PTAS for the Steiner forest problem in doubling metrics. SIAM J. Comput., 47(4):1705-1734, 2018. URL: https://doi.org/10.1137/16M1107206.
Moses Charikar and Sudipto Guha. Improved combinatorial algorithms for the facility location and k-median problems. In 40th Annual Symposium on Foundations of Computer Science (Cat. No. 99CB37039), pages 378-388. IEEE, 1999. URL: https://doi.org/10.1109/SFFCS.1999.814609.
Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. A Constant-Factor Approximation Algorithm for the k-Median Problem. In Journal of Computer and System Sciences, volume 65, pages 129-149, 2002. URL: https://doi.org/10.1006/JCSS.2002.1882.
Moses Charikar and Shi Li. A Dependent LP-Rounding Approach for the k-Median Problem. In Automata, Languages, and Programming - 39th International Colloquium, pages 194-205, 2012. URL: https://doi.org/10.1007/978-3-642-31594-7_17.
Siu-Wing Cheng and Haoqiang Huang. Curve Simplification and Clustering under Fréchet Distance. In ACM-SIAM Symposium on Discrete Algorithms, pages 1414-1432, 2023. URL: https://doi.org/10.1137/1.9781611977554.CH51.
Vincent Cohen-Addad, Andrew Draganov, Matteo Russo, David Saulpic, and Chris Schwiegelshohn. A tight VC-dimension analysis of clustering coresets with applications. In Proceedings of the 2025 Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2025, New Orleans, LA, USA, January 12-15, 2025, pages 4783-4808. SIAM, 2025. URL: https://doi.org/10.1137/1.9781611978322.162.
Vincent Cohen-Addad, Andreas Emil Feldmann, and David Saulpic. Near-linear time approximation schemes for clustering in doubling metrics. J. ACM, 68(6):44:1-44:34, 2021. URL: https://doi.org/10.1145/3477541.
Vincent Cohen-Addad, Fabrizio Grandoni, Euiwoong Lee, Chris Schwiegelshohn, and Ola Svensson. A (2+ε)-approximation algorithm for metric k-median. In Michal Koucký and Nikhil Bansal, editors, Proceedings of the 57th Annual ACM Symposium on Theory of Computing, STOC 2025, Prague, Czechia, June 23-27, 2025, pages 615-624. ACM, 2025. URL: https://doi.org/10.1145/3717823.3718299.
Vincent Cohen-Addad, Anupam Gupta, Lunjia Hu, Hoon Oh, and David Saulpic. An Improved Local Search Algorithm for k-Median. In ACM-SIAM Symposium on Discrete Algorithms, pages 1556-1612, 2022. URL: https://doi.org/10.1137/1.9781611977073.65.
Anne Driemel, Jan Höckendorff, Ioannis Psarros, Christian Sohler, and Di Yue. Near linear time approximation schemes for clustering of partially doubling metrics. CoRR, 2026. URL: https://doi.org/10.48550/arXiv.2603.24336.
Anne Driemel, Jan Höckendorff, Ioannis Psarros, and Christian Sohler. A near-linear time approximation scheme for (k,𝓁)-median clustering under discrete Fréchet distance, 2025. URL: https://doi.org/10.48550/arXiv.2508.07008.
Anne Driemel, Amer Krivosija, and Christian Sohler. Clustering time series under the Fréchet distance. In 27th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 766-785, 2016. URL: https://doi.org/10.1137/1.9781611974331.ch55.
Anne Driemel, Ioannis Psarros, and Melanie Schmidt. Sublinear data structures for short Fréchet queries. In Computing Research Repository, 2019. URL: https://arxiv.org/abs/1907.04420.
Arnold Filtser and Omrit Filtser. Static and Streaming Data Structures for Fréchet Distance Queries. In ACM Transactions on Algorithms, volume 19, pages 39:1-39:36, 2023. URL: https://doi.org/10.1145/3610227.
Arnold Filtser, Omrit Filtser, and Matthew J. Katz. Approximate Nearest Neighbor for Curves: Simple, Efficient, and Deterministic. In Algorithmica, volume 85, pages 1490-1519, 2023. URL: https://doi.org/10.1007/S00453-022-01080-1.
Lee-Ad Gottlieb and Robert Krauthgamer. Proximity algorithms for nearly doubling spaces. SIAM J. Discret. Math., 27(4):1759-1769, 2013. URL: https://doi.org/10.1137/120874242.
Sudipto Guha and Samir Khuller. Greedy strikes back: Improved facility location algorithms. Journal of algorithms, 31(1):228-248, 1999. URL: https://doi.org/10.1006/JAGM.1998.0993.
Anupam Gupta and Kanat Tangwongsan. Simpler Analyses of Local Search Algorithms for Facility Location. In Computing Research Repository, 2008. URL: https://arxiv.org/abs/0809.2554.
Sariel Har-Peled and Nirman Kumar. Approximate nearest neighbor search for low-dimensional queries. SIAM J. Comput., 42(1):138-159, 2013. URL: https://doi.org/10.1137/110852711.
Sariel Har-Peled and Manor Mendel. Fast construction of nets in low-dimensional metrics and their applications. SIAM J. Comput., 35(5):1148-1184, 2006. URL: https://doi.org/10.1137/S0097539704446281.
Lingxiao Huang, Shaofeng H.-C. Jiang, Robert Krauthgamer, and Di Yue. Near-optimal dimension reduction for facility location. In Michal Koucký and Nikhil Bansal, editors, Proceedings of the 57th Annual ACM Symposium on Theory of Computing, STOC 2025, Prague, Czechia, June 23-27, 2025, pages 665-676. ACM, 2025. URL: https://doi.org/10.1145/3717823.3718214.
Piotr Indyk and Assaf Naor. Nearest-neighbor-preserving embeddings. ACM Trans. Algorithms, 3(3):31-es, August 2007. URL: https://doi.org/10.1145/1273340.1273347.
Kamal Jain, Mohammad Mahdian, and Amin Saberi. A new greedy approach for facility location problems. In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, STOC '02, pages 731-740, New York, NY, USA, 2002. Association for Computing Machinery. URL: https://doi.org/10.1145/509907.510012.
Kamal Jain and Vijay V. Vazirani. Approximation algorithms for metric facility location and k-Median problems using the primal-dual schema and Lagrangian relaxation. In Journal of the ACM, volume 48, pages 274-296, 2001. URL: https://doi.org/10.1145/375827.375845.
Stavros G. Kolliopoulos and Satish Rao. A Nearly Linear-Time Approximation Scheme for the Euclidean k-Median Problem. In SIAM Journal on Computing, volume 37, pages 757-782, 2007. URL: https://doi.org/10.1137/S0097539702404055.
Shi Li. A 1.488 approximation algorithm for the uncapacitated facility location problem. In Luca Aceto, Monika Henzinger, and Jiří Sgall, editors, Automata, Languages and Programming - 38th International Colloquium, ICALP 2011, Zurich, Switzerland, July 4-8, 2011, Proceedings, Part II, volume 6756 of Lecture Notes in Computer Science, pages 77-88. Springer, 2011. URL: https://doi.org/10.1007/978-3-642-22012-8_5.
Mohammad Mahdian, Yinyu Ye, and Jiawei Zhang. Approximation algorithms for metric facility location problems. SIAM Journal on Computing, 36(2):411-432, 2006. URL: https://doi.org/10.1137/S0097539703435716.
Nimrod Megiddo and Kenneth J. Supowit. On the Complexity of Some Common Geometric Location Problems. In SIAM Journal on Computing, volume 13, pages 182-196, 1984. URL: https://doi.org/10.1137/0213014.
Ramgopal R. Mettu and C. Greg Plaxton. The Online Median Problem. In SIAM Journal on Computing, volume 32, pages 816-832, 2003. URL: https://doi.org/10.1137/S0097539701383443.
Abhinandan Nath and Erin Taylor. k-Median clustering under discrete Fréchet and Hausdorff distances. In Journal of Computational Geometry, volume 12, pages 156-182, 2021. URL: https://doi.org/10.20382/JOCG.V12I2A8.
David B Shmoys, Éva Tardos, and Karen Aardal. Approximation algorithms for facility location problems. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, pages 265-274, 1997.
Kunal Talwar. Bypassing the embedding: algorithms for low dimensional metrics. In László Babai, editor, Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, June 13-16, 2004, pages 281-290. ACM, 2004. URL: https://doi.org/10.1145/1007352.1007399.
Mikkel Thorup. Quick k-Median, k-Center, and Facility Location for Sparse Graphs. In SIAM Journal on Computing, volume 34, pages 405-432, 2004. URL: https://doi.org/10.1137/S0097539701388884.
