Document

**Published in:** LIPIcs, Volume 301, 22nd International Symposium on Experimental Algorithms (SEA 2024)

Since its introduction in 1957, Lloyd’s algorithm for k-means clustering has been extensively studied and has undergone several improvements. While in its original form it does not guarantee any approximation factor at all, Arthur and Vassilvitskii (SODA 2007) proposed k-means++ which enhances Lloyd’s algorithm by a seeding method which guarantees a 𝒪(log k)-approximation in expectation. More recently, Lattanzi and Sohler (ICML 2019) proposed LS++ which further improves the solution quality of k-means++ by local search techniques to obtain a 𝒪(1)-approximation. On the practical side, the greedy variant of k-means++ is often used although its worst-case behaviour is provably worse than for the standard k-means++ variant.
We investigate how to improve LS++ further in practice. We study two options for improving the practical performance: (a) Combining LS++ with greedy k-means++ instead of k-means++, and (b) Improving LS++ by better entangling it with Lloyd’s algorithm. Option (a) worsens the theoretical guarantees of k-means++ but improves the practical quality also in combination with LS++ as we confirm in our experiments. Option (b) is our new algorithm, Foresight LS++. We experimentally show that FLS++ improves upon the solution quality of LS++. It retains its asymptotic runtime and its worst-case approximation bounds.

Theo Conrads, Lukas Drexler, Joshua Könen, Daniel R. Schmidt, and Melanie Schmidt. Local Search k-means++ with Foresight. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 7:1-7:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{conrads_et_al:LIPIcs.SEA.2024.7, author = {Conrads, Theo and Drexler, Lukas and K\"{o}nen, Joshua and Schmidt, Daniel R. and Schmidt, Melanie}, title = {{Local Search k-means++ with Foresight}}, booktitle = {22nd International Symposium on Experimental Algorithms (SEA 2024)}, pages = {7:1--7:20}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-325-6}, ISSN = {1868-8969}, year = {2024}, volume = {301}, editor = {Liberti, Leo}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2024.7}, URN = {urn:nbn:de:0030-drops-203727}, doi = {10.4230/LIPIcs.SEA.2024.7}, annote = {Keywords: k-means clustering, kmeans++, greedy, local search} }

Document

APPROX

**Published in:** LIPIcs, Volume 145, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)

Clustering is a fundamental tool in data mining and machine learning. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters - especially if the data is already biased.
At NIPS 2017, Chierichetti et al. [Flavio Chierichetti et al., 2017] proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation for the fair k-center problem and a O(t)-approximation for the fair k-median problem, where t is a parameter for the fairness model. For multiple protected classes, the best known result is a 14-approximation for fair k-center [Clemens Rösner and Melanie Schmidt, 2018].
We extend and improve the known results. Firstly, we give a 5-approximation for the fair k-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximations for all of the classical clustering objectives k-center, k-supplier, k-median, k-means and facility location. The latter approximations are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where the centers are already fixed.

Ioana O. Bercea, Martin Groß, Samir Khuller, Aounon Kumar, Clemens Rösner, Daniel R. Schmidt, and Melanie Schmidt. On the Cost of Essentially Fair Clusterings. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 145, pp. 18:1-18:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{bercea_et_al:LIPIcs.APPROX-RANDOM.2019.18, author = {Bercea, Ioana O. and Gro{\ss}, Martin and Khuller, Samir and Kumar, Aounon and R\"{o}sner, Clemens and Schmidt, Daniel R. and Schmidt, Melanie}, title = {{On the Cost of Essentially Fair Clusterings}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)}, pages = {18:1--18:22}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-125-2}, ISSN = {1868-8969}, year = {2019}, volume = {145}, editor = {Achlioptas, Dimitris and V\'{e}gh, L\'{a}szl\'{o} A.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2019.18}, URN = {urn:nbn:de:0030-drops-112337}, doi = {10.4230/LIPIcs.APPROX-RANDOM.2019.18}, annote = {Keywords: approximation, clustering, fairness, LP rounding} }

Document

**Published in:** LIPIcs, Volume 112, 26th Annual European Symposium on Algorithms (ESA 2018)

The Steiner forest problem asks for a minimum weight forest that spans a given number of terminal sets. The problem has famous linear programming based 2-approximations [Agrawal et al., 1995; Goemans and Williamson, 1995; Jain, 2001] whose bottleneck is the fact that the most natural formulation of the problem as an integer linear program (ILP) has an integrality gap of 2. We propose new cut-based ILP formulations for the problem along with exact branch-and-bound based algorithms. While our new formulations cannot improve the integrality gap, we can prove that one of them yields stronger linear programming bounds than the two previous strongest formulations: The directed cut formulation [Balakrishnan et al., 1989; Chopra and Rao, 1994] and the advanced flow-based formulation by Magnanti and Raghavan [Magnanti and Raghavan, 2005]. In an experimental evaluation, we show that the linear programming bounds of the new formulations are indeed strong on practical instances and that our new branch-and-bound algorithms outperform branch-and-bound algorithms based on the previous formulations. Our formulations can be seen as a cut-based analogon to [Magnanti and Raghavan, 2005], whose existence was an open problem.

Daniel R. Schmidt, Bernd Zey, and François Margot. An Exact Algorithm for the Steiner Forest Problem. In 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 112, pp. 70:1-70:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{schmidt_et_al:LIPIcs.ESA.2018.70, author = {Schmidt, Daniel R. and Zey, Bernd and Margot, Fran\c{c}ois}, title = {{An Exact Algorithm for the Steiner Forest Problem}}, booktitle = {26th Annual European Symposium on Algorithms (ESA 2018)}, pages = {70:1--70:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-081-1}, ISSN = {1868-8969}, year = {2018}, volume = {112}, editor = {Azar, Yossi and Bast, Hannah and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2018.70}, URN = {urn:nbn:de:0030-drops-95339}, doi = {10.4230/LIPIcs.ESA.2018.70}, annote = {Keywords: branch-and-bound algorithms, Steiner network problems} }

Document

**Published in:** LIPIcs, Volume 94, 9th Innovations in Theoretical Computer Science Conference (ITCS 2018)

In the Steiner Forest problem, we are given a graph and a collection of source-sink pairs, and the goal is to find a subgraph of minimum total length such that all pairs are connected. The problem is APX-Hard and can be 2-approximated by, e.g., the elegant primal-dual algorithm of Agrawal, Klein, and Ravi from 1995.
We give a local-search-based constant-factor approximation for the problem. Local search brings in new techniques to an area that has for long not seen any improvements and might be a step towards a combinatorial algorithm for the more general survivable network design problem. Moreover, local search was an essential tool to tackle the dynamic MST/Steiner Tree problem, whereas dynamic Steiner Forest is still wide open.
It is easy to see that any constant factor local search algorithm requires steps that add/drop many edges together. We propose natural local moves which, at each step, either (a) add a shortest path in the current graph and then drop a bunch of inessential edges, or (b) add a set of edges to the current solution. This second type of moves is motivated by the potential function we use to measure progress, combining the cost of the solution with a penalty for each connected component. Our carefully-chosen local moves and potential function work in tandem to eliminate bad local minima that arise when using more traditional local moves.
Our analysis first considers the case where the local optimum is a single tree, and shows optimality w.r.t. moves that add a single edge (and drop a set of edges) is enough to bound the locality gap. For the general case, we show how to "project" the optimal solution onto the different trees of the local optimum without incurring too much cost (and this argument uses optimality w.r.t. both kinds of moves), followed by a tree-by-tree argument. We hope both the potential function, and our analysis techniques will be useful to develop and analyze local-search algorithms in other contexts.

Martin Groß, Anupam Gupta, Amit Kumar, Jannik Matuschke, Daniel R. Schmidt, Melanie Schmidt, and José Verschae. A Local-Search Algorithm for Steiner Forest. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 94, pp. 31:1-31:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{gro_et_al:LIPIcs.ITCS.2018.31, author = {Gro{\ss}, Martin and Gupta, Anupam and Kumar, Amit and Matuschke, Jannik and Schmidt, Daniel R. and Schmidt, Melanie and Verschae, Jos\'{e}}, title = {{A Local-Search Algorithm for Steiner Forest}}, booktitle = {9th Innovations in Theoretical Computer Science Conference (ITCS 2018)}, pages = {31:1--31:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-060-6}, ISSN = {1868-8969}, year = {2018}, volume = {94}, editor = {Karlin, Anna R.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2018.31}, URN = {urn:nbn:de:0030-drops-83134}, doi = {10.4230/LIPIcs.ITCS.2018.31}, annote = {Keywords: Local Search, Steiner Forest, Approximation Algorithms, Network Design} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail