Document

**Published in:** LIPIcs, Volume 244, 30th Annual European Symposium on Algorithms (ESA 2022)

Hierarchical Clustering is a popular tool for understanding the hereditary properties of a data set. Such a clustering is actually a sequence of clusterings that starts with the trivial clustering in which every data point forms its own cluster and then successively merges two existing clusters until all points are in the same cluster. A hierarchical clustering achieves an approximation factor of α if the costs of each k-clustering in the hierarchy are at most α times the costs of an optimal k-clustering. We study as cost functions the maximum (discrete) radius of any cluster (k-center problem) and the maximum diameter of any cluster (k-diameter problem).
In general, the optimal clusterings do not form a hierarchy and hence an approximation factor of 1 cannot be achieved. We call the smallest approximation factor that can be achieved for any instance the price of hierarchy. For the k-diameter problem we improve the upper bound on the price of hierarchy to 3+2√2≈ 5.83. Moreover we significantly improve the lower bounds for k-center and k-diameter, proving a price of hierarchy of exactly 4 and 3+2√2, respectively.

Anna Arutyunova and Heiko Röglin. The Price of Hierarchical Clustering. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 10:1-10:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{arutyunova_et_al:LIPIcs.ESA.2022.10, author = {Arutyunova, Anna and R\"{o}glin, Heiko}, title = {{The Price of Hierarchical Clustering}}, booktitle = {30th Annual European Symposium on Algorithms (ESA 2022)}, pages = {10:1--10:14}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-247-1}, ISSN = {1868-8969}, year = {2022}, volume = {244}, editor = {Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2022.10}, URN = {urn:nbn:de:0030-drops-169487}, doi = {10.4230/LIPIcs.ESA.2022.10}, annote = {Keywords: Hierarchical Clustering, approximation Algorithms, k-center Problem} }

Document

**Published in:** LIPIcs, Volume 224, 38th International Symposium on Computational Geometry (SoCG 2022)

We apply state-of-the-art computational geometry methods to the problem of reconstructing a time-varying sea surface from tide gauge records. Our work builds on a recent article by Nitzke et al. (Computers & Geosciences, 157:104920, 2021) who have suggested to learn a triangulation D of a given set of tide gauge stations. The objective is to minimize the misfit of the piecewise linear surface induced by D to a reference surface that has been acquired with satellite altimetry. The authors restricted their search to k-order Delaunay (k-OD) triangulations and used an integer linear program in order to solve the resulting optimization problem.
In geometric terms, the input to our problem consists of two sets of points in ℝ² with elevations: a set 𝒮 that is to be triangulated, and a set ℛ of reference points. Intuitively, we define the error of a triangulation as the average vertical distance of a point in ℛ to the triangulated surface that is obtained by interpolating elevations of 𝒮 linearly in each triangle. Our goal is to find the triangulation of 𝒮 that has minimum error with respect to ℛ.
In our work, we prove that the minimum-error triangulation problem is NP-hard and cannot be approximated within any multiplicative factor in polynomial time unless P = NP. At the same time we show that the problem instances that occur in our application (considering sea level data from several hundreds of tide gauge stations worldwide) can be solved relatively fast using dynamic programming when restricted to k-OD triangulations for k ≤ 7. In particular, instances for which the number of connected components of the so-called k-OD fixed-edge graph is small can be solved within few seconds.

Anna Arutyunova, Anne Driemel, Jan-Henrik Haunert, Herman Haverkort, Jürgen Kusche, Elmar Langetepe, Philip Mayer, Petra Mutzel, and Heiko Röglin. Minimum-Error Triangulations for Sea Surface Reconstruction. In 38th International Symposium on Computational Geometry (SoCG 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 224, pp. 7:1-7:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{arutyunova_et_al:LIPIcs.SoCG.2022.7, author = {Arutyunova, Anna and Driemel, Anne and Haunert, Jan-Henrik and Haverkort, Herman and Kusche, J\"{u}rgen and Langetepe, Elmar and Mayer, Philip and Mutzel, Petra and R\"{o}glin, Heiko}, title = {{Minimum-Error Triangulations for Sea Surface Reconstruction}}, booktitle = {38th International Symposium on Computational Geometry (SoCG 2022)}, pages = {7:1--7:18}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-227-3}, ISSN = {1868-8969}, year = {2022}, volume = {224}, editor = {Goaoc, Xavier and Kerber, Michael}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2022.7}, URN = {urn:nbn:de:0030-drops-160155}, doi = {10.4230/LIPIcs.SoCG.2022.7}, annote = {Keywords: Minimum-Error Triangulation, k-Order Delaunay Triangulations, Data dependent Triangulations, Sea Surface Reconstruction, fixed-Edge Graph} }

Document

APPROX

**Published in:** LIPIcs, Volume 207, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)

In a hierarchical clustering problem the task is to compute a series of mutually compatible clusterings of a finite metric space (P,dist). Starting with the clustering where every point forms its own cluster, one iteratively merges two clusters until only one cluster remains. Complete linkage is a well-known and popular algorithm to compute such clusterings: in every step it merges the two clusters whose union has the smallest radius (or diameter) among all currently possible merges. We prove that the radius (or diameter) of every k-clustering computed by complete linkage is at most by factor O(k) (or O(k²)) worse than an optimal k-clustering minimizing the radius (or diameter). Furthermore we give a negative answer to the question proposed by Dasgupta and Long [Sanjoy Dasgupta and Philip M. Long, 2005], who show a lower bound of Ω(log(k)) and ask if the approximation guarantee is in fact Θ(log(k)). We present instances where complete linkage performs poorly in the sense that the k-clustering computed by complete linkage is off by a factor of Ω(k) from an optimal solution for radius and diameter. We conclude that in general metric spaces complete linkage does not perform asymptotically better than single linkage, merging the two clusters with smallest inter-cluster distance, for which we prove an approximation guarantee of O(k).

Anna Arutyunova, Anna Großwendt, Heiko Röglin, Melanie Schmidt, and Julian Wargalla. Upper and Lower Bounds for Complete Linkage in General Metric Spaces. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 207, pp. 18:1-18:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{arutyunova_et_al:LIPIcs.APPROX/RANDOM.2021.18, author = {Arutyunova, Anna and Gro{\ss}wendt, Anna and R\"{o}glin, Heiko and Schmidt, Melanie and Wargalla, Julian}, title = {{Upper and Lower Bounds for Complete Linkage in General Metric Spaces}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)}, pages = {18:1--18:22}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-207-5}, ISSN = {1868-8969}, year = {2021}, volume = {207}, editor = {Wootters, Mary and Sanit\`{a}, Laura}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX/RANDOM.2021.18}, URN = {urn:nbn:de:0030-drops-147115}, doi = {10.4230/LIPIcs.APPROX/RANDOM.2021.18}, annote = {Keywords: Hierarchical Clustering, Complete Linkage, agglomerative Clustering, k-Center} }

Document

**Published in:** LIPIcs, Volume 187, 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)

We study k-clustering problems with lower bounds, including k-median and k-means clustering with lower bounds. In addition to the point set P and the number of centers k, a k-clustering problem with (uniform) lower bounds gets a number B. The solution space is restricted to clusterings where every cluster has at least B points. We demonstrate how to approximate k-median with lower bounds via a reduction to facility location with lower bounds, for which O(1)-approximation algorithms are known.
Then we propose a new constrained clustering problem with lower bounds where we allow points to be assigned multiple times (to different centers). This means that for every point, the clustering specifies a set of centers to which it is assigned. We call this clustering with weak lower bounds. We give an 8-approximation for k-median clustering with weak lower bounds and an O(1)-approximation for k-means with weak lower bounds.
We conclude by showing that at a constant increase in the approximation factor, we can restrict the number of assignments of every point to 2 (or, if we allow fractional assignments, to 1+ε). This also leads to the first bicritera approximation algorithm for k-means with (standard) lower bounds where bicriteria is interpreted in the sense that the lower bounds are violated by a constant factor.
All algorithms in this paper run in time that is polynomial in n and k (and d for the Euclidean variants considered).

Anna Arutyunova and Melanie Schmidt. Achieving Anonymity via Weak Lower Bound Constraints for k-Median and k-Means. In 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 187, pp. 7:1-7:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{arutyunova_et_al:LIPIcs.STACS.2021.7, author = {Arutyunova, Anna and Schmidt, Melanie}, title = {{Achieving Anonymity via Weak Lower Bound Constraints for k-Median and k-Means}}, booktitle = {38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)}, pages = {7:1--7:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-180-1}, ISSN = {1868-8969}, year = {2021}, volume = {187}, editor = {Bl\"{a}ser, Markus and Monmege, Benjamin}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.STACS.2021.7}, URN = {urn:nbn:de:0030-drops-136529}, doi = {10.4230/LIPIcs.STACS.2021.7}, annote = {Keywords: Clustering with Constraints, lower Bounds, k-Means, Anonymity} }