Search Results

Documents authored by Schwiegelshohn, Chris


Document
An Empirical Evaluation of k-Means Coresets

Authors: Chris Schwiegelshohn and Omar Ali Sheikh-Omar

Published in: LIPIcs, Volume 244, 30th Annual European Symposium on Algorithms (ESA 2022)


Abstract
Coresets are among the most popular paradigms for summarizing data. In particular, there exist many high performance coresets for clustering problems such as k-means in both theory and practice. Curiously, there exists no work on comparing the quality of available k-means coresets. In this paper we perform such an evaluation. There currently is no algorithm known to measure the distortion of a candidate coreset. We provide some evidence as to why this might be computationally difficult. To complement this, we propose a benchmark for which we argue that computing coresets is challenging and which also allows us an easy (heuristic) evaluation of coresets. Using this benchmark and real-world data sets, we conduct an exhaustive evaluation of the most commonly used coreset algorithms from theory and practice.

Cite as

Chris Schwiegelshohn and Omar Ali Sheikh-Omar. An Empirical Evaluation of k-Means Coresets. In 30th Annual European Symposium on Algorithms (ESA 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 244, pp. 84:1-84:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{schwiegelshohn_et_al:LIPIcs.ESA.2022.84,
  author =	{Schwiegelshohn, Chris and Sheikh-Omar, Omar Ali},
  title =	{{An Empirical Evaluation of k-Means Coresets}},
  booktitle =	{30th Annual European Symposium on Algorithms (ESA 2022)},
  pages =	{84:1--84:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-247-1},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{244},
  editor =	{Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2022.84},
  URN =		{urn:nbn:de:0030-drops-170225},
  doi =		{10.4230/LIPIcs.ESA.2022.84},
  annote =	{Keywords: coresets, k-means coresets, evaluation, benchmark}
}
Document
On Finding the Jaccard Center

Authors: Marc Bury and Chris Schwiegelshohn

Published in: LIPIcs, Volume 80, 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)


Abstract
We initiate the study of finding the Jaccard center of a given collection N of sets. For two sets X,Y, the Jaccard index is defined as |X\cap Y|/|X\cup Y| and the corresponding distance is 1-|X\cap Y|/|X\cup Y|. The Jaccard center is a set C minimizing the maximum distance to any set of N. We show that the problem is NP-hard to solve exactly, and that it admits a PTAS while no FPTAS can exist unless P = NP. Furthermore, we show that the problem is fixed parameter tractable in the maximum Hamming norm between Jaccard center and any input set. Our algorithms are based on a compression technique similar in spirit to coresets for the Euclidean 1-center problem. In addition, we also show that, contrary to the previously studied median problem by Chierichetti et al. (SODA 2010), the continuous version of the Jaccard center problem admits a simple polynomial time algorithm.

Cite as

Marc Bury and Chris Schwiegelshohn. On Finding the Jaccard Center. In 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 80, pp. 23:1-23:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Copy BibTex To Clipboard

@InProceedings{bury_et_al:LIPIcs.ICALP.2017.23,
  author =	{Bury, Marc and Schwiegelshohn, Chris},
  title =	{{On Finding the Jaccard Center}},
  booktitle =	{44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)},
  pages =	{23:1--23:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-041-5},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{80},
  editor =	{Chatzigiannakis, Ioannis and Indyk, Piotr and Kuhn, Fabian and Muscholl, Anca},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2017.23},
  URN =		{urn:nbn:de:0030-drops-73769},
  doi =		{10.4230/LIPIcs.ICALP.2017.23},
  annote =	{Keywords: Clustering, 1-Center, Jaccard}
}
Document
Diameter and k-Center in Sliding Windows

Authors: Vincent Cohen-Addad, Chris Schwiegelshohn, and Christian Sohler

Published in: LIPIcs, Volume 55, 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016)


Abstract
In this paper we develop streaming algorithms for the diameter problem and the k-center clustering problem in the sliding window model. In this model we are interested in maintaining a solution for the N most recent points of the stream. In the diameter problem we would like to maintain two points whose distance approximates the diameter of the point set in the window. Our algorithm computes a (3 + epsilon)-approximation and uses O(1/epsilon*ln(alpha)) memory cells, where alpha is the ratio of the largest and smallest distance and is assumed to be known in advance. We also prove that under reasonable assumptions obtaining a (3 - epsilon)-approximation requires Omega(N1/3) space. For the k-center problem, where the goal is to find k centers that minimize the maximum distance of a point to its nearest center, we obtain a (6 + epsilon)-approximation using O(k/epsilon*ln(alpha)) memory cells and a (4 + epsilon)-approximation for the special case k = 2. We also prove that any algorithm for the 2-center problem that achieves an approximation ratio of less than 4 requires Omega(N^{1/3}) space.

Cite as

Vincent Cohen-Addad, Chris Schwiegelshohn, and Christian Sohler. Diameter and k-Center in Sliding Windows. In 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 55, pp. 19:1-19:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{cohenaddad_et_al:LIPIcs.ICALP.2016.19,
  author =	{Cohen-Addad, Vincent and Schwiegelshohn, Chris and Sohler, Christian},
  title =	{{Diameter and k-Center in Sliding Windows}},
  booktitle =	{43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016)},
  pages =	{19:1--19:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-013-2},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{55},
  editor =	{Chatzigiannakis, Ioannis and Mitzenmacher, Michael and Rabani, Yuval and Sangiorgi, Davide},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2016.19},
  URN =		{urn:nbn:de:0030-drops-63401},
  doi =		{10.4230/LIPIcs.ICALP.2016.19},
  annote =	{Keywords: Streaming, k-Center, Diameter, Sliding Windows}
}
Document
The Power of Migration for Online Slack Scheduling

Authors: Chris Schwiegelshohn and Uwe Schwiegelshohn

Published in: LIPIcs, Volume 57, 24th Annual European Symposium on Algorithms (ESA 2016)


Abstract
We investigate the power of migration in online scheduling for parallel identical machines. Our objective is to maximize the total processing time of accepted jobs. Once we decide to accept a job, we have to complete it before its deadline d that satisfies d >= (1+epsilon)p + r, where p is the processing time, r the submission time and the slack epsilon > 0 a system parameter. Typically, the hard case arises for small slack epsilon << 1, i.e. for near-tight deadlines. Without migration, a greedy acceptance policy is known to be an optimal deterministic online algorithm with a competitive factor of (1+epsilon)/epsilon (DasGupta and Palis, APPROX 2000). Our first contribution is to show that migrations do not improve the competitive ratio of the greedy acceptance policy, i.e. the competitive ratio remains (1+epsilon)/epsilon for any number of machines. Our main contribution is a deterministic online algorithm with almost tight competitive ratio on any number of machines. For a single machine, the competitive factor matches the optimal bound of (1+epsilon)/epsilon of the greedy acceptance policy. The competitive ratio improves with an increasing number of machines. It approaches (1+epsilon) ln((1+epsilon)/epsilon) as the number of machines converges to infinity. This is an exponential improvement over the greedy acceptance policy for small epsilon. Moreover, we show a matching lower bound on the competitive ratio for deterministic algorithms on any number of machines.

Cite as

Chris Schwiegelshohn and Uwe Schwiegelshohn. The Power of Migration for Online Slack Scheduling. In 24th Annual European Symposium on Algorithms (ESA 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 57, pp. 75:1-75:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{schwiegelshohn_et_al:LIPIcs.ESA.2016.75,
  author =	{Schwiegelshohn, Chris and Schwiegelshohn, Uwe},
  title =	{{The Power of Migration for Online Slack Scheduling}},
  booktitle =	{24th Annual European Symposium on Algorithms (ESA 2016)},
  pages =	{75:1--75:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-015-6},
  ISSN =	{1868-8969},
  year =	{2016},
  volume =	{57},
  editor =	{Sankowski, Piotr and Zaroliagis, Christos},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2016.75},
  URN =		{urn:nbn:de:0030-drops-64162},
  doi =		{10.4230/LIPIcs.ESA.2016.75},
  annote =	{Keywords: Online scheduling, deadlines, preemption with migration, competitive analysis}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail