Document

**Published in:** LIPIcs, Volume 293, 40th International Symposium on Computational Geometry (SoCG 2024)

We present algorithms for the computation of ε-coresets for k-median clustering of point sequences in ℝ^d under the p-dynamic time warping (DTW) distance. Coresets under DTW have not been investigated before, and the analysis is not directly accessible to existing methods as DTW is not a metric. The three main ingredients that allow our construction of coresets are the adaptation of the ε-coreset framework of sensitivity sampling, bounds on the VC dimension of approximations to the range spaces of balls under DTW, and new approximation algorithms for the k-median problem under DTW. We achieve our results by investigating approximations of DTW that provide a trade-off between the provided accuracy and amenability to known techniques. In particular, we observe that given n curves under DTW, one can directly construct a metric that approximates DTW on this set, permitting the use of the wealth of results on metric spaces for clustering purposes. The resulting approximations are the first with polynomial running time and achieve a very similar approximation factor as state-of-the-art techniques. We apply our results to produce a practical algorithm approximating (k,𝓁)-median clustering under DTW.

Jacobus Conradi, Benedikt Kolbe, Ioannis Psarros, and Dennis Rohde. Fast Approximations and Coresets for (k,𝓁)-Median Under Dynamic Time Warping. In 40th International Symposium on Computational Geometry (SoCG 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 293, pp. 42:1-42:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)

Copy BibTex To Clipboard

@InProceedings{conradi_et_al:LIPIcs.SoCG.2024.42, author = {Conradi, Jacobus and Kolbe, Benedikt and Psarros, Ioannis and Rohde, Dennis}, title = {{Fast Approximations and Coresets for (k,𝓁)-Median Under Dynamic Time Warping}}, booktitle = {40th International Symposium on Computational Geometry (SoCG 2024)}, pages = {42:1--42:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-316-4}, ISSN = {1868-8969}, year = {2024}, volume = {293}, editor = {Mulzer, Wolfgang and Phillips, Jeff M.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2024.42}, URN = {urn:nbn:de:0030-drops-199875}, doi = {10.4230/LIPIcs.SoCG.2024.42}, annote = {Keywords: Dynamic time warping, coreset, median clustering, approximation algorithm} }

Document

**Published in:** LIPIcs, Volume 258, 39th International Symposium on Computational Geometry (SoCG 2023)

Modern time series analysis requires the ability to handle datasets that are inherently high-dimensional; examples include applications in climatology, where measurements from numerous sensors must be taken into account, or inventory tracking of large shops, where the dimension is defined by the number of tracked items. The standard way to mitigate computational issues arising from the high dimensionality of the data is by applying some dimension reduction technique that preserves the structural properties of the ambient space. The dissimilarity between two time series is often measured by "discrete" notions of distance, e.g. the dynamic time warping or the discrete Fréchet distance. Since all these distance functions are computed directly on the points of a time series, they are sensitive to different sampling rates or gaps. The continuous Fréchet distance offers a popular alternative which aims to alleviate this by taking into account all points on the polygonal curve obtained by linearly interpolating between any two consecutive points in a sequence.
We study the ability of random projections à la Johnson and Lindenstrauss to preserve the continuous Fréchet distance of polygonal curves by effectively reducing the dimension. In particular, we show that one can reduce the dimension to O(ε^{-2} log N), where N is the total number of input points while preserving the continuous Fréchet distance between any two determined polygonal curves within a factor of 1± ε. We conclude with applications on clustering.

Ioannis Psarros and Dennis Rohde. Random Projections for Curves in High Dimensions. In 39th International Symposium on Computational Geometry (SoCG 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 258, pp. 53:1-53:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{psarros_et_al:LIPIcs.SoCG.2023.53, author = {Psarros, Ioannis and Rohde, Dennis}, title = {{Random Projections for Curves in High Dimensions}}, booktitle = {39th International Symposium on Computational Geometry (SoCG 2023)}, pages = {53:1--53:15}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-273-0}, ISSN = {1868-8969}, year = {2023}, volume = {258}, editor = {Chambers, Erin W. and Gudmundsson, Joachim}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2023.53}, URN = {urn:nbn:de:0030-drops-179030}, doi = {10.4230/LIPIcs.SoCG.2023.53}, annote = {Keywords: polygonal curves, time series, dimension reduction, Johnson-Lindenstrauss lemma, Fr\'{e}chet distance} }

Document

RANDOM

**Published in:** LIPIcs, Volume 145, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)

Randomized dimensionality reduction has been recognized as one of the fundamental techniques in handling high-dimensional data. Starting with the celebrated Johnson-Lindenstrauss Lemma, such reductions have been studied in depth for the Euclidean (l_2) metric, but much less for the Manhattan (l_1) metric. Our primary motivation is the approximate nearest neighbor problem in l_1. We exploit its reduction to the decision-with-witness version, called approximate near neighbor, which incurs a roughly logarithmic overhead. In 2007, Indyk and Naor, in the context of approximate nearest neighbors, introduced the notion of nearest neighbor-preserving embeddings. These are randomized embeddings between two metric spaces with guaranteed bounded distortion only for the distances between a query point and a point set. Such embeddings are known to exist for both l_2 and l_1 metrics, as well as for doubling subsets of l_2. The case that remained open were doubling subsets of l_1. In this paper, we propose a dimension reduction by means of a near neighbor-preserving embedding for doubling subsets of l_1. Our approach is to represent the pointset with a carefully chosen covering set, then randomly project the latter. We study two types of covering sets: c-approximate r-nets and randomly shifted grids, and we discuss the tradeoff between them in terms of preprocessing time and target dimension. We employ Cauchy variables: certain concentration bounds derived should be of independent interest.

Ioannis Z. Emiris, Vasilis Margonis, and Ioannis Psarros. Near-Neighbor Preserving Dimension Reduction for Doubling Subsets of l_1. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 145, pp. 47:1-47:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{emiris_et_al:LIPIcs.APPROX-RANDOM.2019.47, author = {Emiris, Ioannis Z. and Margonis, Vasilis and Psarros, Ioannis}, title = {{Near-Neighbor Preserving Dimension Reduction for Doubling Subsets of l\underline1}}, booktitle = {Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)}, pages = {47:1--47:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-125-2}, ISSN = {1868-8969}, year = {2019}, volume = {145}, editor = {Achlioptas, Dimitris and V\'{e}gh, L\'{a}szl\'{o} A.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.APPROX-RANDOM.2019.47}, URN = {urn:nbn:de:0030-drops-112628}, doi = {10.4230/LIPIcs.APPROX-RANDOM.2019.47}, annote = {Keywords: Approximate nearest neighbor, Manhattan metric, randomized embedding} }

Document

**Published in:** LIPIcs, Volume 129, 35th International Symposium on Computational Geometry (SoCG 2019)

The Vapnik-Chervonenkis dimension provides a notion of complexity for systems of sets. If the VC dimension is small, then knowing this can drastically simplify fundamental computational tasks such as classification, range counting, and density estimation through the use of sampling bounds. We analyze set systems where the ground set X is a set of polygonal curves in R^d and the sets {R} are metric balls defined by curve similarity metrics, such as the Fréchet distance and the Hausdorff distance, as well as their discrete counterparts. We derive upper and lower bounds on the VC dimension that imply useful sampling bounds in the setting that the number of curves is large, but the complexity of the individual curves is small. Our upper bounds are either near-quadratic or near-linear in the complexity of the curves that define the ranges and they are logarithmic in the complexity of the curves that define the ground set.

Anne Driemel, Jeff M. Phillips, and Ioannis Psarros. The VC Dimension of Metric Balls Under Fréchet and Hausdorff Distances. In 35th International Symposium on Computational Geometry (SoCG 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 129, pp. 28:1-28:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{driemel_et_al:LIPIcs.SoCG.2019.28, author = {Driemel, Anne and Phillips, Jeff M. and Psarros, Ioannis}, title = {{The VC Dimension of Metric Balls Under Fr\'{e}chet and Hausdorff Distances}}, booktitle = {35th International Symposium on Computational Geometry (SoCG 2019)}, pages = {28:1--28:16}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-104-7}, ISSN = {1868-8969}, year = {2019}, volume = {129}, editor = {Barequet, Gill and Wang, Yusu}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2019.28}, URN = {urn:nbn:de:0030-drops-104329}, doi = {10.4230/LIPIcs.SoCG.2019.28}, annote = {Keywords: VC dimension, Fr\'{e}chet distance, Hausdorff distance} }

Document

**Published in:** LIPIcs, Volume 99, 34th International Symposium on Computational Geometry (SoCG 2018)

The problem of Approximate Nearest Neighbor (ANN) search is fundamental in computer science and has benefited from significant progress in the past couple of decades. However, most work has been devoted to pointsets whereas complex shapes have not been sufficiently treated. Here, we focus on distance functions between discretized curves in Euclidean space: they appear in a wide range of applications, from road segments and molecular backbones to time-series in general dimension. For l_p-products of Euclidean metrics, for any p >= 1, we design simple and efficient data structures for ANN, based on randomized projections, which are of independent interest. They serve to solve proximity problems under a notion of distance between discretized curves, which generalizes both discrete Fréchet and Dynamic Time Warping distances. These are the most popular and practical approaches to comparing such curves. We offer the first data structures and query algorithms for ANN with arbitrarily good approximation factor, at the expense of increasing space usage and preprocessing time over existing methods. Query time complexity is comparable or significantly improved by our algorithms; our approach is especially efficient when the length of the curves is bounded.

Ioannis Z. Emiris and Ioannis Psarros. Products of Euclidean Metrics and Applications to Proximity Questions among Curves. In 34th International Symposium on Computational Geometry (SoCG 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 99, pp. 37:1-37:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{emiris_et_al:LIPIcs.SoCG.2018.37, author = {Emiris, Ioannis Z. and Psarros, Ioannis}, title = {{Products of Euclidean Metrics and Applications to Proximity Questions among Curves}}, booktitle = {34th International Symposium on Computational Geometry (SoCG 2018)}, pages = {37:1--37:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-066-8}, ISSN = {1868-8969}, year = {2018}, volume = {99}, editor = {Speckmann, Bettina and T\'{o}th, Csaba D.}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SoCG.2018.37}, URN = {urn:nbn:de:0030-drops-87504}, doi = {10.4230/LIPIcs.SoCG.2018.37}, annote = {Keywords: Approximate nearest neighbor, polygonal curves, Fr\'{e}chet distance, dynamic time warping} }

Document

**Published in:** LIPIcs, Volume 34, 31st International Symposium on Computational Geometry (SoCG 2015)

The approximate nearest neighbor problem (epsilon-ANN) in Euclidean settings is a fundamental question, which has been addressed by two main approaches: Data-dependent space partitioning techniques perform well when the dimension is relatively low, but are affected by the curse of dimensionality. On the other hand, locality sensitive hashing has polynomial dependence in the dimension, sublinear query time with an exponent inversely proportional to (1+epsilon)^2, and subquadratic space requirement.
We generalize the Johnson-Lindenstrauss Lemma to define "low-quality" mappings to a Euclidean space of significantly lower dimension, such that they satisfy a requirement weaker than approximately preserving all distances or even preserving the nearest neighbor. This mapping guarantees, with high probability, that an approximate nearest neighbor lies among the k approximate nearest neighbors in the projected space. These can be efficiently retrieved while using only linear storage by a data structure, such as BBD-trees. Our overall algorithm, given n points in dimension d, achieves space usage in O(dn), preprocessing time in O(dn log n), and query time in O(d n^{rho} log n), where rho is proportional to 1 - 1/loglog n, for fixed epsilon in (0, 1). The dimension reduction is larger if one assumes that point sets possess some structure, namely bounded expansion rate. We implement our method and present experimental results in up to 500 dimensions and 10^6 points, which show that the practical performance is better than predicted by the theoretical analysis. In addition, we compare our approach with E2LSH.

Evangelos Anagnostopoulos, Ioannis Z. Emiris, and Ioannis Psarros. Low-Quality Dimension Reduction and High-Dimensional Approximate Nearest Neighbor. In 31st International Symposium on Computational Geometry (SoCG 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 34, pp. 436-450, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)

Copy BibTex To Clipboard

@InProceedings{anagnostopoulos_et_al:LIPIcs.SOCG.2015.436, author = {Anagnostopoulos, Evangelos and Emiris, Ioannis Z. and Psarros, Ioannis}, title = {{Low-Quality Dimension Reduction and High-Dimensional Approximate Nearest Neighbor}}, booktitle = {31st International Symposium on Computational Geometry (SoCG 2015)}, pages = {436--450}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-939897-83-5}, ISSN = {1868-8969}, year = {2015}, volume = {34}, editor = {Arge, Lars and Pach, J\'{a}nos}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SOCG.2015.436}, URN = {urn:nbn:de:0030-drops-51181}, doi = {10.4230/LIPIcs.SOCG.2015.436}, annote = {Keywords: Approximate nearest neighbor, Randomized embeddings, Curse of dimensionality, Johnson-Lindenstrauss Lemma, Bounded expansion rate, Experimental study} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail