Fast Approximations and Coresets for (k,𝓁)-Median Under Dynamic Time Warping

Authors: Jacobus Conradi, Benedikt Kolbe, Ioannis Psarros, Dennis Rohde




Author Details

Jacobus Conradi
  • University of Bonn, Germany
Benedikt Kolbe
  • Hausdorff Center for Mathematics, University of Bonn, Germany
Ioannis Psarros
  • Archimedes, Athena Research Center, Greece
Dennis Rohde
  • University of Bonn, Germany

Acknowledgements

We thank Anne Driemel of the University of Bonn for detailed discussion and guidance.

Cite As

Jacobus Conradi, Benedikt Kolbe, Ioannis Psarros, and Dennis Rohde. Fast Approximations and Coresets for (k,𝓁)-Median Under Dynamic Time Warping. In 40th International Symposium on Computational Geometry (SoCG 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 293, pp. 42:1-42:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.SoCG.2024.42

Abstract

We present algorithms for the computation of ε-coresets for k-median clustering of point sequences in ℝ^d under the p-dynamic time warping (DTW) distance. Coresets under DTW have not been investigated before, and the analysis is not directly accessible to existing methods as DTW is not a metric. The three main ingredients that allow our construction of coresets are the adaptation of the ε-coreset framework of sensitivity sampling, bounds on the VC dimension of approximations to the range spaces of balls under DTW, and new approximation algorithms for the k-median problem under DTW. We achieve our results by investigating approximations of DTW that provide a trade-off between accuracy and amenability to known techniques. In particular, we observe that given n curves under DTW, one can directly construct a metric that approximates DTW on this set, permitting the use of the wealth of results on metric spaces for clustering purposes. The resulting approximations are the first with polynomial running time and achieve an approximation factor very similar to that of state-of-the-art techniques. We apply our results to produce a practical algorithm approximating (k,𝓁)-median clustering under DTW.
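
To make the claim that DTW is not a metric concrete, here is a minimal Python sketch of the textbook dynamic program for p-DTW. It is not taken from the paper: the function name `dtw_p` and the example sequences `x`, `y`, `z` are illustrative assumptions, and the definition assumed is the p-th root of the minimum, over all warping paths, of the sum of p-th powers of Euclidean distances between matched points. The three one-dimensional sequences violate the triangle inequality.

```python
import math


def dtw_p(sigma, tau, p=1):
    """p-DTW between two non-empty point sequences (hypothetical helper).

    Each sequence is a list of points, each point a tuple of coordinates.
    Assumed definition: (min over warping paths of the sum of
    ||sigma_i - tau_j||^p over matched pairs) ** (1/p).
    """
    m, n = len(sigma), len(tau)
    # cost[i][j]: cheapest sum of p-th powers over warping paths that
    # match the first i points of sigma with the first j points of tau.
    cost = [[math.inf] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = math.dist(sigma[i - 1], tau[j - 1]) ** p
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in sigma only
                cost[i][j - 1],      # advance in tau only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[m][n] ** (1.0 / p)


# Illustrative one-dimensional sequences (not from the paper).
x = [(0,)]
y = [(0,), (1,)]
z = [(1,), (1,), (1,)]

d_xy = dtw_p(x, y)  # the single point of x is matched to both points of y
d_yz = dtw_p(y, z)  # only the first point of y pays a cost
d_xz = dtw_p(x, z)  # the single point of x is matched to all three points of z

print(d_xy, d_yz, d_xz)  # 1.0 1.0 3.0
```

Here going from `x` to `z` directly costs 3, while the detour through `y` costs 1 + 1 = 2, so the triangle inequality fails. This is the obstacle that keeps metric-space clustering results from applying directly and motivates working with approximations of DTW.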

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • Dynamic time warping
  • coreset
  • median clustering
  • approximation algorithm

