Dynamic Time Warping in Strongly Subquadratic Time: Algorithms for the Low-Distance Regime and Approximate Evaluation
Dynamic time warping distance (DTW) is a widely used distance measure between time series, with applications in areas such as speech recognition and bioinformatics. The best known algorithms for computing DTW run in near quadratic time, and conditional lower bounds prohibit the existence of significantly faster algorithms.
These lower bounds do not, however, rule out a faster algorithm for the important special case in which the DTW distance is small. For an arbitrary metric space Sigma with distances normalized so that the smallest non-zero distance is one, we present an algorithm that computes dtw(x, y) for two strings x and y over Sigma in time O(n * dtw(x, y)). When dtw(x, y) is small, this represents a significant speedup over the standard quadratic-time algorithm.
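For reference, the standard quadratic-time baseline mentioned above can be sketched as the textbook dynamic program below. This is not the O(n * dtw(x, y)) algorithm of the paper. The function name `dtw_quadratic` and the `dist` callback are hypothetical names chosen for illustration.

```python
from math import inf


def dtw_quadratic(x, y, dist):
    """Textbook O(len(x) * len(y)) dynamic program for DTW.

    `dist(a, b)` is the distance between two letters; the name and the
    callback interface are assumptions made for this sketch.
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        # Convention assumed here: only two empty strings are at distance 0.
        return 0 if n == m else inf

    # prev[j] holds the DTW cost of x[:i-1] versus y[:j]; row 0 is the base case.
    prev = [inf] * (m + 1)
    prev[0] = 0
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            # The pair (x[i-1], y[j-1]) is always matched; the rest of the
            # correspondence ends one step up, one step left, or diagonally.
            best = min(prev[j], cur[j - 1], prev[j - 1])
            cur[j] = dist(x[i - 1], y[j - 1]) + best
        prev = cur
    return prev[m]
```

A call such as `dtw_quadratic([1, 2, 3], [1, 3], lambda a, b: abs(a - b))` evaluates DTW over the integers with absolute difference as the metric. Keeping only two rows of the table bounds the memory at O(len(y)) while the running time stays quadratic.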
Using our low-distance regime algorithm as a building block, we also present an approximation algorithm which computes dtw(x, y) within a factor of O(n^epsilon) in time O~(n^{2 - epsilon}) for 0 < epsilon < 1. The algorithm allows for the strings x and y to be taken over an arbitrary well-separated tree metric with logarithmic depth and at most exponential aspect ratio. Notably, any polynomial-size metric space can be efficiently embedded into such a tree metric with logarithmic expected distortion. Extending our techniques further, we also obtain the first approximation algorithm for edit distance to work with characters taken from an arbitrary metric space, providing an n^epsilon-approximation in time O~(n^{2 - epsilon}), with high probability.
Finally, we turn our attention to the relationship between edit distance and dynamic time warping distance. We prove a reduction from computing edit distance over an arbitrary metric space to computing DTW over the same metric space, except with an added null character (whose distance to a letter l is defined to be the edit-distance insertion cost of l). Applying our reduction to a conditional lower bound of Bringmann and Künnemann pertaining to edit distance over {0, 1}, we obtain a conditional lower bound for computing DTW over a three-letter alphabet (with distances of zero and one). This improves on a previous result of Abboud, Backurs, and Williams, who gave a conditional lower bound for DTW over an alphabet of size five.
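The idea of reducing edit distance to DTW by adding a null character can be illustrated with a small sketch. The padding used here (a null before, between, and after the letters) is an assumption made for illustration, since the text above states only that a null character is added. All names (`NULL`, `dist`, `pad_with_null`, `dtw`, `edit_distance`) are hypothetical, and the metric is the zero/one metric in which the null is at distance one from every letter.

```python
from math import inf

NULL = None  # hypothetical stand-in for the added null character


def dist(a, b):
    """Zero/one metric; NULL is at distance 1 from every letter."""
    return 0 if a == b else 1


def pad_with_null(s):
    """Assumed padding: NULL before, between, and after the letters of s."""
    out = [NULL]
    for ch in s:
        out += [ch, NULL]
    return out


def dtw(x, y):
    """Textbook quadratic DTW for non-empty sequences x and y."""
    n, m = len(x), len(y)
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
            table[i][j] = dist(x[i - 1], y[j - 1]) + best
    return table[n][m]


def edit_distance(x, y):
    """Edit distance with substitution cost dist(a, b) and
    insertion/deletion cost dist(a, NULL)."""
    n, m = len(x), len(y)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = table[i - 1][0] + dist(x[i - 1], NULL)
    for j in range(1, m + 1):
        table[0][j] = table[0][j - 1] + dist(y[j - 1], NULL)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = min(
                table[i - 1][j] + dist(x[i - 1], NULL),          # delete x[i-1]
                table[i][j - 1] + dist(y[j - 1], NULL),          # insert y[j-1]
                table[i - 1][j - 1] + dist(x[i - 1], y[j - 1]),  # substitute
            )
    return table[n][m]
```

Under these assumptions, `dtw(pad_with_null(u), pad_with_null(v))` can be compared against `edit_distance(u, v)` for binary strings `u` and `v`; the padded strings use three symbols in total (0, 1, and the null).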
With a similar approach, we also prove a reduction from computing edit distance (over generalized Hamming space) to computing longest-common-subsequence length (LCS) over an alphabet with an added null character. Surprisingly, this means that one can recover conditional lower bounds for LCS directly from those for edit distance, which was not previously thought to be the case.
dynamic time warping
edit distance
approximation algorithm
tree metrics
Theory of computation
Theory of computation → Design and analysis of algorithms
80:1-80:15
Track A: Algorithms, Complexity and Games
https://arxiv.org/abs/1904.09690
The author would like to thank Moses Charikar for his mentoring and advice throughout the project, Ofir Geri for his support and for many useful conversations, and Virginia Williams for suggesting the problem of reducing between edit distance and LCS.
William Kuszmaul
Massachusetts Institute of Technology, Cambridge, USA
Supported by an MIT Akamai Fellowship and a Fannie & John Hertz Foundation Fellowship. Also supported by NSF Grants 1314547 and 1533644. Parts of this research were performed during the Stanford CURIS research program.
10.4230/LIPIcs.ICALP.2019.80
John Aach and George M Church. Aligning gene expression time series with time warping algorithms. Bioinformatics, 17(6):495-508, 2001.
Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Quadratic-time hardness of LCS and other sequence similarity measures. arXiv preprint, 2015. URL: http://arxiv.org/abs/1501.07053.
Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In 56th Annual Symposium on Foundations of Computer Science (FOCS), pages 59-78, 2015.
Pankaj K. Agarwal, Kyle Fox, Jiangwei Pan, and Rex Ying. Approximating Dynamic Time Warping and Edit Distance for a Pair of Point Sequences. In 32nd International Symposium on Computational Geometry (SoCG), pages 6:1-6:16, 2016.
Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Polylogarithmic approximation for edit distance and the asymmetric query complexity. In Proceedings of the 51st Annual Symposium on Foundations of Computer Science (FOCS), pages 377-386, 2010.
Alexandr Andoni and Krzysztof Onak. Approximating Edit Distance in Near-Linear Time. SIAM J. Comput., 41(6):1635-1648, 2012.
Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In Proceedings of the 47th Annual Symposium on Theory of Computing (STOC), pages 51-58, 2015.
Nikhil Bansal, Niv Buchbinder, Aleksander Madry, and Joseph Naor. A polylogarithmic-competitive algorithm for the k-server problem. In 52nd Annual Symposium on Foundations of Computer Science (FOCS), pages 267-276, 2011.
Ziv Bar-Yossef, T.S. Jayram, Robert Krauthgamer, and Ravi Kumar. Approximating edit distance efficiently. In Proceedings of 45th Annual Symposium on Foundations of Computer Science (FOCS), pages 550-559, 2004.
Tugkan Batu, Funda Ergün, and Süleyman Cenk Sahinalp. Oblivious string embeddings and edit distance approximations. In Proceedings of the 17th Annual Symposium on Discrete Algorithms (SODA), pages 792-801, 2006.
Nurjahan Begum, Liudmila Ulanova, Jun Wang, and Eamonn J. Keogh. Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 49-58, 2015.
Vladimir Braverman, Moses Charikar, William Kuszmaul, David Woodruff, and Lin Yang. The One-Way Communication Complexity of Dynamic Time Warping Distance. Manuscript submitted for publication, 2018.
Karl Bringmann and Marvin Künnemann. Quadratic conditional lower bounds for string problems and dynamic time warping. In 56th Annual Symposium on Foundations of Computer Science (FOCS), pages 79-97, 2015.
Karl Bringmann and Wolfgang Mulzer. Approximability of the discrete Fréchet distance. Journal of Computational Geometry, 7(2):46-76, 2015.
EG Caiani, A Porta, G Baselli, M Turiel, S Muzzupappa, F Pieruzzi, C Crema, A Malliani, and S Cerutti. Warped-average template technique to track on a cycle-by-cycle basis the cardiac filling phases on left ventricular volume. In Computers in Cardiology 1998, pages 73-76, 1998.
Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael Saks. Approximating Edit Distance Within Constant Factor in Truly Sub-Quadratic Time. In Proceedings of the 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 979-990, 2018.
Diptarka Chakraborty, Elazar Goldenberg, and Michal Koucký. Streaming algorithms for computing edit distance without exploiting suffix trees. arXiv preprint, 2016. URL: http://arxiv.org/abs/1607.03718.
Moses Charikar, Ofir Geri, Michael P. Kim, and William Kuszmaul. On Estimating Edit Distance: Alignment, Dimension Reduction, and Embeddings. In 45th International Colloquium on Automata, Languages, and Programming (ICALP), pages 34:1-34:14, 2018. URL: http://dx.doi.org/10.4230/LIPIcs.ICALP.2018.34.
Alexander De Luca, Alina Hang, Frederik Brudy, Christian Lindner, and Heinrich Hussmann. Touch me once and I know it’s you!: implicit authentication based on touch screen patterns. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 987-996, 2012.
Anne Driemel and Francesco Silvestri. Locality-Sensitive Hashing of Curves. In 33rd International Symposium on Computational Geometry (SoCG), pages 37:1-37:16, 2017.
Ioannis Z. Emiris and Ioannis Psarros. Products of Euclidean Metrics and Applications to Proximity Questions among Curves. In 34th International Symposium on Computational Geometry, SoCG 2018, June 11-14, 2018, Budapest, Hungary, pages 37:1-37:13, 2018.
Jittat Fakcharoenphol, Satish Rao, and Kunal Talwar. A tight bound on approximating arbitrary metrics by tree metrics. Journal of Computer and System Sciences, 69(3):485-497, 2004.
Zvi Galil and Kunsoo Park. An improved algorithm for approximate string matching. SIAM Journal on Computing, 19(6):989-999, 1990.
Toni Giorgino et al. Computing and visualizing dynamic time warping alignments in R: the DTW package. Journal of Statistical Software, 31(7):1-24, 2009.
Omer Gold and Micha Sharir. Dynamic Time Warping and Geometric Edit Distance: Breaking the Quadratic Barrier. In 44th International Colloquium on Automata, Languages, and Programming, (ICALP), pages 25:1-25:14, 2017.
Tao Jiang, Guohui Lin, Bin Ma, and Kaizhong Zhang. A general edit distance between RNA structures. Journal of computational biology, 9(2):371-388, 2002.
Eamonn J. Keogh. Exact Indexing of Dynamic Time Warping. In 28th International Conference on Very Large Data Bases (VLDB), pages 406-417, 2002.
Eamonn J. Keogh and Michael J. Pazzani. Scaling up Dynamic Time Warping to Massive Datasets. In Principles of Data Mining and Knowledge Discovery, Third European Conference, (PKDD), pages 1-11, 1999.
Eamonn J. Keogh and Michael J. Pazzani. Scaling up dynamic time warping for datamining applications. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 285-289, 2000.
Jon Kleinberg and Eva Tardos. Algorithm design. Pearson, 2006.
William Kuszmaul. Dynamic Time Warping in Strongly Subquadratic time: Algorithms for the Low-Distance Regime and Approximate Evaluation. arXiv preprint, 2019. URL: http://arxiv.org/abs/1904.09690.
William Kuszmaul. Efficiently Approximating Edit Distance Between Pseudorandom Strings. In Proceedings of the thirtieth annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 2019.
Gad M. Landau, Eugene W. Myers, and Jeanette P. Schmidt. Incremental string comparison. SIAM Journal on Computing, 27(2):557-582, 1998.
Lindasalwa Muda, Mumtaj Begam, and Irraivan Elamvazuthi. Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques. arXiv preprint, 2010. URL: http://arxiv.org/abs/1003.4083.
Mario E Munich and Pietro Perona. Continuous dynamic time warping for translation-invariant curve alignment with applications to signature verification. In Proceedings of 7th International Conference on Computer Vision, volume 1, pages 108-115, 1999.
Gonzalo Navarro. A guided tour to approximate string matching. ACM computing surveys (CSUR), 33(1):31-88, 2001.
Saul B. Needleman and Christian D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of molecular biology, 48(3):443-453, 1970.
François Petitjean, Germain Forestier, Geoffrey I. Webb, Ann E. Nicholson, Yanping Chen, and Eamonn J. Keogh. Faster and more accurate classification of time series by exploiting a novel dynamic time warping averaging algorithm. Knowl. Inf. Syst., 47(1):1-26, 2016.
Hiroaki Sakoe and Seibi Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE transactions on acoustics, speech, and signal processing, 26(1):43-49, 1978.
Esko Ukkonen. Algorithms for approximate string matching. Information and control, 64(1-3):100-118, 1985.
Taras K. Vintsyuk. Speech discrimination by dynamic programming. Cybernetics, 4(1):52-57, 1968.
Robert A. Wagner and Michael J. Fischer. The String-to-String Correction Problem. J. ACM, 21(1):168-173, 1974.
Rex Ying, Jiangwei Pan, Kyle Fox, and Pankaj K Agarwal. A simple efficient approximation algorithm for dynamic time warping. In Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, page 21, 2016.
Yunyue Zhu and Dennis Shasha. Warping indexes with envelope transforms for query by humming. In Proceedings of the 2003 ACM SIGMOD international conference on Management of data, pages 181-192, 2003.
William Kuszmaul
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode