Recovery from Non-Decomposable Distance Oracles

Hu, Zhuangfei; Li, Xinda; Woodruff, David P.; Zhang, Hongyang; Zhang, Shufan

doi:10.4230/LIPIcs.ITCS.2023.73

Abstract

A line of work has studied the problem of recovering an input from distance queries. In this setting, there is an unknown sequence s ∈ {0,1}^{≤ n}, and one chooses a set of queries y ∈ {0,1}^{O(n)} and receives d(s,y) for a distance function d. The goal is to recover s using as few queries as possible. This problem is well studied for decomposable distances, i.e., distances of the form d(s,y) = ∑_{i=1}^n f(s_i, y_i) for some function f, which include the important cases of Hamming distance, ℓ_p-norms, and M-estimators. To the best of our knowledge, however, it has not been studied for non-decomposable distances, which have important special cases such as edit distance, dynamic time warping (DTW), Fréchet distance, and earth mover's distance. We initiate this study and develop a general framework for such distances. Interestingly, for some distances such as DTW or Fréchet, exact recovery of the sequence s is provably impossible, so we show that recovery becomes possible when the characters in y may be drawn from a slightly larger alphabet. In a number of cases we obtain optimal or near-optimal query complexity. We also study the role of adaptivity for a number of different distance functions. One motivation for understanding non-adaptivity is that the query sequence can be fixed in advance, so that the distances from the input to the queries provide a non-linear embedding of the input, which can be used in downstream applications involving, e.g., neural networks for natural language processing.
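To make the contrast between decomposable and non-decomposable distances concrete, the following Python sketch may help. It is an illustration only and is not taken from the paper: all function names are hypothetical, the Hamming recovery assumes the secret has a known fixed length n (a simplification of the {0,1}^{≤ n} setting) and uses a naive n + 1 non-adaptive queries rather than an optimal strategy, and the Fréchet part uses the standard dynamic program for the discrete Fréchet distance. It checks one simple obstruction to exact recovery, namely that repeating a character does not change any discrete Fréchet distance; this is not necessarily the argument the authors use.

```python
from itertools import product


def hamming(s, y):
    """Decomposable distance: a sum of per-position terms f(s_i, y_i)."""
    assert len(s) == len(y)
    return sum(a != b for a, b in zip(s, y))


def recover_hamming(oracle, n):
    """Recover a length-n binary string from n + 1 non-adaptive Hamming queries."""
    weight = oracle("0" * n)  # distance to all-zeros equals the number of ones in s
    bits = []
    for i in range(n):
        unit = "0" * i + "1" + "0" * (n - i - 1)
        # Setting query position i to 1 lowers the distance by 1 exactly when s_i = 1.
        bits.append("1" if oracle(unit) == weight - 1 else "0")
    return "".join(bits)


def discrete_frechet(s, y):
    """Discrete Fréchet distance: min over monotone couplings of max |s_i - y_j|."""
    a = [int(c) for c in s]
    b = [int(c) for c in y]
    prev = []
    for i, ai in enumerate(a):
        row = []
        for j, bj in enumerate(b):
            cost = abs(ai - bj)
            if i == 0 and j == 0:
                best = 0
            elif i == 0:
                best = row[j - 1]
            elif j == 0:
                best = prev[0]
            else:
                best = min(prev[j], prev[j - 1], row[j - 1])
            row.append(max(cost, best))
        prev = row
    return prev[-1]


def all_binary(max_len):
    """Yield every non-empty binary string of length at most max_len."""
    for m in range(1, max_len + 1):
        for t in product("01", repeat=m):
            yield "".join(t)


# Decomposable case: the oracle answers determine the secret exactly.
secret = "01101"
recovered = recover_hamming(lambda y: hamming(secret, y), len(secret))

# Non-decomposable case: "01" and "011" differ only by a repeated character,
# and every binary query of length at most 6 returns the same Fréchet distance for both.
indistinguishable = all(
    discrete_frechet("01", y) == discrete_frechet("011", y) for y in all_binary(6)
)
```

In this sketch `recovered` equals `secret`, while `indistinguishable` is true, so no set of the tested queries can tell the two candidate secrets apart. The exhaustive check over short queries is only evidence for the tested lengths; the general invariance under repeated characters follows from the definition of the discrete Fréchet distance.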

