Andrej Bogdanov and Lorenzo Taschin
Licensed under a Creative Commons Attribution 4.0 International license.
Given oracle access to a real-valued function on the n-dimensional Boolean cube, how many queries does it take to estimate the squared Euclidean distance to its closest linear function within ε? Our main result is that O(log³(1/ε) ⋅ 1/ε²) queries suffice. Not only is the query complexity independent of n, but it is also optimal up to the polylogarithmic factor. Our estimator evaluates f on pairs correlated by noise rates chosen to cancel out the low-degree contributions to f while leaving the linear part intact. The query complexity is optimized when the noise rates are multiples of Chebyshev nodes. In contrast, we show that the dependence on n is unavoidable in two closely related settings. For estimation from random samples, Θ(√n/ε + 1/ε²) samples are necessary and sufficient. For agnostically learning a linear approximation with ε mean-square regret under the uniform distribution, Ω(n/√ε) nonadaptively chosen queries are necessary, while O(n/ε) random samples are known to be sufficient (Linial, Mansour, and Nisan). Our upper bounds apply to functions with bounded 4-norm. Our lower bounds apply even to ±1-valued functions.
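The abstract's estimator rests on a standard Fourier-analytic fact: the noise stability Stab_ρ(f) = E[f(x)f(y)], over ρ-correlated pairs (x, y), equals Σ_k ρᵏ W_k(f), where W_k(f) is the Fourier weight of f at level k. Evaluating the stability at several noise rates and solving the resulting Vandermonde system isolates the level-1 weight, from which the squared distance to linearity of a ±1-valued f is 1 − W₁(f). The sketch below illustrates this on a tiny cube, using exact enumeration of Stab_ρ in place of the paper's query-efficient sampling; the function names and the choice of Chebyshev nodes as noise rates are illustrative, not the paper's exact construction.

```python
import itertools
import math
import numpy as np

def exact_stability(f, n, rho):
    """Stab_rho(f) = E[f(x) f(y)] over rho-correlated pairs (x, y).

    Computed exactly by enumerating the cube (illustration only; the
    paper's estimator instead queries f on sampled correlated pairs).
    """
    p = (1 + rho) / 2  # per-coordinate probability that y_i == x_i
    cube = list(itertools.product((-1, 1), repeat=n))
    total = 0.0
    for x in cube:
        for y in cube:
            agree = sum(xi == yi for xi, yi in zip(x, y))
            total += f(x) * f(y) * p**agree * (1 - p)**(n - agree)
    return total / len(cube)  # average over uniform x

def level_one_weight(f, n, degree):
    """Recover W_1(f) from noise stabilities at Chebyshev-node noise rates.

    Since Stab_rho(f) = sum_{k=0}^{degree} rho^k W_k(f), the stabilities
    at degree+1 distinct rates determine all the weights via a
    Vandermonde system; we return the level-1 (linear) weight.
    """
    m = degree + 1
    nodes = [math.cos((2 * j + 1) * math.pi / (2 * m)) for j in range(m)]
    stabs = [exact_stability(f, n, r) for r in nodes]
    vand = np.array([[r**k for k in range(m)] for r in nodes])
    weights = np.linalg.solve(vand, np.array(stabs))
    return float(weights[1])

# 3-bit majority: maj(x) = (x1 + x2 + x3 - x1*x2*x3)/2, so W_1 = 3/4
# and the squared distance to the closest linear function is 1/4.
maj3 = lambda x: 1 if sum(x) > 0 else -1
w1 = level_one_weight(maj3, n=3, degree=3)
print(round(w1, 6), round(1 - w1, 6))  # prints: 0.75 0.25
```

Chebyshev nodes keep the Vandermonde system well-conditioned, which is the exact-arithmetic analogue of the abstract's observation that Chebyshev-node noise rates optimize the query complexity when the stabilities are only known up to sampling error.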
@InProceedings{bogdanov_et_al:LIPIcs.ITCS.2025.20,
author = {Bogdanov, Andrej and Taschin, Lorenzo},
title = {{Estimating Euclidean Distance to Linearity}},
booktitle = {16th Innovations in Theoretical Computer Science Conference (ITCS 2025)},
pages = {20:1--20:18},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {978-3-95977-361-4},
ISSN = {1868-8969},
year = {2025},
volume = {325},
editor = {Meka, Raghu},
publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2025.20},
URN = {urn:nbn:de:0030-drops-226481},
doi = {10.4230/LIPIcs.ITCS.2025.20},
annote = {Keywords: sublinear-time algorithms, statistical estimation, analysis of boolean functions, property testing, regression}
}