Given oracle access to a real-valued function on the n-dimensional Boolean cube, how many queries does it take to estimate the squared Euclidean distance to its closest linear function within ε? Our main result is that O(log³(1/ε) ⋅ 1/ε²) queries suffice. Not only is the query complexity independent of n, but it is also optimal up to the polylogarithmic factor. Our estimator evaluates f on pairs correlated by noise rates chosen to cancel out the low-degree contributions to f while leaving the linear part intact. The query complexity is optimized when the noise rates are multiples of Chebyshev nodes. In contrast, we show that the dependence on n is unavoidable in two closely related settings. For estimation from random samples, Θ(√n/ε + 1/ε²) samples are necessary and sufficient. For agnostically learning a linear approximation with ε mean-square regret under the uniform distribution, Ω(n/√ε) nonadaptively chosen queries are necessary, while O(n/ε) random samples are known to be sufficient (Linial, Mansour, and Nisan). Our upper bounds apply to functions with bounded 4-norm. Our lower bounds apply even to ±1-valued functions.
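The mechanism behind the estimator can be illustrated with a small exact computation. For ρ-correlated pairs (x, y) under the uniform distribution, E[f(x)f(y)] = Σ_k ρᵏ W_k, where W_k is the Fourier weight of f at level k. Choosing coefficients c_j so that Σ_j c_j ρ_jᵏ vanishes for every k ≠ 1 and equals 1 at k = 1 isolates the linear weight W₁, and by Parseval the squared distance to the closest linear function is E[f²] − W₁. The sketch below is our own illustration, not the paper's algorithm: it computes the noise correlations exactly by enumerating the cube rather than estimating them from O(1/ε²) random query pairs, and the scaling of the Chebyshev nodes is a guess (the paper optimizes it; here any distinct nodes give exact cancellation since we take n + 1 of them).

```python
import itertools
import numpy as np

def level_weights(f, n):
    """Fourier weight W[k] of f: {-1,1}^n -> R at each level k, by enumeration."""
    cube = list(itertools.product([-1, 1], repeat=n))
    W = np.zeros(n + 1)
    for k in range(n + 1):
        for S in itertools.combinations(range(n), k):
            # Fourier coefficient f_hat(S) = E_x[f(x) * prod_{i in S} x_i]
            coef = np.mean([f(x) * np.prod([x[i] for i in S]) for x in cube])
            W[k] += coef ** 2
    return W

def dist2_to_linear(f, n):
    """Squared distance from f to the closest linear function sum_i a_i x_i."""
    W = level_weights(f, n)
    m = n + 1
    # Chebyshev nodes as the noise rates (the paper's optimization concerns
    # their exact scaling; with m = n + 1 nodes the cancellation is exact).
    rho = np.cos((2 * np.arange(m) + 1) * np.pi / (2 * m))
    # Solve sum_j c_j rho_j^k = [k == 1] for k = 0..n, so the combination of
    # noise correlations E[f(x) f(y_j)] = sum_k rho_j^k W_k isolates W_1.
    V = np.vander(rho, m, increasing=True).T      # V[k, j] = rho_j^k
    e1 = np.zeros(m)
    e1[1] = 1.0
    c = np.linalg.solve(V, e1)
    N = np.polynomial.polynomial.polyval(rho, W)  # exact noise correlations
    W1 = c @ N
    return W.sum() - W1                           # E[f^2] - W_1, by Parseval
```

For example, a single coordinate f(x) = x₀ is exactly linear (distance 0), a two-bit parity x₀x₁ has no linear part (squared distance 1), and majority on three bits, whose linear weight is 3/4, has squared distance 1/4.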
@InProceedings{bogdanov_et_al:LIPIcs.ITCS.2025.20,
  author =    {Bogdanov, Andrej and Taschin, Lorenzo},
  title =     {{Estimating Euclidean Distance to Linearity}},
  booktitle = {16th Innovations in Theoretical Computer Science Conference (ITCS 2025)},
  pages =     {20:1--20:18},
  series =    {Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =      {978-3-95977-361-4},
  ISSN =      {1868-8969},
  year =      {2025},
  volume =    {325},
  editor =    {Meka, Raghu},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =   {Dagstuhl, Germany},
  URL =       {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2025.20},
  URN =       {urn:nbn:de:0030-drops-226481},
  doi =       {10.4230/LIPIcs.ITCS.2025.20},
  annote =    {Keywords: sublinear-time algorithms, statistical estimation, analysis of boolean functions, property testing, regression}
}