Estimating Euclidean Distance to Linearity

Authors: Andrej Bogdanov, Lorenzo Taschin





Author Details

Andrej Bogdanov
  • University of Ottawa, Canada
Lorenzo Taschin
  • EPFL, Lausanne, Switzerland

Acknowledgements

We thank Gautam Prakriya for insightful discussions and the anonymous ITCS reviewers for corrections and helpful comments.

Cite As

Andrej Bogdanov and Lorenzo Taschin. Estimating Euclidean Distance to Linearity. In 16th Innovations in Theoretical Computer Science Conference (ITCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 325, pp. 20:1-20:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025). https://doi.org/10.4230/LIPIcs.ITCS.2025.20

Abstract

Given oracle access to a real-valued function on the n-dimensional Boolean cube, how many queries does it take to estimate the squared Euclidean distance to its closest linear function within ε? Our main result is that O(log³(1/ε) ⋅ 1/ε²) queries suffice. Not only is the query complexity independent of n, but it is also optimal up to a polylogarithmic factor.
Our estimator evaluates f on pairs of inputs correlated at noise rates chosen to cancel out the low-degree contributions to f while leaving the linear part intact. The query complexity is optimized when the noise rates are multiples of Chebyshev nodes; an illustrative sketch of this noise-cancellation idea follows the abstract.
In contrast, we show that the dependence on n is unavoidable in two closely related settings. For estimation from random samples, Θ(√n/ε + 1/ε²) samples are necessary and sufficient. For agnostically learning a linear approximation with ε mean-square regret under the uniform distribution, Ω(n/√ε) nonadaptively chosen queries are necessary, while O(n/ε) random samples are known to be sufficient (Linial, Mansour, and Nisan). 
Our upper bounds apply to functions with bounded 4-norm. Our lower bounds apply even to ±1-valued functions.
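
To make the noise-cancellation idea concrete, here is a minimal Python sketch of an estimator in this spirit. It is not the paper's construction: the helper names (correlated_pair, stability_estimate, linear_weight_estimate), the scaled Chebyshev-style nodes, and the plain Vandermonde solve for the cancellation coefficients are all this sketch's own choices. It relies only on the standard identity E[f(x)f(y)] = Σ_S f̂(S)² ρ^|S| when y is ρ-correlated with x (see O'Donnell [9]), and it treats "linear" as a homogeneous degree-1 polynomial Σᵢ aᵢxᵢ, so the squared distance to linearity is E[f²] minus the level-1 Fourier weight W¹[f].

    import numpy as np

    def correlated_pair(n, rho, rng):
        """Sample x uniform on {-1,1}^n and a rho-correlated y:
        independently for each i, y_i = x_i with probability (1+rho)/2."""
        x = rng.choice([-1.0, 1.0], size=n)
        flip = rng.random(n) < (1.0 - rho) / 2.0
        return x, np.where(flip, -x, x)

    def stability_estimate(f, n, rho, m, rng):
        """Monte Carlo estimate of E[f(x)f(y)] = sum_S fhat(S)^2 rho^|S|."""
        total = 0.0
        for _ in range(m):
            x, y = correlated_pair(n, rho, rng)
            total += f(x) * f(y)
        return total / m

    def linear_weight_estimate(f, n, rhos, m, rng):
        """Combine stability estimates at the noise rates rhos with
        coefficients c solving sum_j c_j * rhos[j]**k == (k == 1) for
        k = 0..len(rhos)-1: Fourier levels 0, 2, ..., len(rhos)-1 cancel
        exactly while level 1 (the linear part) is kept intact."""
        d = len(rhos)
        V = np.vander(rhos, d, increasing=True).T  # V[k, j] = rhos[j]**k
        target = np.zeros(d)
        target[1] = 1.0                            # keep only level 1
        c = np.linalg.solve(V, target)
        return sum(cj * stability_estimate(f, n, rj, m, rng)
                   for cj, rj in zip(c, rhos))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n = 21                             # odd, so majority is +-1-valued
        f = lambda x: np.sign(np.sum(x))   # majority function
        # Four Chebyshev nodes scaled into [-1/2, 1/2] (illustrative choice):
        rhos = [0.5 * np.cos((2 * j + 1) * np.pi / 8) for j in range(4)]
        w1 = linear_weight_estimate(f, n, rhos, m=20000, rng=rng)
        # Squared distance to the closest linear function is E[f^2] - W1[f],
        # and E[f^2] = 1 since f is +-1-valued.
        print("estimated squared distance to linearity:", 1.0 - w1)

Because the nodes come in ± pairs, the solved coefficients are antisymmetric in ρ and every even Fourier level cancels automatically; odd levels above the number of nodes survive, but damped by |ρ|^k ≤ 2^(-k). Balancing small noise rates against the size of the cancellation coefficients is, per the abstract, what the multiples of Chebyshev nodes optimize.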

Subject Classification

ACM Subject Classification
  • Theory of computation → Streaming, sublinear and near linear time algorithms
  • Theory of computation → Randomness, geometry and discrete structures
  • Theory of computation → Probabilistic computation
Keywords
  • sublinear-time algorithms
  • statistical estimation
  • analysis of Boolean functions
  • property testing
  • regression


References

  1. Mitali Bafna, Srikanth Srinivasan, and Madhu Sudan. Local decoding and testing of polynomials over grids. Random Struct. Algorithms, 57(3):658-694, 2020. URL: https://doi.org/10.1002/RSA.20933.
  2. Shai Ben-David and Ruth Urner. The sample complexity of agnostic learning under deterministic labels. In Proceedings of the 27th Conference on Learning Theory, volume 35, pages 527-542, 13-15 June 2014. URL: https://proceedings.mlr.press/v35/ben-david14.html.
  3. Andrej Bogdanov and Gautam Prakriya. Direct Sum and Partitionability Testing over General Groups. In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021), volume 198, pages 33:1-33:19, 2021. URL: https://doi.org/10.4230/LIPIcs.ICALP.2021.33.
  4. Luc Devroye, Abbas Mehrabian, and Tommy Reddad. The total variation distance between high-dimensional Gaussians with the same mean. arXiv preprint, 2018. URL: https://arxiv.org/abs/1810.08693.
  5. Noah Fleming and Yuichi Yoshida. Distribution-free testing of linear functions on ℝⁿ. arXiv preprint, 2019. URL: https://arxiv.org/abs/1909.03391.
  6. Shirley Halevy and Eyal Kushilevitz. Distribution-free property-testing. SIAM J. Comput., 37(4):1107-1138, 2007. URL: https://doi.org/10.1137/050645804.
  7. Subhash Khot and Dana Moshkovitz. NP-hardness of approximately solving linear equations over reals. In Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, pages 413-420, 2011. URL: https://doi.org/10.1145/1993636.1993692.
  8. Nathan Linial, Yishay Mansour, and Noam Nisan. Constant depth circuits, Fourier transform, and learnability. Journal of the ACM (JACM), 40(3):607-620, 1993. URL: https://doi.org/10.1145/174130.174138.
  9. Ryan O'Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
  10. Michal Parnas, Dana Ron, and Ronitt Rubinfeld. Tolerant property testing and distance approximation. J. Comput. Syst. Sci., 72(6):1012-1042, 2006. URL: https://doi.org/10.1016/j.jcss.2006.03.002.
  11. Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, USA, 2014.
  12. Ohad Shamir. The sample complexity of learning linear predictors with the squared loss. J. Mach. Learn. Res., 16(1):3475-3486, January 2015. URL: https://doi.org/10.5555/2789272.2912110.
  13. Madhur Tulsiani. Lecture 5: Information and coding theory. Lecture Notes, Winter 2021. URL: https://home.ttic.edu/~madhurt/courses/infotheory2021/l5.pdf.
  14. Tomas Vaškevičius and Nikita Zhivotovskiy. Suboptimality of constrained least squares and improvements via non-linear predictors. Bernoulli, 29, February 2023. URL: https://doi.org/10.3150/22-BEJ1465.