Outlier Robust Multivariate Polynomial Regression

Authors

Vipul Arora, Arnab Bhattacharyya, Mathews Boban, Venkatesan Guruswami, and Esty Kelman



File

LIPIcs.ESA.2024.12.pdf
  • Filesize: 1.04 MB
  • 17 pages

Document Identifiers
  • DOI: 10.4230/LIPIcs.ESA.2024.12

Author Details

Vipul Arora
  • School of Computing, National University of Singapore, Singapore
Arnab Bhattacharyya
  • School of Computing, National University of Singapore, Singapore
Mathews Boban
  • School of Computing, National University of Singapore, Singapore
Venkatesan Guruswami
  • Department of EECS, and Department of Mathematics, University of California, Berkeley, CA, USA
Esty Kelman
  • CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA
  • Department of Computer Science, and Faculty of Computing & Data Sciences, Boston University, MA, USA

Acknowledgements

The authors would like to thank Yuval Filmus for fruitful discussions about some aspects of the robust regression problem.

Cite As

Vipul Arora, Arnab Bhattacharyya, Mathews Boban, Venkatesan Guruswami, and Esty Kelman. Outlier Robust Multivariate Polynomial Regression. In 32nd Annual European Symposium on Algorithms (ESA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 308, pp. 12:1-12:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.ESA.2024.12

Abstract

We study the problem of robust multivariate polynomial regression: let p: ℝⁿ → ℝ be an unknown n-variate polynomial of degree at most d in each variable. We are given as input a set of random samples (𝐱_i, y_i) ∈ [-1,1]ⁿ × ℝ that are noisy versions of (𝐱_i, p(𝐱_i)). More precisely, each 𝐱_i is sampled independently from some distribution χ on [-1,1]ⁿ, and for each i independently, y_i is arbitrary (i.e., an outlier) with probability at most ρ < 1/2, and otherwise satisfies |y_i - p(𝐱_i)| ≤ σ. The goal is to output a polynomial p̂, of degree at most d in each variable, within 𝓁_∞-distance at most O(σ) of p. Kane, Karmalkar, and Price [FOCS'17] solved this problem for n = 1. We generalize their results to the n-variate setting, giving an algorithm with sample complexity O_n(dⁿ log d), where the hidden constant depends on n, when χ is the n-dimensional Chebyshev distribution; if the samples are drawn from the uniform distribution instead, the sample complexity is O_n(d^{2n} log d). The approximation error is guaranteed to be at most O(σ), and the run-time depends on log(1/σ). In the setting where each 𝐱_i and y_i is known up to N bits of precision, the run-time's dependence on N is linear. We also show that our sample complexities are optimal in terms of dⁿ. Furthermore, we show that the run-time can be made independent of 1/σ, at the cost of a higher sample complexity.
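
To make the noise model concrete, the following minimal Python sketch generates samples as described above for n = 2 (each coordinate drawn from the one-dimensional Chebyshev distribution via cos(πU)) and fits the tensor-product Chebyshev coefficients by 𝓁₁ (least-absolute-deviation) regression, a standard robust baseline solved as a linear program. This is an illustrative stand-in under assumed parameters (m, ρ, σ, the outlier values), not the paper's sample-optimal algorithm.

import numpy as np
from numpy.polynomial import chebyshev as C
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, d = 2, 3               # n variables, degree at most d in each variable
rho, sigma = 0.2, 0.01    # outlier probability and inlier noise bound
m = 500                   # number of samples (illustrative, not the paper's bound)

# Unknown polynomial: random coefficients over the tensor Chebyshev basis.
coef_true = rng.normal(size=(d + 1, d + 1))

# Each x_i drawn from the Chebyshev distribution on [-1,1]^n: x = cos(pi * U).
X = np.cos(np.pi * rng.random((m, n)))

def design(X):
    # Row i holds T_j(x_{i,1}) * T_k(x_{i,2}) for all 0 <= j, k <= d.
    V1 = C.chebvander(X[:, 0], d)   # shape (m, d+1)
    V2 = C.chebvander(X[:, 1], d)
    return np.einsum('ij,ik->ijk', V1, V2).reshape(len(X), -1)

A = design(X)
y = A @ coef_true.ravel()
y += rng.uniform(-sigma, sigma, size=m)        # inliers: |y_i - p(x_i)| <= sigma
out = rng.random(m) < rho                      # outliers: arbitrary responses
y[out] = rng.uniform(-10, 10, size=out.sum())

# l1 (least absolute deviation) fit as a linear program:
# minimize sum(t) subject to -t <= A c - y <= t.
k = A.shape[1]
obj = np.concatenate([np.zeros(k), np.ones(m)])
A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
b_ub = np.concatenate([y, -y])
bounds = [(None, None)] * k + [(0, None)] * m
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
coef_hat = res.x[:k].reshape(d + 1, d + 1)

print("max coefficient error:", np.abs(coef_hat - coef_true).max())

Because at most a ρ < 1/2 fraction of the constraints are arbitrary, the 𝓁₁ objective is dominated by the σ-accurate majority; the algorithm of the paper goes further, guaranteeing 𝓁_∞ error O(σ) with the sample complexities stated above.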

Subject Classification

ACM Subject Classification
  • Theory of computation → Continuous optimization
Keywords
  • Robust Statistics
  • Polynomial Regression
  • Sample Efficient Learning


References

  1. Sanjeev Arora and Subhash Khot. Fitting algebraic curves to noisy data. Journal of Computer and System Sciences, 67(2):325-340, 2003. Special Issue on STOC 2002. URL: https://doi.org/10.1016/S0022-0000(03)00012-6.
  2. Vipul Arora, Arnab Bhattacharyya, Mathews Boban, Venkatesan Guruswami, and Esty Kelman. Outlier Robust Multivariate Polynomial Regression, 2024. URL: https://arxiv.org/abs/2403.09465.
  3. Hadassa Daltrophe, Shlomi Dolev, and Zvi Lotker. Big data interpolation using functional representation. Acta Informatica, 55:213-225, 2018. URL: https://doi.org/10.1007/s00236-016-0288-8.
  4. Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Jacob Steinhardt, and Alistair Stewart. Sever: A robust meta-algorithm for stochastic optimization. In International Conference on Machine Learning, pages 1596-1606. PMLR, 2019. URL: https://arxiv.org/abs/1803.02815.
  5. Ilias Diakonikolas, Weihao Kong, and Alistair Stewart. Efficient algorithms and lower bounds for robust linear regression. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2745-2754. SIAM, 2019. URL: https://arxiv.org/abs/1806.00040.
  6. Venkatesan Guruswami and David Zuckerman. Robust Fourier and Polynomial Curve Fitting. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 751-759, Los Alamitos, CA, USA, October 2016. IEEE Computer Society. URL: https://doi.org/10.1109/FOCS.2016.75.
  7. Helmut (Mathematics Stack Exchange user). Norms on 𝒫_N, the vector space of polynomials up to order N. Mathematics Stack Exchange. URL: https://math.stackexchange.com/q/2693954.
  8. Daniel Kane, Sushrut Karmalkar, and Eric Price. Robust Polynomial Regression up to the Information Theoretic Limit. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 391-402, 2017. URL: https://doi.org/10.1109/FOCS.2017.43.
  9. Adam R. Klivans, Pravesh K. Kothari, and Raghu Meka. Efficient algorithms for outlier-robust regression. In Sébastien Bubeck, Vianney Perchet, and Philippe Rigollet, editors, Conference On Learning Theory, COLT 2018, Stockholm, Sweden, 6-9 July 2018, volume 75 of Proceedings of Machine Learning Research, pages 1420-1430. PMLR, 2018. URL: http://proceedings.mlr.press/v75/klivans18a.html.
  10. Andrey Andreyevich Markov. On a question by D. I. Mendeleev. Zap. Imp. Akad. Nauk. St. Petersburg, 62:1-24, 1890. URL: https://history-of-approximation-theory.com/fpapers/markov4.pdf.
  11. Paul G. Nevai. Bernstein's inequality in L^p for 0 < p < 1. Journal of Approximation Theory, 27(3):239-243, 1979. URL: https://doi.org/10.1016/0021-9045(79)90105-9.
  12. Adarsh Prasad, Arun Sai Suggala, Sivaraman Balakrishnan, and Pradeep Ravikumar. Robust Estimation via Robust Gradient Estimation. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(3):601-627, 2020. URL: https://arxiv.org/abs/1802.06485.
  13. John Wolberg. Data analysis using the method of least squares: extracting the most information from experiments. Springer Science & Business Media, 2006. URL: https://doi.org/10.1007/3-540-31720-1.
  14. Achim Zielesny. From curve fitting to machine learning, volume 18. Springer, 2011. URL: https://doi.org/10.1007/978-3-319-32545-3.