Formalization of a Stochastic Approximation Theorem

Authors Koundinya Vajjha , Barry Trager, Avraham Shinnar, Vasily Pestun



PDF
Thumbnail PDF

File

LIPIcs.ITP.2022.31.pdf
  • Filesize: 0.88 MB
  • 18 pages

Document Identifiers

Author Details

Koundinya Vajjha
  • University of Pittsburgh, PA, US
Barry Trager
  • IBM Research, Yorktown Heights, NY, US
Avraham Shinnar
  • IBM Research, Yorktown Heights, NY, US
Vasily Pestun
  • IBM Research, Yorktown Heights, NY, US
  • Institut des Hautes Études Scientifiques, Bures sur Yvette, France

Cite As Get BibTex

Koundinya Vajjha, Barry Trager, Avraham Shinnar, and Vasily Pestun. Formalization of a Stochastic Approximation Theorem. In 13th International Conference on Interactive Theorem Proving (ITP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 237, pp. 31:1-31:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.ITP.2022.31

Abstract

Stochastic approximation algorithms are iterative procedures which are used to approximate a target value in an environment where the target is unknown and direct observations are corrupted by noise. These algorithms are useful, for instance, for root-finding and function minimization when the target function or model is not directly known. Originally introduced in a 1951 paper by Robbins and Monro, the field of Stochastic approximation has grown enormously and has come to influence application domains from adaptive signal processing to artificial intelligence. As an example, the Stochastic Gradient Descent algorithm which is ubiquitous in various subdomains of Machine Learning is based on stochastic approximation theory. In this paper, we give a formal proof (in the Coq proof assistant) of a general convergence theorem due to Aryeh Dvoretzky [Dvoretzky, 1956] (proven in 1956) which implies the convergence of important classical methods such as the Robbins-Monro and the Kiefer-Wolfowitz algorithms. In the process, we build a comprehensive Coq library of measure-theoretic probability theory and stochastic processes.

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Stochastic processes
  • Mathematics of computing → Nonlinear equations
  • Computing methodologies → Optimization algorithms
Keywords
  • Formal Verification
  • Stochastic Approximation
  • Stochastic Processes
  • Probability Theory
  • Optimization Algorithms

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Niels Henrik Abel. Note sur le mémoire de Mr. L. Olivier No. 4. du second tome de ce Journal, ajant pour titre "remarques sur les series infinies et leur convergence". Crelles Journal, 3, 1828. Google Scholar
  2. Reynald Affeldt and Cyril Cohen. Formalization of the Lebesgue measure in MathComp-Analysis. The Coq Workshop 2021, online, July 2, 2021, July 2021. Google Scholar
  3. Reynald Affeldt, Jacques Garrigue, and Takafumi Saikawa. Reasoning with conditional probabilities and joint distributions in coq. Computer Software, 37(3):79-95, 2020. URL: https://doi.org/10.11309/jssst.37.3_79.
  4. Reynald Affeldt and Manabu Hagiwara. Formalization of Shannon’s theorems in SSReflect-Coq. In 3rd Conference on Interactive Theorem Proving (http://itp2012.cs.princeton.edu/), Princeton, New Jersey, USA, August 13-15, 2012, volume 7406 of Lecture Notes in Computer Science, pages 233-249. Springer, August 2012.
  5. J Marshall Ash. Neither a worst convergent series nor a best divergent series exists. The College Mathematics Journal, 28(4):296-297, 1997. Google Scholar
  6. Robert B Ash, B Robert, Catherine A Doleans-Dade, and A Catherine. Probability and measure theory. Academic press, 2000. Google Scholar
  7. Philippe Audebaud and Christine Paulin-Mohring. Proofs of randomized algorithms in Coq. Science of Computer Programming, 74(8):568-589, 2009. Google Scholar
  8. Jeremy Avigad, Johannes Hölzl, and Luke Serafin. A formally verified proof of the central limit theorem. Journal of Automated Reasoning, 59(4):389-423, 2017. Google Scholar
  9. Alexander Bagnall and Gordon Stewart. Certifying the True Error: Machine Learning in Coq with Verified Generalization Guarantees. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, pages 2662-2669. AAAI Press, 2019. URL: https://doi.org/10.1609/aaai.v33i01.33012662.
  10. Alexander Bentkamp, Jasmin Christian Blanchette, and Dietrich Klakow. A Formal Proof of the Expressiveness of Deep Learning. J. Autom. Reason., 63(2):347-368, 2019. URL: https://doi.org/10.1007/s10817-018-9481-5.
  11. Errett Albert Bishop. Foundations of constructive analysis, 1967. Google Scholar
  12. Julius R. Blum. Approximation Methods which Converge with Probability one. The Annals of Mathematical Statistics, 25(2):382-386, 1954. URL: https://doi.org/10.1214/aoms/1177728794.
  13. Paul du Bois-Reymond. Eine neue Theorie der Convergenz und Divergenz von Reihen mit positiven Gliedern., 1873. Google Scholar
  14. Sylvie Boldo, François Clément, Florian Faissole, Vincent Martin, and Micaela Mayero. A Coq formal proof of the Lax-Milgram theorem. In 6th ACM SIGPLAN Conference on Certified Programs and Proofs, Paris, France, January 2017. URL: https://doi.org/10.1145/3018610.3018625.
  15. Sylvie Boldo, François Clément, Florian Faissole, Vincent Martin, and Micaela Mayero. A coq formalization of lebesgue integration of nonnegative functions. Journal of Automated Reasoning, pages 1-39, 2021. Google Scholar
  16. Sylvie Boldo, Catherine Lelay, and Guillaume Melquiond. Coquelicot: A user-friendly library of real analysis for Coq. Mathematics in Computer Science, 9, March 2014. URL: https://doi.org/10.1007/s11786-014-0181-1.
  17. Han-Fu Chen. Stochastic approximation and its applications, volume 64. Springer Science & Business Media, 2006. Google Scholar
  18. Thierry Coquand and Erik Palmgren. Metric boolean algebras and constructive measure theory. Archive for Mathematical Logic, 41(7):687-704, 2002. Google Scholar
  19. Thierry Coquand and Bas Spitters. Integrals and valuations. arXiv preprint arXiv:0808.1522, 2008. Google Scholar
  20. C Derman and J Sacks. On Dvoretzky’s stochastic approximation theorem. The Annals of Mathematical Statistics, 30(2):601-606, 1959. Google Scholar
  21. A. Dvoretzky. On stochastic approximation. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, 1956. Google Scholar
  22. Aryeh Dvoretzky. Stochastic approximation revisited. Advances in Applied Mathematics, 7(2):220-227, 1986. URL: https://doi.org/10.1016/0196-8858(86)90033-3.
  23. Herman Geuvers and Milad Niqui. Constructive reals in coq: Axioms and categoricity. In International Workshop on Types for Proofs and Programs, pages 79-95. Springer, 2000. Google Scholar
  24. Johannes Hoelzl. Markov processes in Isabelle/HOL. In Proceedings of the 6th ACM SIGPLAN Conference on Certified Programs and Proofs, CPP 2017, pages 100-111, New York, NY, USA, 2017. Association for Computing Machinery. URL: https://doi.org/10.1145/3018610.3018628.
  25. Tommi Jaakkola, Michael I Jordan, and Satinder P Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural computation, 6(6):1185-1201, 1994. Google Scholar
  26. J. Kiefer and J. Wolfowitz. Stochastic Estimation of the Maximum of a Regression Function. The Annals of Mathematical Statistics, 23(3):462-466, 1952. URL: https://doi.org/10.1214/aoms/1177729392.
  27. Tze Leung Lai. Stochastic Approximation. The Annals of Statistics, 31(2):391-406, 2003. URL: http://www.jstor.org/stable/3448398.
  28. Michel Loève. On almost sure convergence. Proc. Berkeley Sympos. math. Statist. Probability, California July 31 - August 12, 1950, 279-303 (1951)., 1951. Google Scholar
  29. H. Robbins and D. Siegmund. A convergence theorem for non negative almost supermartingales and some applications. In Jagdish S. Rustagi, editor, Optimizing Methods in Statistics, pages 233-257. Academic Press, 1971. URL: https://doi.org/10.1016/B978-0-12-604550-5.50015-8.
  30. Herbert Robbins and Sutton Monro. A stochastic approximation method. The annals of mathematical statistics, pages 400-407, 1951. Google Scholar
  31. Walter Rudin et al. Principles of mathematical analysis, volume 3. McGraw-hill New York, 1976. Google Scholar
  32. Daniel Selsam, Percy Liang, and David L. Dill. Developing Bug-Free Machine Learning Systems With Formal Mathematics. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Learning Research, pages 3047-3056. PMLR, 2017. URL: http://proceedings.mlr.press/v70/selsam17a.html.
  33. Joseph Tassarotti and Robert Harper. A separation logic for concurrent randomized programs. Proceedings of the ACM on Programming Languages, 3(POPL):1-30, 2019. Google Scholar
  34. Joseph Tassarotti, Jean-Baptiste Tristan, and Koundinya Vajjha. A Formal Proof of PAC Learnability for Decision Stumps. CoRR, abs/1911.00385, 2019. URL: http://arxiv.org/abs/1911.00385.
  35. Joseph Tassarotti, Koundinya Vajjha, Anindya Banerjee, and Jean-Baptiste Tristan. A formal proof of PAC learnability for decision stumps. In Proceedings of the 10th ACM SIGPLAN International Conference on Certified Programs and Proofs, pages 5-17, 2021. Google Scholar
  36. The mathlib Community. The Lean mathematical library. In Proceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs (CPP 2020), pages 367-381, 2020. Google Scholar
  37. Jean-Baptiste Tristan, Joseph Tassarotti, Koundinya Vajjha, Michael L. Wick, and Anindya Banerjee. Verification of ML Systems via Reparameterization. CoRR, abs/2007.06776, 2020. URL: http://arxiv.org/abs/2007.06776.
  38. John N Tsitsiklis. Asynchronous stochastic approximation and q-learning. Machine learning, 16(3):185-202, 1994. Google Scholar
  39. Koundinya Vajjha, Avraham Shinnar, Barry Trager, Vasily Pestun, and Nathan Fulton. Certrl: formalizing convergence proofs for value and policy iteration in coq. In Proceedings of the 10th ACM SIGPLAN International Conference on Certified Programs and Proofs, pages 18-31, 2021. Google Scholar
  40. J. H. Venter. On Dvoretzky Stochastic Approximation Theorems. The Annals of Mathematical Statistics, 37(6):1534-1544, 1966. URL: https://doi.org/10.1214/aoms/1177699145.
  41. Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learning, 8(3-4):279-292, 1992. Google Scholar
  42. J. Wolfowitz. On the Stochastic Approximation Method of Robbins and Monro. The Annals of Mathematical Statistics, 23(3):457-461, 1952. URL: http://www.jstor.org/stable/2236689.
  43. J. Wolfowitz. On Stochastic Approximation Methods. The Annals of Mathematical Statistics, 27(4):1151-1156, 1956. URL: https://doi.org/10.1214/aoms/1177728082.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail