2 Search Results for "Thomas, Philip S."

Document

DOI: 10.4230/LIPIcs.ITP.2022.32

Mechanizing Soundness of Off-Policy Evaluation

Authors: Jared Yeager, J. Eliot B. Moss, Michael Norrish, and Philip S. Thomas

Published in: LIPIcs, Volume 237, 13th International Conference on Interactive Theorem Proving (ITP 2022)

Abstract

There are reinforcement learning scenarios - e.g., in medicine - where we are compelled to be as confident as possible that a policy change will result in an improvement before implementing it. In such scenarios, we can employ off-policy evaluation (OPE). The basic idea of OPE is to record histories of behaviors under the current policy, and then develop an estimate of the quality of a proposed new policy, seeing what the behavior would have been under the new policy. As we are evaluating the policy without actually using it, we have the "off-policy" of OPE. Applying a concentration inequality to the estimate, we derive a confidence interval for the expected quality of the new policy. If the confidence interval lies above that of the current policy, we can change policies with high confidence that we will do no harm. We focus here on the mathematics of this method, by mechanizing the soundness of off-policy evaluation. A natural side effect of the mechanization is both to clarify all the result’s mathematical assumptions and preconditions, and to further develop HOL4’s library of verified statistical mathematics, including concentration inequalities. Of more significance, the OPE method relies on importance sampling, whose soundness we prove using a measure-theoretic approach. In fact, we generalize the standard result, showing it for contexts comprising both discrete and continuous probability distributions.

Cite as

Jared Yeager, J. Eliot B. Moss, Michael Norrish, and Philip S. Thomas. Mechanizing Soundness of Off-Policy Evaluation. In 13th International Conference on Interactive Theorem Proving (ITP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 237, pp. 32:1-32:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{yeager_et_al:LIPIcs.ITP.2022.32,
  author =	{Yeager, Jared and Moss, J. Eliot B. and Norrish, Michael and Thomas, Philip S.},
  title =	{{Mechanizing Soundness of Off-Policy Evaluation}},
  booktitle =	{13th International Conference on Interactive Theorem Proving (ITP 2022)},
  pages =	{32:1--32:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-252-5},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{237},
  editor =	{Andronick, June and de Moura, Leonardo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.ITP.2022.32},
  URN =		{urn:nbn:de:0030-drops-167413},
  doi =		{10.4230/LIPIcs.ITP.2022.32},
  annote =	{Keywords: Formal Methods, HOL4, Reinforcement Learning, Off-Policy Evaluation, Concentration Inequality, Hoeffding}
}

Document

DOI: 10.4230/LIPIcs.SNAPL.2015.309

A Complement to Blame

Authors: Philip Wadler

Published in: LIPIcs, Volume 32, 1st Summit on Advances in Programming Languages (SNAPL 2015)

Abstract

Contracts, gradual typing, and hybrid typing all permit less-precisely typed and more-precisely typed code to interact. Blame calculus encompasses these, and guarantees blame safety: blame for type errors always lays with less-precisely typed code. This paper serves as a complement to the literature on blame calculus: it elaborates on motivation, comments on the reception of the work, critiques some work for not properly attending to blame, and looks forward to applications. No knowledge of contracts, gradual typing, hybrid typing, or blame calculus is assumed.

Cite as

Philip Wadler. A Complement to Blame. In 1st Summit on Advances in Programming Languages (SNAPL 2015). Leibniz International Proceedings in Informatics (LIPIcs), Volume 32, pp. 309-320, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2015)

Copy BibTex To Clipboard

@InProceedings{wadler:LIPIcs.SNAPL.2015.309,
  author =	{Wadler, Philip},
  title =	{{A Complement to Blame}},
  booktitle =	{1st Summit on Advances in Programming Languages (SNAPL 2015)},
  pages =	{309--320},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-939897-80-4},
  ISSN =	{1868-8969},
  year =	{2015},
  volume =	{32},
  editor =	{Ball, Thomas and Bodík, Rastislav and Krishnamurthi, Shriram and Lerner, Benjamin S. and Morriset, Greg},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.SNAPL.2015.309},
  URN =		{urn:nbn:de:0030-drops-50333},
  doi =		{10.4230/LIPIcs.SNAPL.2015.309},
  annote =	{Keywords: contracts, gradual typing, hybrid typing, blame calculus}
}

Refine by Author
1 Moss, J. Eliot B.
1 Norrish, Michael
1 Thomas, Philip S.
1 Wadler, Philip
1 Yeager, Jared

Refine by Classification
1 Mathematics of computing → Hypothesis testing and confidence interval computation
1 Theory of computation → Interactive proof systems
1 Theory of computation → Logic and verification
1 Theory of computation → Sequential decision making

Refine by Keyword
1 Concentration Inequality
1 Formal Methods
1 HOL4
1 Hoeffding
1 Off-Policy Evaluation
Show More...

Refine by Type
2 document

Refine by Publication Year
1 2015
1 2022

2 Search Results for "Thomas, Philip S."

Mechanizing Soundness of Off-Policy Evaluation

Abstract

Cite as

A Complement to Blame

Abstract

Cite as

Thanks for your feedback!

Could not send message