DROPS

Document

DOI: 10.4230/LIPIcs.CONCUR.2022.11

Anytime Guarantees for Reachability in Uncountable Markov Decision Processes

Authors: Kush Grover, Jan Křetínský, Tobias Meggendorfer, and Maximilian Weininger

Published in: LIPIcs, Volume 243, 33rd International Conference on Concurrency Theory (CONCUR 2022)

Abstract

We consider the problem of approximating the reachability probabilities in Markov decision processes (MDP) with uncountable (continuous) state and action spaces. While there are algorithms that, for special classes of such MDP, provide a sequence of approximations converging to the true value in the limit, our aim is to obtain an algorithm with guarantees on the precision of the approximation. As this problem is undecidable in general, assumptions on the MDP are necessary. Our main contribution is to identify sufficient assumptions that are as weak as possible, thus approaching the "boundary" of which systems can be correctly and reliably analyzed. To this end, we also argue why each of our assumptions is necessary for algorithms based on processing finitely many observations. We present two solution variants. The first one provides converging lower bounds under weaker assumptions than typical ones from previous works concerned with guarantees. The second one then utilizes stronger assumptions to additionally provide converging upper bounds. Altogether, we obtain an anytime algorithm, i.e. yielding a sequence of approximants with known and iteratively improving precision, converging to the true value in the limit. Besides, due to the generality of our assumptions, our algorithms are very general templates, readily allowing for various heuristics from literature in contrast to, e.g., a specific discretization algorithm. Our theoretical contribution thus paves the way for future practical improvements without sacrificing correctness guarantees.

Cite as

Kush Grover, Jan Křetínský, Tobias Meggendorfer, and Maximilian Weininger. Anytime Guarantees for Reachability in Uncountable Markov Decision Processes. In 33rd International Conference on Concurrency Theory (CONCUR 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 243, pp. 11:1-11:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{grover_et_al:LIPIcs.CONCUR.2022.11,
  author =	{Grover, Kush and K\v{r}et{\'\i}nsk\'{y}, Jan and Meggendorfer, Tobias and Weininger, Maximilian},
  title =	{{Anytime Guarantees for Reachability in Uncountable Markov Decision Processes}},
  booktitle =	{33rd International Conference on Concurrency Theory (CONCUR 2022)},
  pages =	{11:1--11:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-246-4},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{243},
  editor =	{Klin, Bartek and Lasota, S{\l}awomir and Muscholl, Anca},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CONCUR.2022.11},
  URN =		{urn:nbn:de:0030-drops-170743},
  doi =		{10.4230/LIPIcs.CONCUR.2022.11},
  annote =	{Keywords: Uncountable system, Markov decision process, discrete-time Markov control process, probabilistic verification, anytime guarantee}
}

@InProceedings{grover_et_al:LIPIcs.CONCUR.2022.11,
  author =	{Grover, Kush and K\v{r}et{\'\i}nsk\'{y}, Jan and Meggendorfer, Tobias and Weininger, Maximilian},
  title =	{{Anytime Guarantees for Reachability in Uncountable Markov Decision Processes}},
  booktitle =	{33rd International Conference on Concurrency Theory (CONCUR 2022)},
  pages =	{11:1--11:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-246-4},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{243},
  editor =	{Klin, Bartek and Lasota, S{\l}awomir and Muscholl, Anca},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CONCUR.2022.11},
  URN =		{urn:nbn:de:0030-drops-170743},
  doi =		{10.4230/LIPIcs.CONCUR.2022.11},
  annote =	{Keywords: Uncountable system, Markov decision process, discrete-time Markov control process, probabilistic verification, anytime guarantee}
}

Document

DOI: 10.4230/LIPIcs.CONCUR.2021.5

Enforcing ω-Regular Properties in Markov Chains by Restarting

Authors: Javier Esparza, Stefan Kiefer, Jan Křetínský, and Maximilian Weininger

Published in: LIPIcs, Volume 203, 32nd International Conference on Concurrency Theory (CONCUR 2021)

Abstract

Restarts are used in many computer systems to improve performance. Examples include reloading a webpage, reissuing a request, or restarting a randomized search. The design of restart strategies has been extensively studied by the performance evaluation community. In this paper, we address the problem of designing universal restart strategies, valid for arbitrary finite-state Markov chains, that enforce a given ω-regular property while not knowing the chain. A strategy enforces a property φ if, with probability 1, the number of restarts is finite, and the run of the Markov chain after the last restart satisfies φ. We design a simple "cautious" strategy that solves the problem, and a more sophisticated "bold" strategy with an almost optimal number of restarts.

Cite as

Javier Esparza, Stefan Kiefer, Jan Křetínský, and Maximilian Weininger. Enforcing ω-Regular Properties in Markov Chains by Restarting. In 32nd International Conference on Concurrency Theory (CONCUR 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 203, pp. 5:1-5:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{esparza_et_al:LIPIcs.CONCUR.2021.5,
  author =	{Esparza, Javier and Kiefer, Stefan and K\v{r}et{\'\i}nsk\'{y}, Jan and Weininger, Maximilian},
  title =	{{Enforcing \omega-Regular Properties in Markov Chains by Restarting}},
  booktitle =	{32nd International Conference on Concurrency Theory (CONCUR 2021)},
  pages =	{5:1--5:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-203-7},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{203},
  editor =	{Haddad, Serge and Varacca, Daniele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CONCUR.2021.5},
  URN =		{urn:nbn:de:0030-drops-143824},
  doi =		{10.4230/LIPIcs.CONCUR.2021.5},
  annote =	{Keywords: Markov chains, omega-regular properties, runtime enforcement}
}

Document

DOI: 10.4230/LIPIcs.CONCUR.2019.5

Of Cores: A Partial-Exploration Framework for Markov Decision Processes

Authors: Jan Křetínský and Tobias Meggendorfer

Published in: LIPIcs, Volume 140, 30th International Conference on Concurrency Theory (CONCUR 2019)

Abstract

We introduce a framework for approximate analysis of Markov decision processes (MDP) with bounded-, unbounded-, and infinite-horizon properties. The main idea is to identify a "core" of an MDP, i.e., a subsystem where we provably remain with high probability, and to avoid computation on the less relevant rest of the state space. Although we identify the core using simulations and statistical techniques, it allows for rigorous error bounds in the analysis. Consequently, we obtain efficient analysis algorithms based on partial exploration for various settings, including the challenging case of strongly connected systems.

Cite as

Jan Křetínský and Tobias Meggendorfer. Of Cores: A Partial-Exploration Framework for Markov Decision Processes. In 30th International Conference on Concurrency Theory (CONCUR 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 140, pp. 5:1-5:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{kretinsky_et_al:LIPIcs.CONCUR.2019.5,
  author =	{K\v{r}et{\'\i}nsk\'{y}, Jan and Meggendorfer, Tobias},
  title =	{{Of Cores: A Partial-Exploration Framework for Markov Decision Processes}},
  booktitle =	{30th International Conference on Concurrency Theory (CONCUR 2019)},
  pages =	{5:1--5:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-121-4},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{140},
  editor =	{Fokkink, Wan and van Glabbeek, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CONCUR.2019.5},
  URN =		{urn:nbn:de:0030-drops-109076},
  doi =		{10.4230/LIPIcs.CONCUR.2019.5},
  annote =	{Keywords: Markov Decision Processes, Reachability, Approximation}
}

Document

DOI: 10.4230/LIPIcs.CONCUR.2018.8

Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints

Authors: Jan Kretínský, Guillermo A. Pérez, and Jean-François Raskin

Published in: LIPIcs, Volume 118, 29th International Conference on Concurrency Theory (CONCUR 2018)

Abstract

We formalize the problem of maximizing the mean-payoff value with high probability while satisfying a parity objective in a Markov decision process (MDP) with unknown probabilistic transition function and unknown reward function. Assuming the support of the unknown transition function and a lower bound on the minimal transition probability are known in advance, we show that in MDPs consisting of a single end component, two combinations of guarantees on the parity and mean-payoff objectives can be achieved depending on how much memory one is willing to use. (i) For all epsilon and gamma we can construct an online-learning finite-memory strategy that almost-surely satisfies the parity objective and which achieves an epsilon-optimal mean payoff with probability at least 1 - gamma. (ii) Alternatively, for all epsilon and gamma there exists an online-learning infinite-memory strategy that satisfies the parity objective surely and which achieves an epsilon-optimal mean payoff with probability at least 1 - gamma. We extend the above results to MDPs consisting of more than one end component in a natural way. Finally, we show that the aforementioned guarantees are tight, i.e. there are MDPs for which stronger combinations of the guarantees cannot be ensured.

Cite as

Jan Kretínský, Guillermo A. Pérez, and Jean-François Raskin. Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints. In 29th International Conference on Concurrency Theory (CONCUR 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 118, pp. 8:1-8:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{kretinsky_et_al:LIPIcs.CONCUR.2018.8,
  author =	{Kret{\'\i}nsk\'{y}, Jan and P\'{e}rez, Guillermo A. and Raskin, Jean-Fran\c{c}ois},
  title =	{{Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints}},
  booktitle =	{29th International Conference on Concurrency Theory (CONCUR 2018)},
  pages =	{8:1--8:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-087-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{118},
  editor =	{Schewe, Sven and Zhang, Lijun},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CONCUR.2018.8},
  URN =		{urn:nbn:de:0030-drops-95468},
  doi =		{10.4230/LIPIcs.CONCUR.2018.8},
  annote =	{Keywords: Markov decision processes, Reinforcement learning, Beyond worst case}
}

Document

DOI: 10.4230/LIPIcs.CONCUR.2018.32

The Satisfiability Problem for Unbounded Fragments of Probabilistic CTL

Authors: Jan Kretínský and Alexej Rotar

Published in: LIPIcs, Volume 118, 29th International Conference on Concurrency Theory (CONCUR 2018)

Abstract

We investigate the satisfiability and finite satisfiability problem for probabilistic computation-tree logic (PCTL) where operators are not restricted by any step bounds. We establish decidability for several fragments containing quantitative operators and pinpoint the difficulties arising in more complex fragments where the decidability remains open.

Cite as

Jan Kretínský and Alexej Rotar. The Satisfiability Problem for Unbounded Fragments of Probabilistic CTL. In 29th International Conference on Concurrency Theory (CONCUR 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 118, pp. 32:1-32:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{kretinsky_et_al:LIPIcs.CONCUR.2018.32,
  author =	{Kret{\'\i}nsk\'{y}, Jan and Rotar, Alexej},
  title =	{{The Satisfiability Problem for Unbounded Fragments of Probabilistic CTL}},
  booktitle =	{29th International Conference on Concurrency Theory (CONCUR 2018)},
  pages =	{32:1--32:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-087-3},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{118},
  editor =	{Schewe, Sven and Zhang, Lijun},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CONCUR.2018.32},
  URN =		{urn:nbn:de:0030-drops-95708},
  doi =		{10.4230/LIPIcs.CONCUR.2018.32},
  annote =	{Keywords: temporal logic, probabilistic verification, probabilistic computation tree logic, satisfiability}
}

5 Search Results for "Kret�nsk�, Jan"

Anytime Guarantees for Reachability in Uncountable Markov Decision Processes

Abstract

Cite as

Enforcing ω-Regular Properties in Markov Chains by Restarting

Abstract

Cite as

Of Cores: A Partial-Exploration Framework for Markov Decision Processes

Abstract

Cite as

Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints

Abstract

Cite as

The Satisfiability Problem for Unbounded Fragments of Probabilistic CTL

Abstract

Cite as

Thanks for your feedback!

Could not send message