Abstract
The classical model of Markov decision processes with costs or rewards, while widely used to formalize optimal decision making, cannot capture scenarios where there are multiple objectives for the agent during the system evolution, but only one of these objectives gets actualized upon termination. We introduce the model of Markov decision processes with alternative objectives (MDPAO) for formalizing optimization in such scenarios. To compute the strategy to optimize the expected cost/reward upon termination, we need to figure out how to balance the values of the alternative objectives. This requires analysis of the underlying infinitestate process that tracks the accumulated values of all the objectives. While the decidability of the problem of computing the exact optimal strategy for the general model remains open, we present the following results. First, for a Markov chain with alternative objectives, the optimal expected cost/reward can be computed in polynomialtime. Second, for a singlestate process with two actions and multiple objectives we show how to compute the optimal decision strategy. Third, for a process with only two alternative objectives, we present a reduction to the minimum expected accumulated reward problem for onecounter MDPs, and this leads to decidability for this case under some technical restrictions. Finally, we show that optimal cost/reward can be approximated up to a constant additive factor for the general problem.
BibTeX  Entry
@InProceedings{alur_et_al:LIPIcs:2016:6569,
author = {Rajeev Alur and Marco Faella and Sampath Kannan and Nimit Singhania},
title = {{Hedging Bets in Markov Decision Processes}},
booktitle = {25th EACSL Annual Conference on Computer Science Logic (CSL 2016)},
pages = {29:129:20},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959770224},
ISSN = {18688969},
year = {2016},
volume = {62},
editor = {JeanMarc Talbot and Laurent Regnier},
publisher = {Schloss DagstuhlLeibnizZentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2016/6569},
URN = {urn:nbn:de:0030drops65698},
doi = {10.4230/LIPIcs.CSL.2016.29},
annote = {Keywords: Markov decision processes, Infinite state systems, Multiobjective optimization}
}
Keywords: 

Markov decision processes, Infinite state systems, Multiobjective optimization 
Collection: 

25th EACSL Annual Conference on Computer Science Logic (CSL 2016) 
Issue Date: 

2016 
Date of publication: 

29.08.2016 