The Robot Routing Problem for Collecting Aggregate Stochastic Rewards

Dimitrova, Rayna; Gavran, Ivan; Majumdar, Rupak; Prabhu, Vinayak S.; Soudjani, Sadegh Esmaeil Zadeh

doi:10.4230/LIPIcs.CONCUR.2017.13

Abstract

We propose a new model for formalizing reward collection problems on graphs with dynamically generated rewards which may appear and disappear based on a stochastic model. The robot routing problem is modeled as a graph whose nodes are stochastic processes generating potential rewards over discrete time. The rewards are generated according to the stochastic process, but at each step, an existing reward disappears with a given probability. The edges in the graph encode the (unit-distance) paths between the rewards' locations. On visiting a node, the robot collects the accumulated reward at the node at that time, but traveling between the nodes takes time. The optimization question asks to compute an optimal (or epsilon-optimal) path that maximizes the expected collected rewards.

We consider the finite and infinite-horizon robot routing problems. For the finite horizon, the goal is to maximize the total expected reward, while for the infinite horizon we consider limit-average objectives. We study the computational and strategy complexity of these problems, establish NP-lower bounds and show that optimal strategies require memory in general. We also provide an algorithm for computing epsilon-optimal infinite paths for arbitrary epsilon > 0.
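
The reward model described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, and it rests on simplifying assumptions that are not stated in the abstract: each node generates a reward with a fixed expected value per step (`rates`), every existing reward disappears independently with the same probability `delta` at each step, all nodes start empty at time 0, and the robot occupies `path[t]` at time `t`. The function names are hypothetical, and the path is assumed to follow edges of the graph.

```python
def expected_accumulated(rate, delta, k):
    """Expected reward at a node k steps after it was last emptied.

    Assumption: a reward generated j steps ago is still present with
    probability (1 - delta) ** j.
    """
    return rate * sum((1.0 - delta) ** j for j in range(k))


def expected_path_reward(path, rates, delta):
    """Total expected reward collected along a finite path.

    path  -- sequence of node names, one node per time step
    rates -- dict mapping node name to expected reward generated per step
    delta -- per-step disappearance probability (assumed equal for all nodes)
    """
    last_emptied = {v: 0 for v in rates}  # all nodes start empty at time 0
    total = 0.0
    for t, v in enumerate(path):
        # Collect whatever has accumulated since the node was last emptied.
        total += expected_accumulated(rates[v], delta, t - last_emptied[v])
        last_emptied[v] = t
    return total


if __name__ == "__main__":
    rates = {"a": 1.0, "b": 2.0}
    print(expected_path_reward(["a", "b", "a"], rates, 0.5))  # 3.5
```

In this toy instance the robot collects nothing at time 0, an expected 2.0 at node `b` after one step, and an expected 1.0 + 0.5 at node `a` after two steps. Because old rewards decay, waiting longer before revisiting a node yields diminishing returns, which is what makes the choice of route non-trivial.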

