Routing Using Safe Reinforcement Learning

Authors Gautham Nayak Seetanadi , Karl-Erik Årzén


Author Details

Gautham Nayak Seetanadi
  • Department of Automatic Control, Lund University, Sweden
Karl-Erik Årzén
  • Department of Automatic Control, Lund University, Sweden

Cite As

Gautham Nayak Seetanadi and Karl-Erik Årzén. Routing Using Safe Reinforcement Learning. In 2nd Workshop on Fog Computing and the IoT (Fog-IoT 2020). Open Access Series in Informatics (OASIcs), Volume 80, pp. 6:1-6:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


The ever-increasing number of connected devices has led to a meteoric rise in the amount of data to be processed. As a result, computation is moving to the edge of the cloud, making efficiency throughout the cloud increasingly important. The use of fog computing for time-critical control applications is on the rise; such applications require robust guarantees on the transmission times of packets through the network while also reducing the total transmission times of the various packets. We consider networks in which transmission times may vary due to device mobility, congestion, and similar effects. We assume knowledge of the worst-case transmission time over each link and learn the typical transmission times through exploration. We present the use of reinforcement learning to find optimal paths through the network while never violating preset deadlines. We show that, with appropriate domain knowledge, popular reinforcement learning techniques are a promising prospect even for time-critical applications.
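The idea in the abstract — explore to learn typical link delays while a known worst-case bound keeps every choice deadline-safe — can be illustrated with a minimal sketch. This is not the authors' actual algorithm; the toy network, delay values, and helper names (`safe_actions`, `WC_TO_DEST`) are all hypothetical, and the learner is plain tabular Q-learning with an epsilon-greedy policy filtered by a worst-case safety check.

```python
import random

# Hypothetical toy network: each link has a known worst-case delay and a
# hidden typical delay that the agent only observes by traversing the link.
WORST_CASE = {("S", "A"): 5, ("S", "B"): 2, ("A", "D"): 2, ("B", "D"): 6}
TYPICAL    = {("S", "A"): 1, ("S", "B"): 2, ("A", "D"): 1, ("B", "D"): 3}
NEXT_HOPS  = {"S": ["A", "B"], "A": ["D"], "B": ["D"]}
DEADLINE   = 8

# Worst-case delay of the best path from each node to the destination D,
# precomputed by hand for this tiny graph (S goes via A: 5 + 2 = 7).
WC_TO_DEST = {"D": 0, "A": 2, "B": 6, "S": 7}

def safe_actions(node, elapsed_wc):
    """Next hops that can never violate the deadline, judged by worst case."""
    return [n for n in NEXT_HOPS[node]
            if elapsed_wc + WORST_CASE[(node, n)] + WC_TO_DEST[n] <= DEADLINE]

Q = {edge: 0.0 for edge in WORST_CASE}   # estimated cost-to-go per link
ALPHA, EPS = 0.3, 0.2
random.seed(0)

for episode in range(500):
    node, elapsed_wc = "S", 0
    while node != "D":
        acts = safe_actions(node, elapsed_wc)      # safety filter first
        nxt = (random.choice(acts) if random.random() < EPS
               else min(acts, key=lambda n: Q[(node, n)]))
        delay = TYPICAL[(node, nxt)]               # observed link delay
        target = delay + (0 if nxt == "D" else
                          min(Q[(nxt, m)] for m in NEXT_HOPS[nxt]))
        Q[(node, nxt)] += ALPHA * (target - Q[(node, nxt)])
        elapsed_wc += WORST_CASE[(node, nxt)]      # budget by worst case
        node = nxt

# After learning, the greedy safe choice at S prefers the typically
# faster path via A, even though its worst-case delay is larger.
best = min(NEXT_HOPS["S"], key=lambda n: Q[("S", n)])
```

Exploration never leaves the safe set: a hop is offered only if its worst-case delay, plus the worst-case remainder from the resulting node, still fits the deadline — which is the role domain knowledge plays in the paper's setting.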

ACM Subject Classification
  • Computing methodologies → Reinforcement learning
  • Networks → Packet scheduling

Keywords and Phrases
  • Real-time routing
  • Safe exploration
  • Safe reinforcement learning
  • Time-critical systems
  • Dynamic routing


