Quantum Policy Gradient Algorithms

Authors: Sofiene Jerbi, Arjan Cornelissen, Maris Ozols, Vedran Dunjko

Author Details

Sofiene Jerbi
  • Institute for Theoretical Physics, Universität Innsbruck, Austria
Arjan Cornelissen
  • QuSoft and University of Amsterdam, The Netherlands
Maris Ozols
  • QuSoft and University of Amsterdam, The Netherlands
Vedran Dunjko
  • applied Quantum algorithms (aQa), Leiden University, The Netherlands

Cite As

Sofiene Jerbi, Arjan Cornelissen, Maris Ozols, and Vedran Dunjko. Quantum Policy Gradient Algorithms. In 18th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 266, pp. 13:1-13:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Abstract

Understanding the power and limitations of quantum access to data in machine learning tasks is essential to assessing the potential of quantum computing in artificial intelligence. Previous works have already shown that speed-ups in learning are possible when given quantum access to reinforcement learning environments. Yet, the applicability of quantum algorithms in this setting remains very limited, notably in environments with large state and action spaces. In this work, we design quantum algorithms to train state-of-the-art reinforcement learning policies by exploiting quantum interactions with an environment. However, these algorithms only offer full quadratic speed-ups in sample complexity over their classical analogs when the trained policies satisfy some regularity conditions. Interestingly, we find that reinforcement learning policies derived from parametrized quantum circuits are well-behaved with respect to these conditions, which showcases the benefit of a fully quantum reinforcement learning framework.

Subject Classification

ACM Subject Classification
  • Theory of computation → Quantum computation theory
  • Theory of computation → Design and analysis of algorithms
  • Theory of computation → Reinforcement learning

Keywords
  • quantum reinforcement learning
  • policy gradient methods
  • parametrized quantum circuits



