Quantum Policy Gradient Algorithms

Jerbi, Sofiene; Cornelissen, Arjan; Ozols, Maris; Dunjko, Vedran

doi:10.4230/LIPIcs.TQC.2023.13

Document Identifiers

DOI: 10.4230/LIPIcs.TQC.2023.13
URN: urn:nbn:de:0030-drops-183230

Author Details

Sofiene Jerbi

Institute for Theoretical Physics, Universität Innsbruck, Austria

Arjan Cornelissen

QuSoft and University of Amsterdam, The Netherlands

Maris Ozols

QuSoft and University of Amsterdam, The Netherlands

Vedran Dunjko

applied Quantum algorithms (aQa), Leiden University, The Netherlands

Cite As

Sofiene Jerbi, Arjan Cornelissen, Maris Ozols, and Vedran Dunjko. Quantum Policy Gradient Algorithms. In 18th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 266, pp. 13:1-13:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)
https://doi.org/10.4230/LIPIcs.TQC.2023.13

Abstract

Understanding the power and limitations of quantum access to data in machine learning tasks is essential for assessing the potential of quantum computing in artificial intelligence. Previous works have already shown that speed-ups in learning are possible when given quantum access to reinforcement learning environments. Yet, the applicability of quantum algorithms in this setting remains very limited, notably in environments with large state and action spaces. In this work, we design quantum algorithms to train state-of-the-art reinforcement learning policies by exploiting quantum interactions with an environment. However, these algorithms only offer full quadratic speed-ups in sample complexity over their classical analogs when the trained policies satisfy some regularity conditions. Interestingly, we find that reinforcement learning policies derived from parametrized quantum circuits are well-behaved with respect to these conditions, which showcases the benefit of a fully quantum reinforcement learning framework.

Subject Classification

ACM Subject Classification

Theory of computation → Quantum computation theory
Theory of computation → Design and analysis of algorithms
Theory of computation → Reinforcement learning

Keywords

quantum reinforcement learning
policy gradient methods
parametrized quantum circuits

