Combining Reinforcement Learning and Constraint Programming for Sequence-Generation Tasks with Hard Constraints

Authors Daphné Lafleur , Sarath Chandar , Gilles Pesant



PDF
Thumbnail PDF

File

LIPIcs.CP.2022.30.pdf
  • Filesize: 2.48 MB
  • 16 pages

Document Identifiers

Author Details

Daphné Lafleur
  • Polytechnique Montréal, Canada
  • Quebec Artificial Intelligence Institute (Mila), Canada
Sarath Chandar
  • Polytechnique Montréal, Canada
  • Quebec Artificial Intelligence Institute (Mila), Canada
  • Canada CIFAR AI Chair, Toronto, Canada
Gilles Pesant
  • Polytechnique Montréal, Canada

Acknowledgements

We would like to thank all the members of Chandar Lab, Laboratoire Quosséça, family and friends who gave very useful feedback and improvements before the submission. We would also like to thank the CP 2022 reviewers for their constructive criticism.

Cite AsGet BibTex

Daphné Lafleur, Sarath Chandar, and Gilles Pesant. Combining Reinforcement Learning and Constraint Programming for Sequence-Generation Tasks with Hard Constraints. In 28th International Conference on Principles and Practice of Constraint Programming (CP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 235, pp. 30:1-30:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.CP.2022.30

Abstract

While Machine Learning (ML) techniques are good at generating data similar to a dataset, they lack the capacity to enforce constraints. On the other hand, any solution to a Constraint Programming (CP) model satisfies its constraints but has no obligation to imitate a dataset. Yet, we sometimes need both. In this paper we borrow RL-Tuner, a Reinforcement Learning (RL) algorithm introduced to tune neural networks, as our enabling architecture to exploit the respective strengths of ML and CP. RL-Tuner maximizes the sum of a pretrained network’s learned probabilities and of manually-tuned penalties for each violated constraint. We replace the latter with outputs of a CP model representing the marginal probabilities of each value and the number of constraint violations. As was the case for the original RL-Tuner, we apply our algorithm to music generation since it is a highly-constrained domain for which CP is especially suited. We show that combining ML and CP, as opposed to using them individually, allows the agent to reflect the pretrained network while taking into account constraints, leading to melodic lines that respect both the corpus' style and the music theory constraints.

Subject Classification

ACM Subject Classification
  • Software and its engineering → Constraint and logic languages
  • Theory of computation → Reinforcement learning
  • Applied computing → Sound and music computing
Keywords
  • Constraint programming
  • reinforcement learning
  • RNN
  • music generation

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Jean-Pierre Briot, Gaëtan Hadjeres, and François Pachet. Deep learning techniques for music generation - A survey. CoRR, abs/1709.01620, 2017. URL: http://arxiv.org/abs/1709.01620.
  2. Sophie Demassey, Gilles Pesant, and Louis-Martin Rousseau. A Cost-Regular based Hybrid Column Generation Approach. Constraints, 11(4):315-333, December 2006. URL: https://doi.org/10.1007/s10601-006-9003-7.
  3. Gaëtan Hadjeres and Frank Nielsen. Anticipation-rnn: enforcing unary constraints in sequence generation, with application to interactive music generation. Neural Comput. Appl., 32(4):995-1005, 2020. URL: https://doi.org/10.1007/s00521-018-3868-4.
  4. Gaëtan Hadjeres, François Pachet, and Frank Nielsen. Deepbach: a steerable model for bach chorales generation. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Learning Research, pages 1362-1371. PMLR, 2017. URL: http://proceedings.mlr.press/v70/hadjeres17a.html.
  5. Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9:1735-80, December 1997. URL: https://doi.org/10.1162/neco.1997.9.8.1735.
  6. Cheng-Zhi Anna Huang, Tim Cooijmans, Adam Roberts, Aaron C. Courville, and Douglas Eck. Counterpoint by convolution. CoRR, abs/1903.07227, 2019. URL: http://arxiv.org/abs/1903.07227.
  7. Natasha Jaques, Shixiang Gu, Richard E. Turner, and Douglas Eck. Tuning recurrent neural networks with reinforcement learning. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Workshop Track Proceedings. OpenReview.net, 2017. URL: https://openreview.net/forum?id=Syyv2e-Kx.
  8. Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore. Reinforcement learning: A survey. CoRR, cs.AI/9605103, 1996. URL: http://arxiv.org/abs/9605103.
  9. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin A. Riedmiller, Andreas Fidjeland, Georg Ostrovski, Stig Petersen, Charlie Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518:529-533, 2015. Google Scholar
  10. Olof Mogren. C-RNN-GAN: continuous recurrent neural networks with adversarial training. CoRR, abs/1611.09904, 2016. URL: http://arxiv.org/abs/1611.09904.
  11. Gilles Pesant. A regular language membership constraint for finite sequences of variables. In Mark Wallace, editor, Principles and Practice of Constraint Programming - CP 2004, pages 482-495, Berlin, Heidelberg, 2004. Springer Berlin Heidelberg. Google Scholar
  12. Gilles Pesant. From support propagation to belief propagation in constraint programming. J. Artif. Intell. Res., 66:123-150, 2019. URL: https://doi.org/10.1613/jair.1.11487.
  13. Jose David Fernández Rodriguez and Francisco J. Vico. AI methods in algorithmic composition: A comprehensive survey. CoRR, abs/1402.0585, 2014. URL: http://arxiv.org/abs/1402.0585.
  14. Robin M. Schmidt. Recurrent neural networks (rnns): A gentle introduction and overview. CoRR, abs/1912.05911, 2019. URL: http://arxiv.org/abs/1912.05911.
  15. Peter Schubert. Modal Counterpoint, Renaissance Style. Oxford University Press, 2nd edition, 2008. Google Scholar
  16. Charlotte Truchet and Gérard Assayag, editors. Constraint Programming in Music. Wiley, 2011. Google Scholar
  17. Hado van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double q-learning. CoRR, abs/1509.06461, 2015. URL: http://arxiv.org/abs/1509.06461.
  18. Willem Jan van Hoeve, Gilles Pesant, and Louis-Martin Rousseau. On global warming: Flow-based soft global constraints. J. Heuristics, 12(4-5):347-373, 2006. URL: https://doi.org/10.1007/s10732-006-6550-4.
  19. Halley Young, Maxwell Du, and Osbert Bastani. Neurosymbolic deep generative models for sequence data with relational constraints, 2021. URL: https://openreview.net/forum?id=Y5TgO3J_Glc.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail