Combining Reinforcement Learning and Constraint Programming for Sequence-Generation Tasks with Hard Constraints

Lafleur, Daphné; Chandar, Sarath; Pesant, Gilles

doi:10.4230/LIPIcs.CP.2022.30

File

LIPIcs.CP.2022.30.pdf

Filesize: 2.48 MB
16 pages

Document Identifiers

DOI: 10.4230/LIPIcs.CP.2022.30
URN: urn:nbn:de:0030-drops-166594

Author Details

Daphné Lafleur

Polytechnique Montréal, Canada
Quebec Artificial Intelligence Institute (Mila), Canada

Sarath Chandar

Polytechnique Montréal, Canada
Quebec Artificial Intelligence Institute (Mila), Canada
Canada CIFAR AI Chair, Toronto, Canada

Gilles Pesant

Polytechnique Montréal, Canada

Acknowledgements

We would like to thank all the members of Chandar Lab, Laboratoire Quosséça, family and friends who gave very useful feedback and improvements before the submission. We would also like to thank the CP 2022 reviewers for their constructive criticism.

Cite AsGet BibTex

Daphné Lafleur, Sarath Chandar, and Gilles Pesant. Combining Reinforcement Learning and Constraint Programming for Sequence-Generation Tasks with Hard Constraints. In 28th International Conference on Principles and Practice of Constraint Programming (CP 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 235, pp. 30:1-30:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)
https://doi.org/10.4230/LIPIcs.CP.2022.30

Abstract

While Machine Learning (ML) techniques are good at generating data similar to a dataset, they lack the capacity to enforce constraints. On the other hand, any solution to a Constraint Programming (CP) model satisfies its constraints but has no obligation to imitate a dataset. Yet, we sometimes need both. In this paper we borrow RL-Tuner, a Reinforcement Learning (RL) algorithm introduced to tune neural networks, as our enabling architecture to exploit the respective strengths of ML and CP. RL-Tuner maximizes the sum of a pretrained network’s learned probabilities and of manually-tuned penalties for each violated constraint. We replace the latter with outputs of a CP model representing the marginal probabilities of each value and the number of constraint violations. As was the case for the original RL-Tuner, we apply our algorithm to music generation since it is a highly-constrained domain for which CP is especially suited. We show that combining ML and CP, as opposed to using them individually, allows the agent to reflect the pretrained network while taking into account constraints, leading to melodic lines that respect both the corpus' style and the music theory constraints.

Subject Classification

ACM Subject Classification

Software and its engineering → Constraint and logic languages
Theory of computation → Reinforcement learning
Applied computing → Sound and music computing

Keywords

Constraint programming
reinforcement learning
RNN
music generation

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

References

Jean-Pierre Briot, Gaëtan Hadjeres, and François Pachet. Deep learning techniques for music generation - A survey. CoRR, abs/1709.01620, 2017. URL: http://arxiv.org/abs/1709.01620.
Sophie Demassey, Gilles Pesant, and Louis-Martin Rousseau. A Cost-Regular based Hybrid Column Generation Approach. Constraints, 11(4):315-333, December 2006. URL: https://doi.org/10.1007/s10601-006-9003-7.
Gaëtan Hadjeres and Frank Nielsen. Anticipation-rnn: enforcing unary constraints in sequence generation, with application to interactive music generation. Neural Comput. Appl., 32(4):995-1005, 2020. URL: https://doi.org/10.1007/s00521-018-3868-4.
Gaëtan Hadjeres, François Pachet, and Frank Nielsen. Deepbach: a steerable model for bach chorales generation. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Learning Research, pages 1362-1371. PMLR, 2017. URL: http://proceedings.mlr.press/v70/hadjeres17a.html.
Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9:1735-80, December 1997. URL: https://doi.org/10.1162/neco.1997.9.8.1735.
Cheng-Zhi Anna Huang, Tim Cooijmans, Adam Roberts, Aaron C. Courville, and Douglas Eck. Counterpoint by convolution. CoRR, abs/1903.07227, 2019. URL: http://arxiv.org/abs/1903.07227.
Natasha Jaques, Shixiang Gu, Richard E. Turner, and Douglas Eck. Tuning recurrent neural networks with reinforcement learning. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Workshop Track Proceedings. OpenReview.net, 2017. URL: https://openreview.net/forum?id=Syyv2e-Kx.
Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore. Reinforcement learning: A survey. CoRR, cs.AI/9605103, 1996. URL: http://arxiv.org/abs/9605103.
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin A. Riedmiller, Andreas Fidjeland, Georg Ostrovski, Stig Petersen, Charlie Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518:529-533, 2015.
Olof Mogren. C-RNN-GAN: continuous recurrent neural networks with adversarial training. CoRR, abs/1611.09904, 2016. URL: http://arxiv.org/abs/1611.09904.
Gilles Pesant. A regular language membership constraint for finite sequences of variables. In Mark Wallace, editor, Principles and Practice of Constraint Programming - CP 2004, pages 482-495, Berlin, Heidelberg, 2004. Springer Berlin Heidelberg.
Gilles Pesant. From support propagation to belief propagation in constraint programming. J. Artif. Intell. Res., 66:123-150, 2019. URL: https://doi.org/10.1613/jair.1.11487.
Jose David Fernández Rodriguez and Francisco J. Vico. AI methods in algorithmic composition: A comprehensive survey. CoRR, abs/1402.0585, 2014. URL: http://arxiv.org/abs/1402.0585.
Robin M. Schmidt. Recurrent neural networks (rnns): A gentle introduction and overview. CoRR, abs/1912.05911, 2019. URL: http://arxiv.org/abs/1912.05911.
Peter Schubert. Modal Counterpoint, Renaissance Style. Oxford University Press, 2nd edition, 2008.
Charlotte Truchet and Gérard Assayag, editors. Constraint Programming in Music. Wiley, 2011.
Hado van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double q-learning. CoRR, abs/1509.06461, 2015. URL: http://arxiv.org/abs/1509.06461.
Willem Jan van Hoeve, Gilles Pesant, and Louis-Martin Rousseau. On global warming: Flow-based soft global constraints. J. Heuristics, 12(4-5):347-373, 2006. URL: https://doi.org/10.1007/s10732-006-6550-4.
Halley Young, Maxwell Du, and Osbert Bastani. Neurosymbolic deep generative models for sequence data with relational constraints, 2021. URL: https://openreview.net/forum?id=Y5TgO3J_Glc.

Combining Reinforcement Learning and Constraint Programming for Sequence-Generation Tasks with Hard Constraints

Authors Daphné Lafleur , Sarath Chandar , Gilles Pesant

File

Document Identifiers

Author Details

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Thanks for your feedback!

Could not send message

Combining Reinforcement Learning and Constraint Programming for Sequence-Generation Tasks with Hard Constraints

Authors Daphné Lafleur , Sarath Chandar , Gilles Pesant

File

Document Identifiers

Author Details

Funding

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

Supplementary Materials

References

Thanks for your feedback!

Could not send message