Learning a Generic Value-Selection Heuristic Inside a Constraint Programming Solver

Marty, Tom; François, Tristan; Tessier, Pierre; Gautier, Louis; Rousseau, Louis-Martin; Cappart, Quentin

doi:10.4230/LIPIcs.CP.2023.25

Abstract

Constraint programming is known for being an efficient approach to solving combinatorial problems. Important design choices in a solver are the branching heuristics, designed to lead the search to the best solutions in a minimum amount of time. However, developing these heuristics is a time-consuming process that requires problem-specific expertise. This observation has motivated many efforts to use machine learning to automatically learn efficient heuristics without expert intervention. Although several generic variable-selection heuristics are available in the literature, the options for value-selection heuristics are more scarce. We propose to tackle this issue by introducing a generic learning procedure that can be used to obtain a value-selection heuristic inside a constraint programming solver. This has been achieved thanks to the combination of a deep Q-learning algorithm, a tailored reward signal, and a heterogeneous graph neural network. Experiments on graph coloring, maximum independent set, and maximum cut problems show that this framework competes with the well-known impact-based and activity-based search heuristics and can find solutions close to optimality without requiring a large number of backtracks.

Joshua Achiam. Spinning up as a deep RL researcher, October 2018. URL: https://spinningup.openai.com/en/latest/spinningup/spinningup.html.
Réka Albert and Albert-László Barabási. Statistical mechanics of complex networks. Reviews of modern physics, 74(1):47, 2002.
Gilles Audemard, Christophe Lecoutre, and Emmanuel Lonca. Proceedings of the 2022 XCSP3 competition. arXiv preprint arXiv:2209.00917, 2022.
Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, and Samy Bengio. Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940, 2016.
Yoshua Bengio, Andrea Lodi, and Antoine Prouvost. Machine learning for combinatorial optimization: a methodological tour d’horizon. European Journal of Operational Research, 290(2):405-421, 2021.
David Bergman, Andre A. Cire, Willem-Jan van Hoeve, and John N. Hooker. Discrete optimization with decision diagrams. INFORMS Journal on Computing, 28(1):47-66, 2016.
Pierre Bonami, Andrea Lodi, and Giulia Zarpellon. Learning a classification of mixed-integer quadratic programming problems. In International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research, pages 595-604. Springer, 2018.
Federico Campeotto, Alessandro Dal Palu, Agostino Dovier, Ferdinando Fioretto, and Enrico Pontelli. Exploring the use of GPUs in constraint solving. In Practical Aspects of Declarative Languages: 16th International Symposium, PADL 2014, San Diego, CA, USA, January 20-21, 2014. Proceedings 16, pages 152-167. Springer, 2014.
Quentin Cappart, David Bergman, Louis-Martin Rousseau, Isabeau Prémont-Schwarz, and Augustin Parjadis. Improving variable orderings of approximate decision diagrams using reinforcement learning. INFORMS Journal on Computing, 2022.
Quentin Cappart, Didier Chételat, Elias B. Khalil, Andrea Lodi, Christopher Morris, and Petar Velickovic. Combinatorial optimization and reasoning with graph neural networks. Journal of Machine Learning Research, 24(130):1-61, 2023. URL: http://jmlr.org/papers/v24/21-0449.html.
Quentin Cappart, Thierry Moisan, Louis-Martin Rousseau, Isabeau Prémont-Schwarz, and Andre A. Cire. Combining reinforcement learning and constraint programming for combinatorial optimization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 3677-3687, 2021.
Félix Chalumeau, Ilan Coulon, Quentin Cappart, and Louis-Martin Rousseau. Seapearl: A constraint programming solver guided by reinforcement learning. In International Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research, pages 392-409. Springer, 2021.
Geoffrey Chu and Peter J. Stuckey. Learning value heuristics for constraint programming. In International Conference on Integration of Artificial Intelligence and Operations Research Techniques in Constraint Programming for Combinatorial Optimization Problems 2015, pages 108-123. Springer, 2015.
Elizabeth D. Dolan and Jorge J. Moré. Benchmarking optimization software with performance profiles. Mathematical programming, 91(2):201-213, 2002.
Floris Doolaard and Neil Yorke-Smith. Online learning of variable ordering heuristics for constraint optimisation problems. Annals of Mathematics and Artificial Intelligence, pages 1-30, 2022.
Maxime Gasse, Didier Chételat, Nicola Ferroni, Laurent Charlin, and Andrea Lodi. Exact combinatorial optimization with graph convolutional neural networks. Advances in Neural Information Processing Systems, 32, 2019.
Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pages 315-323. JMLR Workshop and Conference Proceedings, 2011.
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning, pages 1861-1870. PMLR, 2018.
William D. Harvey and Matthew L. Ginsberg. Limited discrepancy search. In Proceedings of the 14th international joint conference on Artificial intelligence, volume 1, pages 607-613, 1995.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770-778, 2016.
Holger H. Hoos. Automated algorithm configuration and parameter tuning. In Autonomous search, pages 37-71. Springer, 2011.
Ahmed Hussein, Mohamed Medhat Gaber, Eyad Elyan, and Chrisina Jayne. Imitation learning: A survey of learning methods. ACM Computing Surveys (CSUR), 50(2):1-35, 2017.
Elias Khalil, Pierre Le Bodic, Le Song, George Nemhauser, and Bistra Dilkina. Learning to branch in mixed integer programming. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 30, 2016.
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
Wouter Kool, Herke van Hoof, and Max Welling. Attention, learn to solve routing problems! In International Conference on Learning Representations, 2019. URL: https://openreview.net/forum?id=ByxBFsRqYm.
Markus Kruber, Marco E Lübbecke, and Axel Parmentier. Learning when to use a decomposition. In International conference on AI and OR techniques in constraint programming for combinatorial optimization problems, pages 202-210. Springer, 2017.
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436-444, 2015.
Andrew L. Maas, Awni Y. Hannun, Andrew Y. Ng, et al. Rectifier nonlinearities improve neural network acoustic models. In Proc. icml, volume 30, page 3. Atlanta, Georgia, USA, 2013.
Nina Mazyavkina, Sergey Sviridov, Sergei Ivanov, and Evgeny Burnaev. Reinforcement learning for combinatorial optimization: A survey. Computers & Operations Research, 134:105400, 2021.
Laurent Michel, Pierre Schaus, and Pascal van Hentenryck. MiniCP: a lightweight solver for constraint programming. Mathematical Programming Computation, 13:133-184, 2021.
Laurent Michel and Pascal van Hentenryck. Activity-based search for black-box constraint programming solvers. In International Conference on Integration of Artificial Intelligence and Operations Research Techniques in Constraint Programming, pages 228-243. Springer, 2012.
Marvin Minsky. Steps toward artificial intelligence. Proceedings of the IRE, 49(1):8-30, 1961. URL: https://doi.org/10.1109/JRPROC.1961.287775.
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
Mouad Morabit, Guy Desaulniers, and Andrea Lodi. Machine-learning-based column selection for column generation. Transportation Science, 55(4):815-831, 2021.
Nicholas Nethercote, Peter J. Stuckey, Ralph Becket, Sebastian Brand, Gregory J. Duck, and Guido Tack. Minizinc: Towards a standard CP modelling language. In International Conference on Principles and Practice of Constraint Programming, pages 529-543. Springer, 2007.
Jean-Yves Potvin, Danny Dubé, and Christian Robillard. A hybrid approach to vehicle routing using neural networks and genetic algorithms. Applied Intelligence, 6(3):241-252, 1996.
Philippe Refalo. Impact-based search strategies for constraint programming. In International Conference on Principles and Practice of Constraint Programming, pages 557-571. Springer, 2004.
Lara Scavuzzo, Feng Yang Chen, Didier Chételat, Maxime Gasse, Andrea Lodi, Neil Yorke-Smith, and Karen Aardal. Learning to branch with tree MDPs. arXiv preprint arXiv:2205.11107, 2022.
Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. Prioritized experience replay. arXiv preprint arXiv:1511.05952, 2015.
John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International conference on machine learning, pages 1889-1897, 2015.
Daniel Selsam and Nikolaj Bjørner. Guiding high-performance SAT solvers with unsat-core predictions. In International Conference on Theory and Applications of Satisfiability Testing, pages 336-353. Springer, 2019.
Wen Song, Zhiguang Cao, Jie Zhang, Chi Xu, and Andrew Lim. Learning variable ordering heuristics for solving constraint satisfaction problems. Engineering Applications of Artificial Intelligence, 109:104603, 2022.
Haoran Sun, Wenbo Chen, Hui Li, and Le Song. Improving learning to branch via reinforcement learning. In Learning Meets Combinatorial Algorithms at NeurIPS2020, 2020.
Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, and Thore Graepel. Value-decomposition networks for cooperative multi-agent learning, 2017. URL: https://arxiv.org/abs/1706.05296.
Richard S. Sutton and Andrew G. Barto. Reinforcement learning: An introduction. MIT press, 2018.
Fabio Tardivo and Agostino Dovier. Constraints propagation on GPU: A case study for alldifferent. In Proceedings of the 37th Italian Conference on Computational Logic, 2022.
Alexander Trott, Stephan Zheng, Caiming Xiong, and Richard Socher. Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems, 32, 2019.
Ronald van Driel, Emir Demirović, and Neil Yorke-Smith. Learning variable activity initialisation for lazy clause generation solvers. In Integration of Constraint Programming, Artificial Intelligence, and Operations Research: 18th International Conference, CPAIOR 2021, Vienna, Austria, July 5-8, 2021, Proceedings 18, pages 62-71. Springer, 2021.
Hado van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI conference on artificial intelligence, volume 30, 2016.
Wei Xia and Roland Yap. Learning robust search strategies using a bandit-based approach. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
Xin Yu, Thiago Serra, Srikumar Ramalingam, and Shandian Zhe. The combinatorial brain surgeon: Pruning weights that cancel one another in neural networks. In International Conference on Machine Learning, pages 25668-25683. PMLR, 2022.

Learning a Generic Value-Selection Heuristic Inside a Constraint Programming Solver

Authors Tom Marty , Tristan François, Pierre Tessier, Louis Gautier, Louis-Martin Rousseau , Quentin Cappart

File

Document Identifiers

Author Details

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

References

Learning a Generic Value-Selection Heuristic Inside a Constraint Programming Solver

Authors Tom Marty , Tristan François, Pierre Tessier, Louis Gautier, Louis-Martin Rousseau , Quentin Cappart

File

Document Identifiers

Author Details

Funding

Acknowledgements

Cite AsGet BibTex

Abstract

Subject Classification

ACM Subject Classification

Keywords

Metrics

Supplementary Materials

References