Document Open Access Logo

Learning a Generic Value-Selection Heuristic Inside a Constraint Programming Solver

Authors Tom Marty , Tristan François, Pierre Tessier, Louis Gautier, Louis-Martin Rousseau , Quentin Cappart

Thumbnail PDF


  • Filesize: 1.74 MB
  • 19 pages

Document Identifiers

Author Details

Tom Marty
  • Polytechnique Montréal, Montreal, Canada
  • Ecole Polytechnique, Palaiseau, France
Tristan François
  • Ecole Polytechnique, Palaiseau, France
Pierre Tessier
  • Ecole Polytechnique, Palaiseau, France
Louis Gautier
  • Ecole Polytechnique, Palaiseau, France
Louis-Martin Rousseau
  • Polytechnique Montréal, Montreal, Canada
Quentin Cappart
  • Polytechnique Montréal, Montreal, Canada


We thank CIRRELT for providing essential computing resources. We extend our gratitude to all the people who contributed to the related open-source project: Léo Boisvert, Tom Sander, Ziad El Assal, Malik Attalah, and Marco Novaes.

Cite AsGet BibTex

Tom Marty, Tristan François, Pierre Tessier, Louis Gautier, Louis-Martin Rousseau, and Quentin Cappart. Learning a Generic Value-Selection Heuristic Inside a Constraint Programming Solver. In 29th International Conference on Principles and Practice of Constraint Programming (CP 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 280, pp. 25:1-25:19, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2023)


Constraint programming is known for being an efficient approach to solving combinatorial problems. Important design choices in a solver are the branching heuristics, designed to lead the search to the best solutions in a minimum amount of time. However, developing these heuristics is a time-consuming process that requires problem-specific expertise. This observation has motivated many efforts to use machine learning to automatically learn efficient heuristics without expert intervention. Although several generic variable-selection heuristics are available in the literature, the options for value-selection heuristics are more scarce. We propose to tackle this issue by introducing a generic learning procedure that can be used to obtain a value-selection heuristic inside a constraint programming solver. This has been achieved thanks to the combination of a deep Q-learning algorithm, a tailored reward signal, and a heterogeneous graph neural network. Experiments on graph coloring, maximum independent set, and maximum cut problems show that this framework competes with the well-known impact-based and activity-based search heuristics and can find solutions close to optimality without requiring a large number of backtracks.

Subject Classification

ACM Subject Classification
  • Computing methodologies → Artificial intelligence
  • Computing methodologies → Machine learning
  • Branching heuristic
  • Deep reinforcement learning


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Joshua Achiam. Spinning up as a deep RL researcher, October 2018. URL:
  2. Réka Albert and Albert-László Barabási. Statistical mechanics of complex networks. Reviews of modern physics, 74(1):47, 2002. Google Scholar
  3. Gilles Audemard, Christophe Lecoutre, and Emmanuel Lonca. Proceedings of the 2022 XCSP3 competition. arXiv preprint arXiv:2209.00917, 2022. Google Scholar
  4. Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, and Samy Bengio. Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940, 2016. Google Scholar
  5. Yoshua Bengio, Andrea Lodi, and Antoine Prouvost. Machine learning for combinatorial optimization: a methodological tour d’horizon. European Journal of Operational Research, 290(2):405-421, 2021. Google Scholar
  6. David Bergman, Andre A. Cire, Willem-Jan van Hoeve, and John N. Hooker. Discrete optimization with decision diagrams. INFORMS Journal on Computing, 28(1):47-66, 2016. Google Scholar
  7. Pierre Bonami, Andrea Lodi, and Giulia Zarpellon. Learning a classification of mixed-integer quadratic programming problems. In International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research, pages 595-604. Springer, 2018. Google Scholar
  8. Federico Campeotto, Alessandro Dal Palu, Agostino Dovier, Ferdinando Fioretto, and Enrico Pontelli. Exploring the use of GPUs in constraint solving. In Practical Aspects of Declarative Languages: 16th International Symposium, PADL 2014, San Diego, CA, USA, January 20-21, 2014. Proceedings 16, pages 152-167. Springer, 2014. Google Scholar
  9. Quentin Cappart, David Bergman, Louis-Martin Rousseau, Isabeau Prémont-Schwarz, and Augustin Parjadis. Improving variable orderings of approximate decision diagrams using reinforcement learning. INFORMS Journal on Computing, 2022. Google Scholar
  10. Quentin Cappart, Didier Chételat, Elias B. Khalil, Andrea Lodi, Christopher Morris, and Petar Velickovic. Combinatorial optimization and reasoning with graph neural networks. Journal of Machine Learning Research, 24(130):1-61, 2023. URL:
  11. Quentin Cappart, Thierry Moisan, Louis-Martin Rousseau, Isabeau Prémont-Schwarz, and Andre A. Cire. Combining reinforcement learning and constraint programming for combinatorial optimization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 3677-3687, 2021. Google Scholar
  12. Félix Chalumeau, Ilan Coulon, Quentin Cappart, and Louis-Martin Rousseau. Seapearl: A constraint programming solver guided by reinforcement learning. In International Conference on Integration of Constraint Programming, Artificial Intelligence, and Operations Research, pages 392-409. Springer, 2021. Google Scholar
  13. Geoffrey Chu and Peter J. Stuckey. Learning value heuristics for constraint programming. In International Conference on Integration of Artificial Intelligence and Operations Research Techniques in Constraint Programming for Combinatorial Optimization Problems 2015, pages 108-123. Springer, 2015. Google Scholar
  14. Elizabeth D. Dolan and Jorge J. Moré. Benchmarking optimization software with performance profiles. Mathematical programming, 91(2):201-213, 2002. Google Scholar
  15. Floris Doolaard and Neil Yorke-Smith. Online learning of variable ordering heuristics for constraint optimisation problems. Annals of Mathematics and Artificial Intelligence, pages 1-30, 2022. Google Scholar
  16. Maxime Gasse, Didier Chételat, Nicola Ferroni, Laurent Charlin, and Andrea Lodi. Exact combinatorial optimization with graph convolutional neural networks. Advances in Neural Information Processing Systems, 32, 2019. Google Scholar
  17. Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pages 315-323. JMLR Workshop and Conference Proceedings, 2011. Google Scholar
  18. Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning, pages 1861-1870. PMLR, 2018. Google Scholar
  19. William D. Harvey and Matthew L. Ginsberg. Limited discrepancy search. In Proceedings of the 14th international joint conference on Artificial intelligence, volume 1, pages 607-613, 1995. Google Scholar
  20. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770-778, 2016. Google Scholar
  21. Holger H. Hoos. Automated algorithm configuration and parameter tuning. In Autonomous search, pages 37-71. Springer, 2011. Google Scholar
  22. Ahmed Hussein, Mohamed Medhat Gaber, Eyad Elyan, and Chrisina Jayne. Imitation learning: A survey of learning methods. ACM Computing Surveys (CSUR), 50(2):1-35, 2017. Google Scholar
  23. Elias Khalil, Pierre Le Bodic, Le Song, George Nemhauser, and Bistra Dilkina. Learning to branch in mixed integer programming. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 30, 2016. Google Scholar
  24. Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. Google Scholar
  25. Wouter Kool, Herke van Hoof, and Max Welling. Attention, learn to solve routing problems! In International Conference on Learning Representations, 2019. URL:
  26. Markus Kruber, Marco E Lübbecke, and Axel Parmentier. Learning when to use a decomposition. In International conference on AI and OR techniques in constraint programming for combinatorial optimization problems, pages 202-210. Springer, 2017. Google Scholar
  27. Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436-444, 2015. Google Scholar
  28. Andrew L. Maas, Awni Y. Hannun, Andrew Y. Ng, et al. Rectifier nonlinearities improve neural network acoustic models. In Proc. icml, volume 30, page 3. Atlanta, Georgia, USA, 2013. Google Scholar
  29. Nina Mazyavkina, Sergey Sviridov, Sergei Ivanov, and Evgeny Burnaev. Reinforcement learning for combinatorial optimization: A survey. Computers & Operations Research, 134:105400, 2021. Google Scholar
  30. Laurent Michel, Pierre Schaus, and Pascal van Hentenryck. MiniCP: a lightweight solver for constraint programming. Mathematical Programming Computation, 13:133-184, 2021. Google Scholar
  31. Laurent Michel and Pascal van Hentenryck. Activity-based search for black-box constraint programming solvers. In International Conference on Integration of Artificial Intelligence and Operations Research Techniques in Constraint Programming, pages 228-243. Springer, 2012. Google Scholar
  32. Marvin Minsky. Steps toward artificial intelligence. Proceedings of the IRE, 49(1):8-30, 1961. URL:
  33. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013. Google Scholar
  34. Mouad Morabit, Guy Desaulniers, and Andrea Lodi. Machine-learning-based column selection for column generation. Transportation Science, 55(4):815-831, 2021. Google Scholar
  35. Nicholas Nethercote, Peter J. Stuckey, Ralph Becket, Sebastian Brand, Gregory J. Duck, and Guido Tack. Minizinc: Towards a standard CP modelling language. In International Conference on Principles and Practice of Constraint Programming, pages 529-543. Springer, 2007. Google Scholar
  36. Jean-Yves Potvin, Danny Dubé, and Christian Robillard. A hybrid approach to vehicle routing using neural networks and genetic algorithms. Applied Intelligence, 6(3):241-252, 1996. Google Scholar
  37. Philippe Refalo. Impact-based search strategies for constraint programming. In International Conference on Principles and Practice of Constraint Programming, pages 557-571. Springer, 2004. Google Scholar
  38. Lara Scavuzzo, Feng Yang Chen, Didier Chételat, Maxime Gasse, Andrea Lodi, Neil Yorke-Smith, and Karen Aardal. Learning to branch with tree MDPs. arXiv preprint arXiv:2205.11107, 2022. Google Scholar
  39. Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. Prioritized experience replay. arXiv preprint arXiv:1511.05952, 2015. Google Scholar
  40. John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International conference on machine learning, pages 1889-1897, 2015. Google Scholar
  41. Daniel Selsam and Nikolaj Bjørner. Guiding high-performance SAT solvers with unsat-core predictions. In International Conference on Theory and Applications of Satisfiability Testing, pages 336-353. Springer, 2019. Google Scholar
  42. Wen Song, Zhiguang Cao, Jie Zhang, Chi Xu, and Andrew Lim. Learning variable ordering heuristics for solving constraint satisfaction problems. Engineering Applications of Artificial Intelligence, 109:104603, 2022. Google Scholar
  43. Haoran Sun, Wenbo Chen, Hui Li, and Le Song. Improving learning to branch via reinforcement learning. In Learning Meets Combinatorial Algorithms at NeurIPS2020, 2020. Google Scholar
  44. Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, and Thore Graepel. Value-decomposition networks for cooperative multi-agent learning, 2017. URL:
  45. Richard S. Sutton and Andrew G. Barto. Reinforcement learning: An introduction. MIT press, 2018. Google Scholar
  46. Fabio Tardivo and Agostino Dovier. Constraints propagation on GPU: A case study for alldifferent. In Proceedings of the 37th Italian Conference on Computational Logic, 2022. Google Scholar
  47. Alexander Trott, Stephan Zheng, Caiming Xiong, and Richard Socher. Keeping your distance: Solving sparse reward tasks using self-balancing shaped rewards. Advances in Neural Information Processing Systems, 32, 2019. Google Scholar
  48. Ronald van Driel, Emir Demirović, and Neil Yorke-Smith. Learning variable activity initialisation for lazy clause generation solvers. In Integration of Constraint Programming, Artificial Intelligence, and Operations Research: 18th International Conference, CPAIOR 2021, Vienna, Austria, July 5-8, 2021, Proceedings 18, pages 62-71. Springer, 2021. Google Scholar
  49. Hado van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI conference on artificial intelligence, volume 30, 2016. Google Scholar
  50. Wei Xia and Roland Yap. Learning robust search strategies using a bandit-based approach. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018. Google Scholar
  51. Xin Yu, Thiago Serra, Srikumar Ramalingam, and Shandian Zhe. The combinatorial brain surgeon: Pruning weights that cancel one another in neural networks. In International Conference on Machine Learning, pages 25668-25683. PMLR, 2022. Google Scholar
Questions / Remarks / Feedback

Feedback for Dagstuhl Publishing

Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail