A Column Generation Approach for Pure Parsimony Haplotyping

Authors Veronica Dal Sasso, Luigi De Giovanni, Martine Labbé

Thumbnail PDF


  • Filesize: 458 kB
  • 11 pages

Document Identifiers

Author Details

Veronica Dal Sasso
Luigi De Giovanni
Martine Labbé

Cite AsGet BibTex

Veronica Dal Sasso, Luigi De Giovanni, and Martine Labbé. A Column Generation Approach for Pure Parsimony Haplotyping. In 5th Student Conference on Operational Research (SCOR 2016). Open Access Series in Informatics (OASIcs), Volume 50, pp. 5:1-5:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


The knowledge of nucleotides chains that compose the double DNA chain of an individual has a relevant role in detecting diseases and studying populations. However, determining experimentally the single nucleotides chains that, paired, form a certain portion of the DNA is expensive and time-consuming. Mathematical programming approaches have been proposed instead, e.g. formulating the Haplotype Inference by Pure Parsimony problem (HIPP). Abstractly, we are given a set of genotypes (strings over a ternary alphabet {0,1,2}) and we want to determine the smallest set of haplotypes (binary strings over the set {0,1}) so that each genotype can be "generated" by some pair of haplotypes, meaning that they are compatible with the genotype and can fully explain its structure. A polynomial-sized Integer Programming model was proposed by Catanzaro, Godi and Labbé (2010), which is highly efficient but hardly scalable to instances with a large number of genotypes. In order to deal with larger instances, we propose a new model involving an exponential number of variables to be solved via column generation, where variables are dynamically introduced into the model by iteratively solving a pricing problem. We compared different ways of solving the pricing problem, based on integer programming, smart enumeration and local search heuristic. The efficiency of the approach is improved by stabilization and by a heuristic to provide a good initial solution. Results show that, with respect to the linear relaxations of both the polynomial and exponential-size models, our approach yields a tighter formulation and outperforms in both efficiency and effectiveness the previous model for instances with a large number of genotypes.
  • computational biology
  • haplotyping
  • column generation
  • integer programming
  • combinatorial optimization


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. T. Achterberg. Scip: solving constraint integer programs. Math. Program. Comput. 1, 1:1.41, 2009. Google Scholar
  2. C. Barnhart, E. L. Johnson, G. L. Nemhauser, M. W. Savelsbergh, and P. H. Vance. Branch-and-price: Column generation for solving huge integer programs. Operations Research, 46(3):316-329, 1998. Google Scholar
  3. P. Bertolazzi, A. Godi, M. Labbé, and L. Tininini. Solving haplotyping inference parsimony problem using a new basic polynomial formulation, 2006. Google Scholar
  4. D. Brown and I. Harrower. A new integer programming formulation for the pure parsimony problem in haplotype analysis. Algorithms in Bioinformatics. Springer Berlin Heidelberg, pages 254-265, 2004. Google Scholar
  5. D. Brown and I. Harrower. Integer programming approaches to haplotype inference by pure parsimony. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 3(2):141-154, 2006. URL: http://dx.doi.org/10.1109/TCBB.2006.24.
  6. D. Catanzaro, A. Godi, and M. Labbé. A class representative model for pure parsimony haplotyping. INFORMS Journal on Computing 22.2, pages 195-209, 2010. Google Scholar
  7. A. Clark. Inference of haplotypes from pcr-amplified samples of diploid populations. Mol. Biol.Evol., 7:111-122, 1990. Google Scholar
  8. V. Dal Sasso, L. De Giovanni, and M. Labbé. Dantzig-wolfe decomposition of a quadratic ip model for haplotyping by pure parsimony. Technical report, http://www.math.unipd.it/∼ luigi/manuscripts/HIPP/TR-DM-HIPP-DWdecomposition.pdf, 2016. Google Scholar
  9. D. Gusfield. Haplotype inference by pure parsimony. Springer Lecture Notes in Computer Science No.2676, pages 144-155, 2003. Google Scholar
  10. B.V. Halldórson, B. Bafna, N. Edwards, R. Lippert, S. Yooseph, and S. Istrail. A survey of computational methods for SNPs and haplotype inference. Proc. DIMACS/RECOMB Satellite Workshop, pages 26-47, 2004. Google Scholar
  11. Y. T. Huang, K. M. Chao, and T. Chen. An approximation algorithm for haplotype inference by maximum parsimony. Journal of Computational Biology, 12, 10:1261-1274, 2005. Google Scholar
  12. IBM. Cplex optimizer, 1994. URL: http://www-01.ibm.com/software/commerce/optimization/cplex-optimizer/.
  13. G. Lancia, M.C. Pinotti, and R. Rizzi. Haplotyping populations by pure parsimony: Complexity of exact and approximation algorithms. INFORMS Journal on computing 16, 4:348-359, 2004. Google Scholar
  14. G. Lancia and P. Serafini. A set-covering approach with column generation for parsimony haplotyping. INFORMS Journal on Computing 21, 1:151-166, 2009. Google Scholar
  15. M. E. Lübbecke and J. Desrosiers. Selected topics in column generation. Operations Research, 53, 6:1007-1023, 2005. Google Scholar
  16. A. Pessoa, M. Poggi de Aragao, and R. Rodrigues. Algorithms over arc-indexed formulations for single and parallel machine scheduling problems, 2008. URL: http://www.optimization-online.org/DB_FILE/2008/06/2022.pdf.
  17. L. Tininini, P. Bertolazzi, A. Godi, and G. Lancia. Collhaps: a heuristic approach to haplotype inference by parsimony. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(3):511-523, 2010. URL: http://dx.doi.org/10.1109/TCBB.2008.130.
  18. L. Wang and Y. Xu. Haplotype inference by maximum parsimony. Bioinformatics 19, 14:1773-1780, 2003. Google Scholar
Questions / Remarks / Feedback

Feedback for Dagstuhl Publishing

Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail