Leveraging Constraints Plus Dynamic Programming for the Large Dollo Parsimony Problem

Authors: Junyan Dai, Tobias Rubel, Yunheng Han, Erin K. Molloy

Author Details

Junyan Dai
  • Department of Computer Science, University of Maryland, College Park, MD, USA
Tobias Rubel
  • Department of Computer Science, University of Maryland, College Park, MD, USA
Yunheng Han
  • Department of Computer Science, University of Maryland, College Park, MD, USA
Erin K. Molloy
  • Department of Computer Science, University of Maryland, College Park, MD, USA


We thank the anonymous reviewers for constructive feedback.

Cite As

Junyan Dai, Tobias Rubel, Yunheng Han, and Erin K. Molloy. Leveraging Constraints Plus Dynamic Programming for the Large Dollo Parsimony Problem. In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 5:1-5:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


The last decade of phylogenetics has seen the development of many methods that leverage constraints plus dynamic programming. The goal of this algorithmic technique is to produce a phylogeny that is optimal with respect to some objective function and that lies within a constrained version of tree space. The popular species tree estimation method ASTRAL, for example, returns a tree that (1) maximizes the quartet score computed with respect to the input gene trees and that (2) draws its branches (bipartitions) from the input constraint set. This technique has yet to be used for classic parsimony problems where the input consists of binary characters, sometimes with missing values. Here, we introduce the clade-constrained character parsimony problem and present an algorithm that solves this problem in polynomial time for the Dollo criterion score. Dollo parsimony, which requires traits/mutations to be gained at most once but allows them to be lost any number of times, is widely used for tumor phylogenetics as well as species phylogenetics, for example in analyses of low-homoplasy retroelement insertions across the vertebrate tree of life. Thus, we implement our algorithm in a software package, called Dollo-CDP, and evaluate its utility in the context of species phylogenetics using both simulated and real data sets. Our results show that Dollo-CDP can improve upon heuristic search from a single starting tree, often recovering a better-scoring tree. Moreover, Dollo-CDP scales to data sets with much larger numbers of taxa than branch-and-bound while still having an optimality guarantee, albeit a more restricted one. Lastly, we show that our algorithm for Dollo parsimony can easily be adapted to Camin-Sokal parsimony but not Fitch parsimony.
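
To make the Dollo criterion concrete, the following is a minimal sketch of how a single binary character might be scored on a fixed rooted tree. It is not the Dollo-CDP algorithm and is not taken from the paper. It assumes no missing values, assumes the score counts losses only (the single gain is placed on the edge above the last common ancestor of the leaves in state 1), and uses a hypothetical nested-tuple tree encoding and a hypothetical function name, `dollo_losses`.

```python
def dollo_losses(tree, states):
    """Count losses for one binary character under the Dollo criterion.

    tree   : rooted tree as nested tuples; leaves are strings (assumed encoding)
    states : dict mapping each leaf name to 0 or 1 (no missing values assumed)
    """
    def count_ones(node):
        # Number of leaves below `node` that carry the trait.
        if isinstance(node, str):
            return states[node]
        return sum(count_ones(child) for child in node)

    total = count_ones(tree)
    if total == 0:
        return 0  # trait never gained, so nothing can be lost

    # Walk down to the last common ancestor of all leaves in state 1;
    # the single gain is assumed to sit on the edge above this node.
    node = tree
    while not isinstance(node, str):
        full = [child for child in node if count_ones(child) == total]
        if not full:
            break
        node = full[0]

    def losses(sub):
        # A maximal subtree with no 1-leaves needs exactly one loss.
        if count_ones(sub) == 0:
            return 1
        if isinstance(sub, str):
            return 0
        return sum(losses(child) for child in sub)

    return losses(node)


example_tree = ((("A", "B"), "C"), ("D", "E"))
example_states = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 0}
print(dollo_losses(example_tree, example_states))
```

In the example, the trait is gained above the ancestor of A, B, and C and lost once on the branch to B, while D and E never acquire it. The repeated calls to `count_ones` make this sketch quadratic in the number of leaves; caching the counts per node would make it linear.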

Subject Classification

ACM Subject Classification
  • Applied computing → Molecular evolution
Keywords
  • phylogenetics
  • parsimony
  • Dollo
  • Camin-Sokal
  • dynamic programming
  • constraints


