Gene Tree Parsimony for Incomplete Gene Trees

Authors: Md. Shamsuzzoha Bayzid, Tandy Warnow

PDF: LIPIcs.WABI.2017.2.pdf (489 kB, 13 pages)

Cite As

Md. Shamsuzzoha Bayzid and Tandy Warnow. Gene Tree Parsimony for Incomplete Gene Trees. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 2:1-2:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.WABI.2017.2

Abstract

Species tree estimation from gene trees can be complicated by gene duplication and loss, and "gene tree parsimony" (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i.e., gene trees lacking one or more of the species). We present new theory for GTP that distinguishes whether the incompleteness is due to gene birth and death (i.e., true biological loss) or to taxon sampling, and we give dynamic programming (DP) algorithms that can be used either as an exact but exponential-time solution for small numbers of taxa or as a heuristic for larger numbers of taxa. We also prove that the "standard" calculations for duplications and losses exactly solve GTP when incompleteness results from taxon sampling, although they can be incorrect when incompleteness results from true biological loss. The software for the DP algorithm is freely available as open source code at https://github.com/shamsbayzid/DynaDup.
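
To make the GTP objective concrete, here is a minimal sketch of duplication and loss counting plus a brute-force GTP search. It assumes the usual LCA-mapping reconciliation for rooted binary trees, and it assumes (the abstract does not spell this out) that the "standard" calculation for an incomplete gene tree means reconciling it against the species tree restricted to the species that gene tree actually contains. The search enumerates every rooted binary species tree. It is plain exhaustive enumeration, not the authors' DP algorithm or the DynaDup code, and all function names (`dup_loss`, `restrict`, `gtp_exhaustive`, and so on) are hypothetical.

```python
# Sketch only: LCA-based duplication/loss counting and a brute-force GTP search.
# Trees are nested 2-tuples; leaves are species-name strings. All function
# names are hypothetical and are not taken from DynaDup.

def leaves(t):
    """Set of species labels below node t."""
    if isinstance(t, str):
        return frozenset([t])
    return leaves(t[0]) | leaves(t[1])

def restrict(s, keep):
    """Species tree s restricted to the species in keep (degree-2 nodes suppressed)."""
    if isinstance(s, str):
        return s if s in keep else None
    left, right = restrict(s[0], keep), restrict(s[1], keep)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)

def clusters_with_depth(s, depth=0, out=None):
    """Map each cluster (leaf set) of species tree s to its depth (root = 0)."""
    if out is None:
        out = {}
    out[leaves(s)] = depth
    if not isinstance(s, str):
        clusters_with_depth(s[0], depth + 1, out)
        clusters_with_depth(s[1], depth + 1, out)
    return out

def lca_cluster(species, clusters):
    """LCA mapping: smallest species-tree cluster containing all given species."""
    return min((c for c in clusters if species <= c), key=len)

def dup_loss(gene, species_tree, restrict_to_gene=True):
    """(duplications, losses) of a rooted binary gene tree under LCA reconciliation.

    Assumption: with restrict_to_gene=True, species absent from the gene tree are
    pruned from the species tree first, so their absence is never charged as a loss.
    Every gene-tree label must occur in the species tree.
    """
    s = restrict(species_tree, leaves(gene)) if restrict_to_gene else species_tree
    depth = clusters_with_depth(s)
    dups = losses = 0

    def walk(g):
        nonlocal dups, losses
        m = lca_cluster(leaves(g), depth)
        if isinstance(g, str):
            return m
        child_maps = [walk(c) for c in g]
        is_dup = any(mc == m for mc in child_maps)
        if is_dup:
            dups += 1
        for mc in child_maps:
            # Species-tree edges skipped between M(g) and M(child) are losses;
            # a speciation node itself accounts for one of those edges.
            losses += depth[mc] - depth[m] - (0 if is_dup else 1)
        return m

    walk(gene)
    return dups, losses

def insert_everywhere(t, x):
    """All trees obtained by attaching leaf x to an edge of t (or above its root)."""
    yield (t, x)
    if not isinstance(t, str):
        for left in insert_everywhere(t[0], x):
            yield (left, t[1])
        for right in insert_everywhere(t[1], x):
            yield (t[0], right)

def rooted_binary_trees(taxa):
    """Enumerate every rooted binary tree on taxa: (2n-3)!! trees, so small n only."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    for t in rooted_binary_trees(taxa[1:]):
        yield from insert_everywhere(t, taxa[0])

def gtp_exhaustive(gene_trees, taxa):
    """Species tree minimizing total duplications + losses over all gene trees."""
    return min(rooted_binary_trees(list(taxa)),
               key=lambda s: sum(sum(dup_loss(g, s)) for g in gene_trees))
```

For example, the incomplete gene tree `("A", "C")` reconciled against `(("A", "B"), "C")` costs nothing once the species tree is restricted to {A, C}. With `restrict_to_gene=False`, the missing B lineage is charged as one loss. That contrast mirrors the abstract's distinction between incompleteness from taxon sampling and incompleteness from true biological loss.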

Keywords
  • Gene duplication and loss
  • Gene tree parsimony
  • Deep coalescence
