DGEN: A Test Statistic for Detection of General Introgression Scenarios

Authors: Ryan A. Leo Elworth, Chabrielle Allen, Travis Benedict, Peter Dulworth, Luay Nakhleh



  • Filesize: 0.53 MB
  • 13 pages


Author Details

Ryan A. Leo Elworth
  • Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, USA
Chabrielle Allen
  • Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, USA
Travis Benedict
  • Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, USA
Peter Dulworth
  • Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, USA
Luay Nakhleh
  • Department of Computer Science and Department of BioSciences, Rice University, 6100 Main Street, Houston, TX, USA

Cite As

Ryan A. Leo Elworth, Chabrielle Allen, Travis Benedict, Peter Dulworth, and Luay Nakhleh. DGEN: A Test Statistic for Detection of General Introgression Scenarios. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 19:1-19:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


When two species hybridize, one outcome is the integration of genetic material from one species into the genome of the other, a process known as introgression. Detecting introgression in genomic data is an important problem in evolutionary biology. However, because hybridization occurs between closely related species, introgression detection is complicated by incomplete lineage sorting (ILS). The D-statistic, widely known as the "ABBA-BABA" test, was proposed for detecting introgression in the presence of ILS in data sets that consist of four genomes. More recently, D_FOIL, a set of statistics, was introduced to extend the D-statistic to data sets of five genomes. The major contribution of this paper is demonstrating that the invariants underlying both the D-statistic and D_FOIL can be derived automatically from the probability mass functions of gene tree topologies under the null species tree model and the alternative phylogenetic network model. Computational requirements aside, this automatic derivation provides a way to generalize these statistics to data sets of any size and with any introgression scenario. We demonstrate the accuracy of the general statistic, which we call D_GEN, on simulated data sets with varying rates of introgression, and apply it to an empirical data set of mosquito genomes. We have implemented D_GEN and made it available, both as a graphical user interface tool and as a command-line tool, as part of the freely available, open-source software package ALPHA (https://github.com/chilleo/ALPHA).
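The four-genome invariant behind the D-statistic can be made concrete. Under the null species tree (((P1, P2), P3), O) with ILS but no introgression, the site patterns ABBA (P2 and P3 share the derived allele) and BABA (P1 and P3 share it) are expected in equal numbers, so D should be close to zero. The Python sketch below computes D from a four-sequence alignment. It is an illustration under stated assumptions (biallelic sites only, the outgroup base taken as the ancestral allele, hypothetical function names `site_pattern` and `d_statistic`), not the ALPHA implementation of D_GEN.

```python
from typing import Optional

# Hypothetical helpers for illustration only; this is not code from ALPHA.
# Assumed species tree: (((P1, P2), P3), O). The outgroup base O is taken as
# the ancestral allele "A"; any other base is treated as the derived allele "B".

NUCLEOTIDES = frozenset("ACGT")


def site_pattern(p1: str, p2: str, p3: str, o: str) -> Optional[str]:
    """Classify one alignment column as "ABBA", "BABA", or None (uninformative)."""
    bases = (p1, p2, p3, o)
    # Skip gaps, ambiguity codes such as "N", and monomorphic or multiallelic sites.
    if any(b not in NUCLEOTIDES for b in bases) or len(set(bases)) != 2:
        return None
    pattern = "".join("A" if b == o else "B" for b in bases)
    return pattern if pattern in ("ABBA", "BABA") else None


def d_statistic(p1: str, p2: str, p3: str, o: str) -> float:
    """Compute D = (n_ABBA - n_BABA) / (n_ABBA + n_BABA) over aligned sequences."""
    seqs = [s.upper() for s in (p1, p2, p3, o)]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned and of equal length")
    counts = {"ABBA": 0, "BABA": 0}
    for column in zip(*seqs):
        pattern = site_pattern(*column)
        if pattern is not None:
            counts[pattern] += 1
    total = counts["ABBA"] + counts["BABA"]
    if total == 0:
        raise ValueError("no ABBA or BABA sites; D is undefined")
    return (counts["ABBA"] - counts["BABA"]) / total


if __name__ == "__main__":
    # Toy alignment: 2 ABBA sites, 1 BABA site, 3 uninformative sites.
    d = d_statistic("AAGACN", "CCTACC", "CCGATC", "AATATA")
    print(f"D = {d:.3f}")  # prints "D = 0.333"
```

Under the usual convention, a positive D (excess ABBA) points toward gene flow between P2 and P3 and a negative D toward gene flow between P1 and P3. This sketch returns only the point estimate; significance is commonly assessed with a block jackknife, which is not shown here. Per the abstract, D_GEN goes further by deriving such pattern invariants automatically from gene tree topology probabilities, so it is not limited to this fixed four-taxon case.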

Subject Classification

ACM Subject Classification
  • Applied computing → Genomics
  • Applied computing → Computational biology

Keywords
  • Introgression
  • genealogies
  • phylogenetic networks



