TRACTION: Fast Non-Parametric Improvement of Estimated Gene Trees

Authors: Sarah Christensen, Erin K. Molloy, Pranjal Vachaspati, Tandy Warnow



File

LIPIcs.WABI.2019.4.pdf
  • Filesize: 0.55 MB
  • 16 pages

Author Details

Sarah Christensen
  • University of Illinois at Urbana-Champaign, USA
Erin K. Molloy
  • University of Illinois at Urbana-Champaign, USA
Pranjal Vachaspati
  • University of Illinois at Urbana-Champaign, USA
Tandy Warnow
  • University of Illinois at Urbana-Champaign, USA

Acknowledgements

We thank Mike Steel for encouragement and the members of the Warnow lab for valuable feedback. This study was performed on the Illinois Campus Cluster and Blue Waters, a computing resource that is operated and financially supported by UIUC in conjunction with the National Center for Supercomputing Applications.

Cite As

Sarah Christensen, Erin K. Molloy, Pranjal Vachaspati, and Tandy Warnow. TRACTION: Fast Non-Parametric Improvement of Estimated Gene Trees. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 4:1-4:16, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.WABI.2019.4

Abstract

Gene tree correction aims to improve the accuracy of a gene tree by using computational techniques along with a reference tree (and in some cases available sequence data). It is an active area of research when dealing with gene tree heterogeneity due to duplication and loss (GDL). Here, we study the problem of gene tree correction where gene tree heterogeneity is instead due to incomplete lineage sorting (ILS, a common problem in eukaryotic phylogenetics) and horizontal gene transfer (HGT, a common problem in bacterial phylogenetics). We introduce TRACTION, a simple polynomial time method that provably finds an optimal solution to the RF-Optimal Tree Refinement and Completion Problem, which seeks a refinement and completion of an input tree t with respect to a given binary tree T so as to minimize the Robinson-Foulds (RF) distance. We present the results of an extensive simulation study evaluating TRACTION within gene tree correction pipelines on 68,000 estimated gene trees, using estimated species trees as reference trees. We explore accuracy under conditions with varying levels of gene tree heterogeneity due to ILS and HGT. We show that TRACTION matches or improves the accuracy of well-established methods from the GDL literature under conditions with HGT and ILS, and ties for best under the ILS-only conditions. Furthermore, TRACTION ties for fastest on these datasets. TRACTION is available at https://github.com/pranjalv123/TRACTION-RF and the study datasets are available at https://doi.org/10.13012/B2IDB-1747658_V1.
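The Robinson-Foulds (RF) distance that the abstract's optimization problem minimizes can be illustrated with a minimal sketch. This is not the TRACTION algorithm and is not taken from the authors' code: the nested-tuple tree encoding and the function names `leaf_set`, `bipartitions`, and `rf_distance` are assumptions made for illustration only. The sketch treats trees as unrooted, counts the non-trivial bipartitions found in exactly one of the two trees, and requires both trees to have the same leaf set, so it does not cover the completion step for missing leaves.

```python
# Illustrative sketch only: trees are nested tuples whose leaves are strings.
# This encoding and these helper names are hypothetical, not from TRACTION.

def leaf_set(tree):
    """Return the set of leaf labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions of the tree, viewed as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    anchor leaf, so the same split always gets the same representation.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(t1, t2):
    """Count the bipartitions present in exactly one of the two trees."""
    if leaf_set(t1) != leaf_set(t2):
        raise ValueError("this sketch requires identical leaf sets")
    return len(bipartitions(t1) ^ bipartitions(t2))


if __name__ == "__main__":
    resolved = (("A", "B"), ("C", "D"))
    conflicting = (("A", "C"), ("B", "D"))
    unresolved = ("A", "B", "C", "D")
    print(rf_distance(resolved, conflicting))
    print(rf_distance(unresolved, resolved))
```

In this encoding an unresolved node is simply a tuple with more than two children. A refinement replaces such a node with a more resolved subtree, which adds bipartitions and can thereby reduce the distance to a binary reference tree.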

Subject Classification

ACM Subject Classification
  • Applied computing → Molecular evolution
  • Applied computing → Population genetics
Keywords
  • Gene tree correction
  • horizontal gene transfer
  • incomplete lineage sorting
