License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.CPM.2021.25
URN: urn:nbn:de:0030-drops-139769
URL: https://drops.dagstuhl.de/opus/volltexte/2021/13976/
Go to the corresponding LIPIcs Volume Portal


Yao, Keegan ; Bansal, Mukul S.

Optimal Completion and Comparison of Incomplete Phylogenetic Trees Under Robinson-Foulds Distance

pdf-format:
LIPIcs-CPM-2021-25.pdf (1 MB)


Abstract

The comparison of phylogenetic trees is a fundamental task in phylogenetics and evolutionary biology. In many cases, these comparisons involve trees inferred on the same set of leaves, and many distance measures exist to facilitate such comparisons. However, several applications in phylogenetics require the comparison of trees that have non-identical leaf sets. The traditional approach for handling such comparisons is to first restrict the two trees being compared to just their common leaf set. An alternative, conceptually superior approach that has shown promise is to first complete the trees by adding missing leaves so that the completed trees have identical leaf sets. This alternative approach requires the computation of optimal completions of the two trees that minimize the distance between them. However, no polynomial-time algorithms currently exist for this optimal completion problem under any standard phylogenetic distance measure.
In this work, we provide the first polynomial-time algorithms for the above problem under the widely used Robinson-Foulds (RF) distance measure. This hitherto unsolved problem is referred to as the RF(+) problem. We (i) show that a recently proposed linear-time algorithm for a restricted version of the RF(+) problem is a 2-approximation for the RF(+) problem, and (ii) provide an exact O(nkĀ²)-time algorithm for the RF(+) problem, where n is the total number of distinct leaf labels in the two trees being compared and k, bounded above by n, depends on the topologies and leaf set overlap of the two trees. Our results hold for both rooted and unrooted binary trees.
We implemented our exact algorithm and applied it to several biological datasets. Our results show that completion-based RF distance can lead to very different inferences regarding phylogenetic similarity compared to traditional RF distance. An open-source implementation of our algorithms is freely available from https://compbio.engr.uconn.edu/software/RF_plus.

BibTeX - Entry

@InProceedings{yao_et_al:LIPIcs.CPM.2021.25,
  author =	{Yao, Keegan and Bansal, Mukul S.},
  title =	{{Optimal Completion and Comparison of Incomplete Phylogenetic Trees Under Robinson-Foulds Distance}},
  booktitle =	{32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
  pages =	{25:1--25:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-186-3},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{191},
  editor =	{Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/opus/volltexte/2021/13976},
  URN =		{urn:nbn:de:0030-drops-139769},
  doi =		{10.4230/LIPIcs.CPM.2021.25},
  annote =	{Keywords: Phylogenetic tree comparison, Robinson-Foulds Distance, Optimal tree completion, Algorithms, Dynamic programming}
}

Keywords: Phylogenetic tree comparison, Robinson-Foulds Distance, Optimal tree completion, Algorithms, Dynamic programming
Collection: 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)
Issue Date: 2021
Date of publication: 30.06.2021
Supplementary Material: Software (Source Code): https://compbio.engr.uconn.edu/software/RF_plus


DROPS-Home | Fulltext Search | Imprint | Privacy Published by LZI