The Path-Label Reconciliation (PLR) Dissimilarity Measure for Gene Trees

Authors: Alitzel López Sánchez, José Antonio Ramírez-Rafael, Alejandro Flores-Lamas, Maribel Hernández-Rosales, Manuel Lafond



File

LIPIcs.WABI.2024.20.pdf
  • Filesize: 3.42 MB
  • 21 pages

Author Details

Alitzel López Sánchez
  • Computer Science Department, Université de Sherbrooke, Canada
José Antonio Ramírez-Rafael
  • Center for Research and Advanced Studies of the National Polytechnic Institute, Irapuato, Gto., Mexico
  • Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Germany
  • Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
Alejandro Flores-Lamas
  • Center for Research and Advanced Studies of the National Polytechnic Institute, Irapuato, Gto., Mexico
Maribel Hernández-Rosales
  • Center for Research and Advanced Studies of the National Polytechnic Institute, Irapuato, Gto., Mexico
Manuel Lafond
  • Computer Science Department, Université de Sherbrooke, Canada

Cite As

Alitzel López Sánchez, José Antonio Ramírez-Rafael, Alejandro Flores-Lamas, Maribel Hernández-Rosales, and Manuel Lafond. The Path-Label Reconciliation (PLR) Dissimilarity Measure for Gene Trees. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 20:1-20:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.WABI.2024.20

Abstract

In this study, we investigate the problem of comparing gene trees reconciled with the same species tree using a novel semi-metric, called the Path-Label Reconciliation (PLR) dissimilarity measure. This approach not only quantifies differences in the topology of reconciled gene trees, but also considers discrepancies in predicted ancestral gene-species maps and speciation/duplication events, offering a refinement of existing metrics such as Robinson-Foulds (RF) and its labeled extensions, LRF and ELRF. A tunable parameter α also allows users to adjust the balance between its species map and event labeling components. We show that PLR can be computed in linear time and that it is a semi-metric. We also discuss the diameters of dissimilarity measures on reconciled gene trees, which are important in practice for normalization, and provide initial bounds on PLR, LRF, and ELRF. To validate PLR, we simulate reconciliations and perform comparisons with LRF and ELRF. The results show that PLR provides a more evenly distributed range of distances, making it less susceptible to overestimating differences in the presence of small topological changes, while at the same time being computationally efficient. Our findings suggest that the theoretical diameter is rarely reached in practice. The PLR measure advances phylogenetic reconciliation by combining theoretical rigor with practical applicability. Future research will refine its mathematical properties, explore its performance on different tree types, and integrate it with existing bioinformatics tools for large-scale evolutionary analyses. The open source code is available at: https://pypi.org/project/parle/.
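
As background for the baseline the abstract refers to, the following is a minimal sketch of the classical Robinson-Foulds (RF) distance between two rooted trees on the same leaf set, together with a normalization by 2(n-2), the largest possible RF value for two rooted binary trees on n leaves. It is not the PLR computation and not the API of the parle package: it ignores event labels and species maps entirely, which is exactly what LRF, ELRF, and PLR add on top. The nested-tuple tree encoding and the names `clusters`, `rf_distance`, and `normalized_rf` are hypothetical choices made for illustration.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a rooted tree is a leaf name (str) or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the non-trivial clusters of a rooted tree: the leaf sets below
    internal nodes, excluding singletons and the root cluster."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])  # leaves give singletons, which are not recorded
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root_leaves = walk(tree)
    # Every tree on the same leaf set shares the root cluster, so drop it.
    found.discard(root_leaves)
    return found


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Unlabeled RF distance: size of the symmetric difference of the cluster sets."""
    return len(clusters(t1) ^ clusters(t2))


def normalized_rf(t1: Tree, t2: Tree, n_leaves: int) -> float:
    """Scale RF into [0, 1] using 2(n-2), the maximum RF value for two rooted
    binary trees on n leaves (each has exactly n-2 non-trivial clusters)."""
    return rf_distance(t1, t2) / (2 * (n_leaves - 2))
```

For example, `rf_distance((("a", "b"), ("c", "d")), (("a", "c"), ("b", "d")))` is 4, since the two trees share no non-trivial cluster, and the normalized value is 1.0. This set-based version favors readability over speed and is not linear time; it only illustrates why normalizing by a diameter matters when comparing distances across trees of different sizes.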

Subject Classification

ACM Subject Classification
  • Applied computing → Molecular evolution
Keywords
  • Reconciliation
  • gene trees
  • species trees
  • evolutionary scenarios
