Sparsification of Phylogenetic Covariance Matrices of k-Regular Trees

Authors Sean Svihla, Manuel E. Lladser




File

LIPIcs.AofA.2024.4.pdf
  • Filesize: 0.83 MB
  • 17 pages


Author Details

Sean Svihla
  • Department of Applied Mathematics, University of Colorado, Boulder, CO, USA
Manuel E. Lladser
  • Department of Applied Mathematics, University of Colorado, Boulder, CO, USA

Acknowledgements

We are thankful to the reviewers for their comments and insightful suggestions.

Cite As

Sean Svihla and Manuel E. Lladser. Sparsification of Phylogenetic Covariance Matrices of k-Regular Trees. In 35th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 302, pp. 4:1-4:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.AofA.2024.4

Abstract

Consider a tree T = (V,E) with root ∘ and an edge length function 𝓁:E → ℝ_+. The phylogenetic covariance matrix of T is the matrix C with rows and columns indexed by L, the leaf set of T, with entries C(i,j) := ∑_{e ∈ [i∧j,∘]} 𝓁(e) for each i,j ∈ L, where i∧j denotes the most recent common ancestor of i and j, and [i∧j,∘] is the path from i∧j to the root. Recent work [Gorman & Lladser 2023] has shown that the phylogenetic covariance matrix of a large random binary tree T is significantly sparsified, with overwhelmingly high probability, under a change of basis to the so-called Haar-like wavelets of T. Notably, this finding makes it possible to manipulate the spectrum of covariance matrices of large binary trees without storing them in computer memory, by instead performing two post-order traversals of the tree [Gorman & Lladser 2023]. Building on the methods of that paper, this manuscript extends their sparsification result to the broader class of k-regular trees, for any given k ≥ 2. This extension is achieved by refining existing asymptotic formulas for the mean and variance of the internal path length of random k-regular trees, using properties and identities of hypergeometric functions.
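To make the definition of C and the change of basis concrete, here is a small Python sketch (not taken from the paper) that builds C for a toy 3-regular tree, forms an orthonormal Haar-like basis, and computes every entry φ_r^T C φ_s. The dict-based tree representation, the function names, and the particular wavelet construction are assumptions for illustration: each internal node with k children contributes k − 1 vectors, the m-th of which contrasts the leaves of its first m children against the leaves of child m + 1. The construction used by Gorman & Lladser may differ, and unlike their traversal-based approach, this sketch stores C densely.

```python
import math

# Hypothetical representation (not from the paper): a rooted tree is a dict that maps
# each internal node to an ordered list of (child, edge_length) pairs; leaves are the
# nodes that never appear as keys.


def leaves_under(tree, v):
    """Return the leaves of the subtree rooted at v, in left-to-right order."""
    if v not in tree:
        return [v]
    out = []
    for child, _ in tree[v]:
        out.extend(leaves_under(tree, child))
    return out


def covariance_matrix(tree, root):
    """Dense C with C(i, j) = total length of the edges between i∧j and the root."""
    leaves = leaves_under(tree, root)
    index = {leaf: t for t, leaf in enumerate(leaves)}
    n = len(leaves)
    C = [[0.0] * n for _ in range(n)]
    # The edge above `child` lies on the path [i∧j, root] exactly when both i and j
    # are leaves of the subtree rooted at `child`.
    for children in tree.values():
        for child, length in children:
            below = [index[x] for x in leaves_under(tree, child)]
            for a in below:
                for b in below:
                    C[a][b] += length
    return leaves, C


def haar_like_basis(tree, root):
    """Orthonormal basis: the normalized constant vector plus k - 1 vectors per
    internal node with k children. Each vector is constant on every child's leaves
    and sums to zero (an assumed Helmert-style choice, not necessarily the paper's)."""
    leaves = leaves_under(tree, root)
    index = {leaf: t for t, leaf in enumerate(leaves)}
    n = len(leaves)
    basis = [[1.0 / math.sqrt(n)] * n]
    for children in tree.values():
        blocks = [leaves_under(tree, child) for child, _ in children]
        seen = list(blocks[0])
        for block in blocks[1:]:
            # Contrast the leaves of the children visited so far with the next child.
            phi = [0.0] * n
            for x in seen:
                phi[index[x]] = 1.0 / len(seen)
            for x in block:
                phi[index[x]] = -1.0 / len(block)
            norm = math.sqrt(sum(t * t for t in phi))
            basis.append([t / norm for t in phi])
            seen.extend(block)
    return basis


def change_of_basis(C, basis):
    """Return B with B[r][s] = phi_r^T C phi_s."""
    n = len(C)
    C_phi = [[sum(C[i][j] * phi[j] for j in range(n)) for i in range(n)] for phi in basis]
    return [[sum(psi[i] * col[i] for i in range(n)) for col in C_phi] for psi in basis]


# Hypothetical 3-regular example: every internal node has exactly three children.
tree = {
    "root": [("a", 1.0), ("b", 2.0), ("x1", 0.5)],
    "a": [("x2", 1.0), ("x3", 1.0), ("x4", 1.5)],
    "b": [("x5", 0.5), ("x6", 0.5), ("x7", 1.0)],
}
leaves, C = covariance_matrix(tree, "root")
basis = haar_like_basis(tree, "root")
B = change_of_basis(C, basis)
nonzero = sum(abs(x) > 1e-12 for row in B for x in row)
print(f"{nonzero} of {len(B) ** 2} entries are nonzero after the change of basis")
```

In this sketch, vectors attached to incomparable nodes (here a and b) always give exactly zero entries: C is constant across two disjoint subtrees and each wavelet sums to zero over its support. More generally, φ_u^T C φ_v can be nonzero only when u and v coincide or one is an ancestor of the other, so the number of potentially nonzero entries grows with the depths of the internal nodes, which is consistent with the role the internal path length plays in the abstract. A toy tree of this size says nothing about the asymptotic, high-probability statement itself.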

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Trees
  • Mathematics of computing → Generating functions
  • Mathematics of computing → Random graphs
Keywords
  • cophenetic matrix
  • Haar-like wavelets
  • hierarchical data
  • hypergeometric functions
  • metagenomics
  • phylogenetic covariance matrix
  • sparsification
  • ultrametric matrix


References

  1. K. M. Abadir. An introduction to hypergeometric functions for economists. Econometric Reviews, 18(3):287-330, 1999.
  2. D. Aldous. Probability distributions on cladograms. In D. Aldous and R. Pemantle, editors, Random Discrete Structures, pages 1-18, New York, NY, 1996. Springer New York.
  3. D. Aldous and B. Pittel. The critical beta-splitting random tree: Heights and related results, 2023. URL: https://arxiv.org/abs/2302.05066.
  4. D. J. Aldous. The critical beta-splitting random tree II: Overview and open problems, 2023. URL: https://arxiv.org/abs/2303.02529.
  5. L. L. Cavalli-Sforza and A. W. Edwards. Phylogenetic analysis: models and estimation procedures. Evolution, 21(3):550, 1967. URL: https://doi.org/10.2307/2406616.
  6. C. Dellacherie, S. Martinez, and J. San Martín. Inverse M-Matrices and Ultrametric Matrices, volume 2118 of Lecture Notes in Mathematics. Springer, 2014.
  7. M. Drmota. Random Trees: An Interplay between Combinatorics and Probability. Springer-Verlag/Wien, 2009.
  8. R. J. Evans and D. Stanton. Asymptotic formulas for zero-balanced hypergeometric series. SIAM J. Math. Anal., 1984. URL: https://doi.org/10.1137/0515078.
  9. P. Flajolet and R. Sedgewick. Analytic Combinatorics. Cambridge University Press, 2009. URL: http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=9780521898065.
  10. J. Fukuyama, P. J. McMurdie, L. Dethlefsen, D. A. Relman, and S. Holmes. Comparisons of distance methods for combining covariates and abundances in microbiome studies. Biocomputing, pages 213-224, 2012. URL: http://psb.stanford.edu/psb-online/proceedings/psb12/fukuyama.pdf.
  11. M. Gavish, B. Nadler, and R. R. Coifman. Multiscale wavelets on trees, graphs and high dimensional data: theory and applications to semi supervised learning. In Proceedings of the 27th International Conference on Machine Learning, ICML'10, pages 367-374, Madison, WI, USA, 2010. Omnipress.
  12. I. Gessel. How can I verify this family of values for hypergeometric functions? MathOverflow, September 12, 2023. URL: https://mathoverflow.net/q/454420.
  13. E. Gorman and M. E. Lladser. Interpretable metric learning in comparative metagenomics: The adaptive Haar-like distance. PLoS Comput Biol, 2024 (to appear). URL: https://www.biorxiv.org/content/10.1101/2023.09.27.559681v1.
  14. E. Gorman and M. E. Lladser. Sparsification of large ultrametric matrices: insights into the microbial Tree of Life. Proc. R. Soc. A, 479:20220847, 2023. URL: https://doi.org/10.1098/rspa.2022.0847.
  15. L. J. Harmon. Phylogenetic Comparative Methods. CreateSpace Independent Publishing Platform, 2019.
  16. E. Hille. Analytic function theory. Vol. 1. Introduction to Higher Mathematics. Ginn and Company, 1959.
  17. S. Janson. Simply generated trees, conditioned Galton–Watson trees, random allocations and condensation. Probability Surveys, 9:103-252, 2012.
  18. S. Martinez, G. Michon, and J. San Martín. Inverse of strictly ultrametric matrices are of Stieltjes type. SIAM J. Matrix Anal. Appl., 15(1):98-106, 1994. URL: https://doi.org/10.1137/s0895479891217011.
  19. A. Meir and J. W. Moon. On the altitude of nodes in random trees. Canadian Journal of Mathematics, 30, 1978.
  20. R. Nabben and R. S. Varga. A linear algebra proof that the inverse of a strictly ultrametric matrix is a strictly diagonally dominant Stieltjes matrix. SIAM J. Matrix Anal. Appl., 15(1):107-113, 1994. URL: https://doi.org/10.1137/s0895479892228237.
  21. S. Pavoine, A.-B. Dufour, and D. Chessel. From dissimilarities among species to dissimilarities among communities: a double principal coordinate analysis. Journal of Theoretical Biology, 228(4):523-537, 2004. URL: https://doi.org/10.1016/j.jtbi.2004.02.014.
  22. S. Svihla. Sparsification of covariance matrices of k-regular trees. Master’s thesis, The University of Colorado, 2024.
  23. E. W. Weisstein. Hypergeometric function. https://mathworld.wolfram.com/HypergeometricFunction.html. Accessed: September 2023.
  24. Q. Zhu, U. Mai, W. Pfeiffer, et al. Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains bacteria and archaea. Nature Communications, 10:5477, 2019.