3 Search Results for "Han, Yunheng"


Document
Dolphyin: A Combinatorial Algorithm for Identifying 1-Dollo Phylogenies in Cancer

Authors: Daniel W. Feng and Mohammed El-Kebir

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)


Abstract
Several recent cancer phylogeny inference methods have used the k-Dollo evolutionary model for single-nucleotide variants. Specifically, in this problem one is given an m × n binary matrix B and seeks a rooted tree T with m leaves that correspond to the m rows of B, and each node of T is labeled by a binary state for each of the n characters subject to the restriction that each character is gained at most once (0-to-1 transition) and subsequently lost at most k times (1-to-0 transitions). The 1-Dollo variant, also known as the persistent perfect phylogeny where one is restricted to at most k = 1 losses per character, has been studied extensively, but its hardness remains an open question. Here, we prove that the 1-Dollo Linear Phylogeny (1DLP) problem, where we additionally require the resulting 1-Dollo phylogeny T to be linear, is equivalent to verifying whether the input matrix B adheres to the Consecutive Ones Property (C1P), which can be solved in polynomial time. Due to the equivalence, several known NP-hardness results for relevant variants of C1P carry over to 1DLP, including the minimization of false negatives (0-to-1 modifications to the input matrix B) or the allowance of 2 gains and 2 losses. We furthermore show how we can recursively decompose any, not necessarily linear, 1-Dollo phylogeny T into several 1-Dollo linear phylogenies, connected by matching branching points. We extend this characterization to matrices B that admit 1-Dollo phylogenies, giving necessary and sufficient conditions for the existence of a novel decomposition of B into several submatrices and corresponding branching points. This decomposition forms the basis of Dolphyin, a new exponential-time algorithm for inferring 1-Dollo phylogenies that efficiently leverages the determination of linear 1-Dollo phylogenies as a subroutine. Dolphyin can also be applied to input matrices B with false negatives. We demonstrate that Dolphyin is runtime-competitive with a previous integer linear programming based algorithm SPhyR on simulated datasets. We additionally analyze simulated datasets with false negative errors and find that in the median case, Dolphyin infers 1-Dollo phylogenies with inferred error rates at or below the ground truth rate. Finally, we apply Dolphyin to 99 acute myeloid leukemia single-cell sequencing datasets, finding that the majority of the cancers can be explained by 1-Dollo phylogenies with false negative error rates in line with the used sequencing technology. Availability. Dolphyin is available at: https://github.com/elkebir-group/Dolphyin.

Cite as

Daniel W. Feng and Mohammed El-Kebir. Dolphyin: A Combinatorial Algorithm for Identifying 1-Dollo Phylogenies in Cancer. In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 9:1-9:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{feng_et_al:LIPIcs.WABI.2025.9,
  author =	{Feng, Daniel W. and El-Kebir, Mohammed},
  title =	{{Dolphyin: A Combinatorial Algorithm for Identifying 1-Dollo Phylogenies in Cancer}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{9:1--9:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.9},
  URN =		{urn:nbn:de:0030-drops-239356},
  doi =		{10.4230/LIPIcs.WABI.2025.9},
  annote =	{Keywords: Intra-tumor heterogeneity, persistent perfect phylogeny, consecutive ones property, combinatorics}
}
Document
Leveraging Constraints Plus Dynamic Programming for the Large Dollo Parsimony Problem

Authors: Junyan Dai, Tobias Rubel, Yunheng Han, and Erin K. Molloy

Published in: LIPIcs, Volume 273, 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)


Abstract
The last decade of phylogenetics has seen the development of many methods that leverage constraints plus dynamic programming. The goal of this algorithmic technique is to produce a phylogeny that is optimal with respect to some objective function and that lies within a constrained version of tree space. The popular species tree estimation method ASTRAL, for example, returns a tree that (1) maximizes the quartet score computed with respect to the input gene trees and that (2) draws its branches (bipartitions) from the input constraint set. This technique has yet to be used for classic parsimony problems where the input are binary characters, sometimes with missing values. Here, we introduce the clade-constrained character parsimony problem and present an algorithm that solves this problem in polynomial time for the Dollo criterion score. Dollo parsimony, which requires traits/mutations to be gained at most once but allows them to be lost any number of times, is widely used for tumor phylogenetics as well as species phylogenetics, for example analyses of low-homoplasy retroelement insertions across the vertebrate tree of life. Thus, we implement our algorithm in a software package, called Dollo-CDP, and evaluate its utility in the context of species phylogenetics using both simulated and real data sets. Our results show that Dollo-CDP can improve upon heuristic search from a single starting tree, often recovering a better scoring tree. Moreover, Dollo-CDP scales to data sets with much larger numbers of taxa than branch-and-bound while still having an optimality guarantee, albeit a more restricted one. Lastly, we show that our algorithm for Dollo parsimony can easily be adapted to Camin-Sokal parsimony but not Fitch parsimony.

Cite as

Junyan Dai, Tobias Rubel, Yunheng Han, and Erin K. Molloy. Leveraging Constraints Plus Dynamic Programming for the Large Dollo Parsimony Problem. In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 5:1-5:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{dai_et_al:LIPIcs.WABI.2023.5,
  author =	{Dai, Junyan and Rubel, Tobias and Han, Yunheng and Molloy, Erin K.},
  title =	{{Leveraging Constraints Plus Dynamic Programming for the Large Dollo Parsimony Problem}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{5:1--5:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2023.5},
  URN =		{urn:nbn:de:0030-drops-186312},
  doi =		{10.4230/LIPIcs.WABI.2023.5},
  annote =	{Keywords: phylogenetics, parsimony, Dollo, Camin-Sokal, dynamic programming, constraints}
}
Document
Abstract
Quartets Enable Statistically Consistent Estimation of Cell Lineage Trees Under an Unbiased Error and Missingness Model (Abstract)

Authors: Yunheng Han and Erin K. Molloy

Published in: LIPIcs, Volume 273, 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)


Abstract
Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells [Lim et al., 2020]. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequences, traditional approaches assume the input data are error-free and the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing produces sparse, error-ridden data and because tumors evolve clonally [Jahn et al., 2016; Schwartz and Schäffer, 2017]. Here, we study the theoretical utility of methods based on quartets (four-leaf, unrooted phylogenetic trees) and triplets (three-leaf, rooted phylogenetic trees), in light of these barriers. Quartets and triplets have long been used as the building blocks for reconstructing the evolutionary history of species [Wilkinson et al., 2005]. The reason triplet-based methods (e.g., MP-EST [Liu et al., 2010]) and quartet-based methods (e.g., ASTRAL [Mirarab et al., 2014]) have garnered such success in species phylogenetics is their good statistical properties under the Multi-Species Coalescent (MSC) model [Pamilo and Nei, 1988; Rannala and Yang, 2003]; see Allman et al. (2011) and Degnan (2006) for identifiability results under the MSC model for quartets and triplets, respectively. Inspired by these efforts, we study the utility of quartets and triplets for estimating cell lineage trees under a popular tumor phylogenetics model [Jahn et al., 2016; Ross and Markowetz, 2016; Wu, 2019; Kizilkale et al., 2022] with two phases. First, mutations arise on a (highly unresolved) cell lineage tree according to the infinite sites model, and second, errors (false positives and false negatives) and missing values are introduced to the resulting mutation data in an unbiased fashion, mimicking data produced by single-cell sequencing protocols. This infinite sites plus unbiased error and missingness (IS+UEM) model generates mutations (rather than gene genealogies like the MSC model). However, a quartet (with leaves bijectively labeled by four cells) is implied by a mutation being present in two cells and absent from two cells [Molloy et al., 2021; Springer et al., 2019]; similarly, a triplet (on three cells) is implied by a mutation being present in two cells and absent from one cell. Our main result is that under the IS+UEM, the most probable quartet identifies the unrooted model cell lineage tree on four cells, with a mild assumption: the probability of false negatives and the probability of false positives must not sum to one. Somewhat surprisingly, our identifiability result for quartets does not extend to triplets, with more restrictive assumptions being required for identifiability. These results motivate seeking an unrooted cell lineage tree such that the number of quartets shared between it and the input mutations is maximized. We prove an optimal solution to this problem is a consistent estimator of the unrooted cell lineage tree under the IS+UEM model; this guarantee includes the case where the model tree is highly unresolved, provided that tree error is defined as the number of false negative branches. We therefore conclude by outlining how quartet-based methods might be employed for tumor phylogenetics given other important challenges like copy number aberrations and doublets.

Cite as

Yunheng Han and Erin K. Molloy. Quartets Enable Statistically Consistent Estimation of Cell Lineage Trees Under an Unbiased Error and Missingness Model (Abstract). In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 8:1-8:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{han_et_al:LIPIcs.WABI.2023.8,
  author =	{Han, Yunheng and Molloy, Erin K.},
  title =	{{Quartets Enable Statistically Consistent Estimation of Cell Lineage Trees Under an Unbiased Error and Missingness Model}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{8:1--8:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2023.8},
  URN =		{urn:nbn:de:0030-drops-186347},
  doi =		{10.4230/LIPIcs.WABI.2023.8},
  annote =	{Keywords: Tumor Phylogenetics, Cell Lineage Trees, Quartets, Supertrees, ASTRAL}
}
  • Refine by Type
  • 3 Document/PDF
  • 1 Document/HTML

  • Refine by Publication Year
  • 1 2025
  • 2 2023

  • Refine by Author
  • 2 Han, Yunheng
  • 2 Molloy, Erin K.
  • 1 Dai, Junyan
  • 1 El-Kebir, Mohammed
  • 1 Feng, Daniel W.
  • Show More...

  • Refine by Series/Journal
  • 3 LIPIcs

  • Refine by Classification
  • 2 Applied computing → Molecular evolution
  • 1 Applied computing → Computational genomics

  • Refine by Keyword
  • 1 ASTRAL
  • 1 Camin-Sokal
  • 1 Cell Lineage Trees
  • 1 Dollo
  • 1 Intra-tumor heterogeneity
  • Show More...

Any Issues?
X

Feedback on the Current Page

CAPTCHA

Thanks for your feedback!

Feedback submitted to Dagstuhl Publishing

Could not send message

Please try again later or send an E-mail