DROPS

Document

DOI: 10.4230/LIPIcs.GD.2025.32

Planar Stories of Graph Drawings: Algorithms and Experiments

Authors: Carla Binucci, Sabine Cornelsen, Walter Didimo, Seok-Hee Hong, Eleni Katsanou, Maurizio Patrignani, Antonios Symvonis, and Samuel Wolf

Published in: LIPIcs, Volume 357, 33rd International Symposium on Graph Drawing and Network Visualization (GD 2025)

Abstract

We address the problem of computing a dynamic visualization of a geometric graph G as a sequence of frames. Each frame shows only a portion of the graph but their union covers G entirely. The two main requirements of our dynamic visualization are: (i) guaranteeing drawing stability, so to preserve the user’s mental map; (ii) keeping the visual complexity of each frame low. To satisfy the first requirement, we never change the position of the vertices. Regarding the second requirement, we avoid edge crossings in each frame. More precisely, in the first frame we visualize a suitable subset of non-crossing edges; in each subsequent frame, exactly one new edge enters the visualization and all the edges that cross with it are deleted. We call such a sequence of frames a planar story of G. Our goal is to find a planar story whose minimum number of edges contemporarily displayed is maximized (i.e., a planar story that maximizes the minimum frame size). Besides studying our model from a theoretical point of view, we also design and experimentally compare different algorithms, both exact techniques and heuristics. These algorithms provide an array of alternative trade-offs between efficiency and effectiveness, also depending on the structure of the input graph.

Cite as

Carla Binucci, Sabine Cornelsen, Walter Didimo, Seok-Hee Hong, Eleni Katsanou, Maurizio Patrignani, Antonios Symvonis, and Samuel Wolf. Planar Stories of Graph Drawings: Algorithms and Experiments. In 33rd International Symposium on Graph Drawing and Network Visualization (GD 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 357, pp. 32:1-32:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{binucci_et_al:LIPIcs.GD.2025.32,
  author =	{Binucci, Carla and Cornelsen, Sabine and Didimo, Walter and Hong, Seok-Hee and Katsanou, Eleni and Patrignani, Maurizio and Symvonis, Antonios and Wolf, Samuel},
  title =	{{Planar Stories of Graph Drawings: Algorithms and Experiments}},
  booktitle =	{33rd International Symposium on Graph Drawing and Network Visualization (GD 2025)},
  pages =	{32:1--32:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-403-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{357},
  editor =	{Dujmovi\'{c}, Vida and Montecchiani, Fabrizio},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.GD.2025.32},
  URN =		{urn:nbn:de:0030-drops-250182},
  doi =		{10.4230/LIPIcs.GD.2025.32},
  annote =	{Keywords: Graph Drawing, Dynamic Graphs, Graph Stories, Heuristics, ILP}
}

Document

DOI: 10.4230/LIPIcs.DISC.2025.38

Compact Routing Schemes in Undirected and Directed Graphs

Authors: Avi Kadria and Liam Roditty

Published in: LIPIcs, Volume 356, 39th International Symposium on Distributed Computing (DISC 2025)

Abstract

In this paper, we study the problem of compact routing schemes in weighted undirected and directed graphs. For weighted undirected graphs, more than a decade ago, Chechik [PODC'13] presented a ≈ 3.68k-stretch compact routing scheme that uses Õ(n^{1/k}log{D}) local storage, where D is the normalized diameter, for every k > 1. We present a ≈ 2.64k-stretch compact routing scheme that uses Õ(n^{1/k}) local storage on average in each vertex. This is the first compact routing scheme that uses total local storage of Õ(n^{1+1/k}) while achieving a c ⋅ k stretch, for a constant c < 3. In real-world network protocols, messages are usually transmitted as part of a communication session between two parties. Therefore, more than two decades ago, Thorup and Zwick [SPAA'01] considered compact routing schemes that establish a communication session using a handshake. In their handshake-based compact routing scheme, the handshake is routed along a (4k-5)-stretch path, and the rest of the communication session is routed along an optimal (2k-1)-stretch path. It is straightforward to improve the (4k-5)-stretch of the handshake to ≈ 3.68k-stretch using the compact routing scheme of Chechik [PODC'13]. We improve the handshake stretch to the optimal (2k-1), by borrowing the concept of roundtrip routing from directed graphs to undirected graphs. For weighted directed graphs, more than two decades ago, Roditty, Thorup, and Zwick [SODA'02 and TALG'08] presented a (4k+ε)-stretch compact roundtrip routing scheme that uses Õ(n^{1/k}) local storage for every k ≥ 3. For k = 3, this gives a (12+ε)-roundtrip stretch using Õ(n^{1/3}) local storage. We improve the stretch by developing a 7-roundtrip stretch routing scheme with Õ(n^{1/3}) local storage. In addition, we consider graphs with bounded hop diameter and present an optimal (2k-1)-roundtrip stretch routing scheme that uses Õ(D_{HOP}⋅ n^{1/k}), where D_{HOP} is the hop diameter of the graph.

Cite as

Avi Kadria and Liam Roditty. Compact Routing Schemes in Undirected and Directed Graphs. In 39th International Symposium on Distributed Computing (DISC 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 356, pp. 38:1-38:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{kadria_et_al:LIPIcs.DISC.2025.38,
  author =	{Kadria, Avi and Roditty, Liam},
  title =	{{Compact Routing Schemes in Undirected and Directed Graphs}},
  booktitle =	{39th International Symposium on Distributed Computing (DISC 2025)},
  pages =	{38:1--38:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-402-4},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{356},
  editor =	{Kowalski, Dariusz R.},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.DISC.2025.38},
  URN =		{urn:nbn:de:0030-drops-248555},
  doi =		{10.4230/LIPIcs.DISC.2025.38},
  annote =	{Keywords: Routing schemes, Compact routing schemes, Distance oracles, Computer networks, Graph algorithms}
}

Document

DOI: 10.4230/LIPIcs.MFCS.2025.63

Shortest Paths in Multimode Graphs

Authors: Yael Kirkpatrick and Virginia Vassilevska Williams

Published in: LIPIcs, Volume 345, 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)

Abstract

In this work we study shortest path problems in multimode graphs, a generalization of the min-distance measure introduced by Abboud, Vassilevska W. and Wang in [SODA'16]. A multimode shortest path is the shortest path using one of multiple "modes" of transportation that cannot be combined. This represents real-world scenarios where different modes are not combinable, such as flights operated by different airline alliances. The problem arises naturally in machine learning in the context of learning with multiple embedding. More precisely, a k-multimode graph is a collection of k graphs on the same vertex set and the k-mode distance between two vertices is defined as the minimum among the distances computed in each individual graph. We focus on approximating fundamental graph parameters on these graphs, specifically diameter and radius. In undirected multimode graphs we first show an elegant linear time 3-approximation algorithm for 2-mode diameter. We then extend this idea into a general subroutine that can be used as a part of any α-approximation, and use it to construct a 2 and 2.5 approximation algorithm for 2-mode diameter. For undirected radius, we introduce a general scheme that can compute a 3-approximation of the k-mode radius for any k and runs in near linear time in the case of k = O(1). In the directed case we establish an equivalence between approximating 2-mode diameter on DAGs and approximating the min-diameter, while for general graphs we develop novel techniques and provide a linear time algorithm to determine whether the diameter is finite. We also develop many conditional fine-grained lower bounds for various multimode diameter and radius approximation problems. We are able to show that many of our algorithms are tight under popular fine-grained complexity hypotheses, including our linear time 3-approximation for 3-mode undirected diameter and radius. As part of this effort we propose the first extension to the Hitting Set Hypothesis [SODA'16], which we call the 𝓁-Hitting Set Hypothesis. We use this hypothesis to prove the first parameterized lower bound tradeoff for radius approximation algorithms.

Cite as

Yael Kirkpatrick and Virginia Vassilevska Williams. Shortest Paths in Multimode Graphs. In 50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 345, pp. 63:1-63:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{kirkpatrick_et_al:LIPIcs.MFCS.2025.63,
  author =	{Kirkpatrick, Yael and Vassilevska Williams, Virginia},
  title =	{{Shortest Paths in Multimode Graphs}},
  booktitle =	{50th International Symposium on Mathematical Foundations of Computer Science (MFCS 2025)},
  pages =	{63:1--63:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-388-1},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{345},
  editor =	{Gawrychowski, Pawe{\l} and Mazowiecki, Filip and Skrzypczak, Micha{\l}},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.MFCS.2025.63},
  URN =		{urn:nbn:de:0030-drops-241703},
  doi =		{10.4230/LIPIcs.MFCS.2025.63},
  annote =	{Keywords: Graph Algorithms, Shortest Paths, Diameter, Radius, Fine-Grained Complexity}
}

Document

DOI: 10.4230/LIPIcs.WABI.2025.9

Dolphyin: A Combinatorial Algorithm for Identifying 1-Dollo Phylogenies in Cancer

Authors: Daniel W. Feng and Mohammed El-Kebir

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)

Abstract

Several recent cancer phylogeny inference methods have used the k-Dollo evolutionary model for single-nucleotide variants. Specifically, in this problem one is given an m × n binary matrix B and seeks a rooted tree T with m leaves that correspond to the m rows of B, and each node of T is labeled by a binary state for each of the n characters subject to the restriction that each character is gained at most once (0-to-1 transition) and subsequently lost at most k times (1-to-0 transitions). The 1-Dollo variant, also known as the persistent perfect phylogeny where one is restricted to at most k = 1 losses per character, has been studied extensively, but its hardness remains an open question. Here, we prove that the 1-Dollo Linear Phylogeny (1DLP) problem, where we additionally require the resulting 1-Dollo phylogeny T to be linear, is equivalent to verifying whether the input matrix B adheres to the Consecutive Ones Property (C1P), which can be solved in polynomial time. Due to the equivalence, several known NP-hardness results for relevant variants of C1P carry over to 1DLP, including the minimization of false negatives (0-to-1 modifications to the input matrix B) or the allowance of 2 gains and 2 losses. We furthermore show how we can recursively decompose any, not necessarily linear, 1-Dollo phylogeny T into several 1-Dollo linear phylogenies, connected by matching branching points. We extend this characterization to matrices B that admit 1-Dollo phylogenies, giving necessary and sufficient conditions for the existence of a novel decomposition of B into several submatrices and corresponding branching points. This decomposition forms the basis of Dolphyin, a new exponential-time algorithm for inferring 1-Dollo phylogenies that efficiently leverages the determination of linear 1-Dollo phylogenies as a subroutine. Dolphyin can also be applied to input matrices B with false negatives. We demonstrate that Dolphyin is runtime-competitive with a previous integer linear programming based algorithm SPhyR on simulated datasets. We additionally analyze simulated datasets with false negative errors and find that in the median case, Dolphyin infers 1-Dollo phylogenies with inferred error rates at or below the ground truth rate. Finally, we apply Dolphyin to 99 acute myeloid leukemia single-cell sequencing datasets, finding that the majority of the cancers can be explained by 1-Dollo phylogenies with false negative error rates in line with the used sequencing technology. Availability. Dolphyin is available at: https://github.com/elkebir-group/Dolphyin.

Cite as

Daniel W. Feng and Mohammed El-Kebir. Dolphyin: A Combinatorial Algorithm for Identifying 1-Dollo Phylogenies in Cancer. In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 9:1-9:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{feng_et_al:LIPIcs.WABI.2025.9,
  author =	{Feng, Daniel W. and El-Kebir, Mohammed},
  title =	{{Dolphyin: A Combinatorial Algorithm for Identifying 1-Dollo Phylogenies in Cancer}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{9:1--9:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.9},
  URN =		{urn:nbn:de:0030-drops-239356},
  doi =		{10.4230/LIPIcs.WABI.2025.9},
  annote =	{Keywords: Intra-tumor heterogeneity, persistent perfect phylogeny, consecutive ones property, combinatorics}
}

Document

Research

DOI: 10.4230/OASIcs.Grossi.20

Subsequence-Based Indices for Genome Sequence Analysis

Authors: Giovanni Buzzega, Alessio Conte, Veronica Guerrini, Giulia Punzi, Giovanna Rosone, and Lorenzo Tattini

Published in: OASIcs, Volume 132, From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday (2025)

Abstract

Compact indices are a fundamental tool in string analysis, even more so in bioinformatics, where genomic sequences can reach billions in length. This paper presents some recent results in which Roberto Grossi has been involved, showing how some of these indices do more than just efficiently represent data, but rather are able to bring out salient information within it, which can be exploited for their downstream analysis. Specifically, we first review a recently-introduced method [Guerrini et al., 2023] that employs the Burrows-Wheeler Transform to build reasonably accurate phylogenetic trees in an assembly-free scenario. We then describe a recent practical tool [Buzzega et al., 2025] for indexing Maximal Common Subsequences between strings, which can enable analysis of genomic sequence similarity. Experimentally, we show that the results produced by the one index are consistent with the expectations about the results of the other index.

Cite as

Giovanni Buzzega, Alessio Conte, Veronica Guerrini, Giulia Punzi, Giovanna Rosone, and Lorenzo Tattini. Subsequence-Based Indices for Genome Sequence Analysis. In From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday. Open Access Series in Informatics (OASIcs), Volume 132, pp. 20:1-20:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{buzzega_et_al:OASIcs.Grossi.20,
  author =	{Buzzega, Giovanni and Conte, Alessio and Guerrini, Veronica and Punzi, Giulia and Rosone, Giovanna and Tattini, Lorenzo},
  title =	{{Subsequence-Based Indices for Genome Sequence Analysis}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{20:1--20:21},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.20},
  URN =		{urn:nbn:de:0030-drops-238199},
  doi =		{10.4230/OASIcs.Grossi.20},
  annote =	{Keywords: String Indices, Burrows-Wheeler Transform, Maximal Common Subsequences, Sequence Analysis, Phylogeny}
}

@InProceedings{buzzega_et_al:OASIcs.Grossi.20,
  author =	{Buzzega, Giovanni and Conte, Alessio and Guerrini, Veronica and Punzi, Giulia and Rosone, Giovanna and Tattini, Lorenzo},
  title =	{{Subsequence-Based Indices for Genome Sequence Analysis}},
  booktitle =	{From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi's 60th Birthday},
  pages =	{20:1--20:21},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-391-1},
  ISSN =	{2190-6807},
  year =	{2025},
  volume =	{132},
  editor =	{Conte, Alessio and Marino, Andrea and Rosone, Giovanna and Vitter, Jeffrey Scott},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.Grossi.20},
  URN =		{urn:nbn:de:0030-drops-238199},
  doi =		{10.4230/OASIcs.Grossi.20},
  annote =	{Keywords: String Indices, Burrows-Wheeler Transform, Maximal Common Subsequences, Sequence Analysis, Phylogeny}
}

Document

Track A: Algorithms, Complexity and Games

DOI: 10.4230/LIPIcs.ICALP.2025.42

Fitting Tree Metrics and Ultrametrics in Data Streams

Authors: Amir Carmel, Debarati Das, Evangelos Kipouridis, and Evangelos Pipis

Published in: LIPIcs, Volume 334, 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)

Abstract

Fitting distances to tree metrics and ultrametrics are two widely used methods in hierarchical clustering, primarily explored within the context of numerical taxonomy. Formally, given a positive distance function D: binom(V,2) → ℝ_{>0}, the goal is to find a tree (or an ultrametric) T including all elements of set V, such that the difference between the distances among vertices in T and those specified by D is minimized. Numerical taxonomy was first introduced by Sneath and Sokal [Nature 1962], and since then it has been studied extensively in both biology and computer science. In this paper, we initiate the study of ultrametric and tree metric fitting problems in the semi-streaming model, where the distances between pairs of elements from V (with |V| = n), defined by the function D, can arrive in an arbitrary order. We study these problems under various distance norms; namely the 𝓁₀ objective, which aims to minimize the number of modified entries in D to fit a tree-metric or an ultrametric; the 𝓁₁ objective, which seeks to minimize the total sum of distance errors across all pairs of points in V; and the 𝓁_∞ objective, which focuses on minimizing the maximum error incurred by any entries in D. - Our first result addresses the 𝓁₀ objective. We provide a single-pass polynomial-time Õ(n)-space O(1) approximation algorithm for ultrametrics and prove that no single-pass exact algorithm exists, even with exponential time. - Next, we show that the algorithm for 𝓁₀ implies an O(Δ/δ) approximation for the 𝓁₁ objective, where Δ is the maximum, and δ is the minimum absolute difference between distances in the input. This bound matches the best-known approximation for the RAM model using a combinatorial algorithm when Δ/δ = O(n). - For the 𝓁_∞ objective, we provide a complete characterization of the ultrametric fitting problem. First, we present a single-pass polynomial-time Õ(n)-space 2-approximation algorithm and show that no better than 2-approximation is possible, even with exponential time. Furthermore, we show that with an additional pass, it is possible to achieve a polynomial-time exact algorithm for ultrametrics. - Finally, we extend all these results to tree metrics by using only one additional pass through the stream and without asymptotically increasing the approximation factor.

Cite as

Amir Carmel, Debarati Das, Evangelos Kipouridis, and Evangelos Pipis. Fitting Tree Metrics and Ultrametrics in Data Streams. In 52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 334, pp. 42:1-42:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)

Copy BibTex To Clipboard

@InProceedings{carmel_et_al:LIPIcs.ICALP.2025.42,
  author =	{Carmel, Amir and Das, Debarati and Kipouridis, Evangelos and Pipis, Evangelos},
  title =	{{Fitting Tree Metrics and Ultrametrics in Data Streams}},
  booktitle =	{52nd International Colloquium on Automata, Languages, and Programming (ICALP 2025)},
  pages =	{42:1--42:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-372-0},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{334},
  editor =	{Censor-Hillel, Keren and Grandoni, Fabrizio and Ouaknine, Jo\"{e}l and Puppis, Gabriele},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ICALP.2025.42},
  URN =		{urn:nbn:de:0030-drops-234197},
  doi =		{10.4230/LIPIcs.ICALP.2025.42},
  annote =	{Keywords: Streaming, Clustering, Ultrametrics, Tree metrics, Distance fitting}
}

Document

Abstract

DOI: 10.4230/LIPIcs.WABI.2023.2

EMMA: Adding Sequences into a Constraint Alignment with High Accuracy and Scalability (Abstract)

Authors: Chengze Shen, Baqiao Liu, Kelly P. Williams, and Tandy Warnow

Published in: LIPIcs, Volume 273, 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)

Abstract

Multiple sequence alignment (MSA) is a crucial precursor to many downstream biological analyses, such as phylogeny estimation [Morrison, 2006], RNA structure prediction [Shapiro et al., 2007], protein structure prediction [Jumper et al., 2021], etc. Obtaining an accurate MSA can be challenging, especially when the dataset is large (i.e., more than 1000 sequences). A key technique for large-scale MSA estimation is to add sequences into an existing alignment. For example, biological knowledge can be used to form a reference alignment on a subset of the sequences, and then the remaining sequences can be added to the reference alignment. Another case where adding sequences into an existing alignment occurs is when new sequences or genomes are added to databases, leading to the opportunity to add the new sequences for each gene in the genome into a growing alignment. A third case is for de novo multiple sequence alignment, where a subset of the sequences is selected and aligned, and then the remaining sequences are added into this "backbone alignment" [Nguyen et al., 2015; Park et al., 2023; Shen et al., 2022; Liu and Warnow, 2023; Park and Warnow, 2023; Yamada et al., 2016]. Thus, adding sequences into existing alignments is a natural problem with multiple applications to biological sequence analysis. A few methods have been developed to add sequences into an existing alignment, with MAFFT--add [Katoh and Frith, 2012] perhaps the most well-known. However, several multiple sequence alignment methods that operate in two steps (first extract and align the backbone sequences and then add the remaining sequences into this backbone alignment) also provide utilities for adding sequences into a user-provided alignment. We present EMMA, a new approach for adding "query" sequences into an existing "constraint" alignment. By construction, EMMA never changes the constraint alignment, except through the introduction of additional sites to represent homologies between the query sequences. EMMA uses a divide-and-conquer technique combined with MAFFT--add (using the most accurate setting, MAFFT-linsi--add) to add sequences into a user-provided alignment. We evaluate EMMA by comparing it to MAFFT-linsi--add, MAFFT--add (the default setting), and WITCH-ng-add. We include a range of biological and simulated datasets (nucleotides and proteins) ranging in size from 1000 to almost 200,000 sequences and evaluate alignment accuracy and scalability. MAFFT-linsi--add was the slowest and least scalable method, only able to run on datasets with at most 1000 sequences in this study, but had excellent accuracy (often the best) on those datasets. We also see that EMMA has better recall than WITCH-ng-add and MAFFT--add on large datasets, especially when the backbone alignment is small or clade-based.

Cite as

Chengze Shen, Baqiao Liu, Kelly P. Williams, and Tandy Warnow. EMMA: Adding Sequences into a Constraint Alignment with High Accuracy and Scalability (Abstract). In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 2:1-2:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{shen_et_al:LIPIcs.WABI.2023.2,
  author =	{Shen, Chengze and Liu, Baqiao and Williams, Kelly P. and Warnow, Tandy},
  title =	{{EMMA: Adding Sequences into a Constraint Alignment with High Accuracy and Scalability}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{2:1--2:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2023.2},
  URN =		{urn:nbn:de:0030-drops-186285},
  doi =		{10.4230/LIPIcs.WABI.2023.2},
  annote =	{Keywords: Multiple sequence alignment, constraint alignment, MAFFT}
}

Document

Abstract

DOI: 10.4230/LIPIcs.WABI.2023.3

BATCH-SCAMPP: Scaling Phylogenetic Placement Methods to Place Many Sequences (Abstract)

Authors: Eleanor Wedell, Chengze Shen, and Tandy Warnow

Published in: LIPIcs, Volume 273, 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)

Abstract

Phylogenetic placement is the problem of placing one or more query sequences into a phylogenetic "backbone" tree, which may be a maximum likelihood tree on a multiple sequence alignment for a single gene, a taxonomy with leaves labeled by sequences for a single gene [Nidhi Shah et al., 2021], or a species tree [Jiang et al., 2023]. When the backbone tree is a tree estimated on a single gene, the most accurate techniques for phylogenetic placement are likelihood-based, and can be computationally intensive when the backbone trees are large [Chu and Warnow, 2023]. Phylogenetic placement into gene trees occurs when updating existing gene trees with newly observed sequences, but can also be applied in the "bulk" context, where many sequences are placed at the same time into the backbone tree. For example, phylogenetic placement can be used to taxonomically characterize shotgun sequencing reads generated for an environmental sample in metagenomic analysis [Nidhi Shah et al., 2021; Barbera et al., 2019]. The two most well known maximum likelihood phylogenetic placement methods are pplacer [Chu and Warnow, 2023] and EPA-ng [Barbera et al., 2019]. Of these two, EPA-ng is optimized for scaling the number of query sequences and is capable of placing millions of sequences into phylogenetic trees of up to a few thousand sequences [Barbera et al., 2019], and achieves sublinear runtime in the number of query sequences (see Figure 2 from [Balaban et al., 2022]). Previously we introduced the SCAMPP framework [Wedell et al., 2022] to enable both pplacer and EPA-ng to perform phylogenetic placement into ultra-large backbone trees, and we demonstrated its utility for placing into backbone trees with up to 200,000 sequences. By using maximum likelihood methods pplacer or EPA-ng within the SCAMPP framework, the resulting placements are more accurate than with APPLES-2 [Balaban et al., 2022], with the most notable accuracy improvement for fragmentary sequences, and are computationally similar for single query sequence placement [Wedell et al., 2022]. However, SCAMPP was designed to incrementally update a large tree, one query sequence at a time, and was not optimized for the other uses of phylogenetic placement, where batch placement of many sequencing reads is required. Here we introduce BATCH-SCAMPP, a technique that improves scalability in both dimensions: the number of query sequences being placed into the backbone tree and the size of the backbone tree. Furthermore, BATCH-SCAMPP is specifically designed to improve EPA-ng’s scalability to large backbone trees. Although BATCH-SCAMPP is based on SCAMPP, it uses a substantially modified design in order to be able to take advantage of EPA-ng’s ability to place many query sequences efficiently. The BATCH-SCAMPP method operates by allowing the input set of query sequences to suggest and then vote on placement subtrees, thus enabling many query sequences to select the same placement subtree. We pair BATCH-SCAMPP with EPA-ng to explore the capability of this approach for scaling to many query sequences. We show that this combination of techniques (which we call BSCAMPP+EPA-ng, or BSCAMPP(e)) not only provides high accuracy and scalability to large backbone trees, matching that of SCAMPP used with EPA-ng (i.e., SCAMPP(e)), but also achieves the goal of scaling sublinearly in the number of query sequences. Moreover, it is much more scalable than EPA-ng and faster than SCAMPP+EPA-ng: when placing 10,000 sequences into a backbone tree of 50,000 leaves, EPA-ng is unable to run due to memory issues, SCAMPP+EPA-ng requires 1421 minutes, and BSCAMPP(e) places all sequences in 7 minutes (all given the same computational resources. Figure 1 gives an example of this performance advantage on the nt78 [Chu and Warnow, 2023] simulated dataset.

Cite as

Eleanor Wedell, Chengze Shen, and Tandy Warnow. BATCH-SCAMPP: Scaling Phylogenetic Placement Methods to Place Many Sequences (Abstract). In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 3:1-3:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{wedell_et_al:LIPIcs.WABI.2023.3,
  author =	{Wedell, Eleanor and Shen, Chengze and Warnow, Tandy},
  title =	{{BATCH-SCAMPP: Scaling Phylogenetic Placement Methods to Place Many Sequences}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{3:1--3:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2023.3},
  URN =		{urn:nbn:de:0030-drops-186296},
  doi =		{10.4230/LIPIcs.WABI.2023.3},
  annote =	{Keywords: Phylogenetic Placement, EPA-ng, Phylogenetics}
}

Document

DOI: 10.4230/LIPIcs.WABI.2022.8

Fast and Accurate Species Trees from Weighted Internode Distances

Authors: Baqiao Liu and Tandy Warnow

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)

Abstract

Species tree estimation is a basic step in many biological research projects, but is complicated by the fact that gene trees can differ from the species tree due to processes such as incomplete lineage sorting (ILS), gene duplication and loss (GDL), and horizontal gene transfer (HGT), which can cause different regions within the genome to have different evolutionary histories (i.e., "gene tree heterogeneity"). One approach to estimating species trees in the presence of gene tree heterogeneity resulting from ILS operates by computing trees on each genomic region (i.e., computing "gene trees") and then using these gene trees to define a matrix of average internode distances, where the internode distance in a tree T between two species x and y is the number of nodes in T between the leaves corresponding to x and y. Given such a matrix, a tree can then be computed using methods such as neighbor joining. Methods such as ASTRID and NJst (which use this basic approach) are provably statistically consistent, very fast (low degree polynomial time) and have had high accuracy under many conditions that makes them competitive with other popular species tree estimation methods. In this study, inspired by the very recent work of weighted ASTRAL, we present weighted ASTRID, a variant of ASTRID that takes the branch uncertainty on the gene trees into account in the internode distance. Our experimental study evaluating weighted ASTRID shows improvements in accuracy compared to the original (unweighted) ASTRID while remaining fast. Moreover, weighted ASTRID shows competitive accuracy against weighted ASTRAL, the state of the art. Thus, this study provides a new and very fast method for species tree estimation that improves upon ASTRID and has comparable accuracy with the state of the art while remaining much faster. Weighted ASTRID is available at https://github.com/RuneBlaze/internode.

Cite as

Baqiao Liu and Tandy Warnow. Fast and Accurate Species Trees from Weighted Internode Distances. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 8:1-8:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{liu_et_al:LIPIcs.WABI.2022.8,
  author =	{Liu, Baqiao and Warnow, Tandy},
  title =	{{Fast and Accurate Species Trees from Weighted Internode Distances}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{8:1--8:24},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.8},
  URN =		{urn:nbn:de:0030-drops-170424},
  doi =		{10.4230/LIPIcs.WABI.2022.8},
  annote =	{Keywords: Species tree estimation, ASTRID, ASTRAL, multi-species coalescent, incomplete lineage sorting}
}

Document

DOI: 10.4230/LIPIcs.WABI.2020.15

Advancing Divide-And-Conquer Phylogeny Estimation Using Robinson-Foulds Supertrees

Authors: Xilin Yu, Thien Le, Sarah Christensen, Erin K. Molloy, and Tandy Warnow

Published in: LIPIcs, Volume 172, 20th International Workshop on Algorithms in Bioinformatics (WABI 2020)

Abstract

One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and then merged together using a "supertree method". Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. We also present GreedyRFS (a greedy heuristic that operates by repeatedly using Exact-RFS-2 on pairs of trees, until all the trees are merged into a single supertree). We evaluate Exact-RFS-2 and GreedyRFS, and show that they have better accuracy than the current leading heuristic for RFS.

Cite as

Xilin Yu, Thien Le, Sarah Christensen, Erin K. Molloy, and Tandy Warnow. Advancing Divide-And-Conquer Phylogeny Estimation Using Robinson-Foulds Supertrees. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 15:1-15:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{yu_et_al:LIPIcs.WABI.2020.15,
  author =	{Yu, Xilin and Le, Thien and Christensen, Sarah and Molloy, Erin K. and Warnow, Tandy},
  title =	{{Advancing Divide-And-Conquer Phylogeny Estimation Using Robinson-Foulds Supertrees}},
  booktitle =	{20th International Workshop on Algorithms in Bioinformatics (WABI 2020)},
  pages =	{15:1--15:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-161-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{172},
  editor =	{Kingsford, Carl and Pisanti, Nadia},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2020.15},
  URN =		{urn:nbn:de:0030-drops-128048},
  doi =		{10.4230/LIPIcs.WABI.2020.15},
  annote =	{Keywords: supertrees, divide-and-conquer, phylogeny estimation}
}

Document

DOI: 10.4230/LIPIcs.WABI.2019.4

TRACTION: Fast Non-Parametric Improvement of Estimated Gene Trees

Authors: Sarah Christensen, Erin K. Molloy, Pranjal Vachaspati, and Tandy Warnow

Published in: LIPIcs, Volume 143, 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)

Abstract

Gene tree correction aims to improve the accuracy of a gene tree by using computational techniques along with a reference tree (and in some cases available sequence data). It is an active area of research when dealing with gene tree heterogeneity due to duplication and loss (GDL). Here, we study the problem of gene tree correction where gene tree heterogeneity is instead due to incomplete lineage sorting (ILS, a common problem in eukaryotic phylogenetics) and horizontal gene transfer (HGT, a common problem in bacterial phylogenetics). We introduce TRACTION, a simple polynomial time method that provably finds an optimal solution to the RF-Optimal Tree Refinement and Completion Problem, which seeks a refinement and completion of an input tree t with respect to a given binary tree T so as to minimize the Robinson-Foulds (RF) distance. We present the results of an extensive simulation study evaluating TRACTION within gene tree correction pipelines on 68,000 estimated gene trees, using estimated species trees as reference trees. We explore accuracy under conditions with varying levels of gene tree heterogeneity due to ILS and HGT. We show that TRACTION matches or improves the accuracy of well-established methods from the GDL literature under conditions with HGT and ILS, and ties for best under the ILS-only conditions. Furthermore, TRACTION ties for fastest on these datasets. TRACTION is available at https://github.com/pranjalv123/TRACTION-RF and the study datasets are available at https://doi.org/10.13012/B2IDB-1747658_V1.

Cite as

Sarah Christensen, Erin K. Molloy, Pranjal Vachaspati, and Tandy Warnow. TRACTION: Fast Non-Parametric Improvement of Estimated Gene Trees. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 4:1-4:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

Copy BibTex To Clipboard

@InProceedings{christensen_et_al:LIPIcs.WABI.2019.4,
  author =	{Christensen, Sarah and Molloy, Erin K. and Vachaspati, Pranjal and Warnow, Tandy},
  title =	{{TRACTION: Fast Non-Parametric Improvement of Estimated Gene Trees}},
  booktitle =	{19th International Workshop on Algorithms in Bioinformatics (WABI 2019)},
  pages =	{4:1--4:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-123-8},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{143},
  editor =	{Huber, Katharina T. and Gusfield, Dan},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2019.4},
  URN =		{urn:nbn:de:0030-drops-110347},
  doi =		{10.4230/LIPIcs.WABI.2019.4},
  annote =	{Keywords: Gene tree correction, horizontal gene transfer, incomplete lineage sorting}
}

Document

DOI: 10.4230/LIPIcs.WABI.2018.8

New Absolute Fast Converging Phylogeny Estimation Methods with Improved Scalability and Accuracy

Authors: Qiuyi (Richard) Zhang, Satish Rao, and Tandy Warnow

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)

Abstract

Absolute fast converging (AFC) phylogeny estimation methods are ones that have been proven to recover the true tree with high probability given sequences whose lengths are polynomial in the number of number of leaves in the tree (once the shortest and longest branch lengths are fixed). While there has been a large literature on AFC methods, the best in terms of empirical performance was DCM_NJ, published in SODA 2001. The main empirical advantage of DCM_NJ over other AFC methods is its use of neighbor joining (NJ) to construct trees on smaller taxon subsets, which are then combined into a tree on the full set of species using a supertree method; in contrast, the other AFC methods in essence depend on quartet trees that are computed independently of each other, which reduces accuracy compared to neighbor joining. However, DCM_NJ is unlikely to scale to large datasets due to its reliance on supertree methods, as no current supertree methods are able to scale to large datasets with high accuracy. In this study we present a new approach to large-scale phylogeny estimation that shares some of the features of DCM_NJ but bypasses the use of supertree methods. We prove that this new approach is AFC and uses polynomial time. Furthermore, we describe variations on this basic approach that can be used with leaf-disjoint constraint trees (computed using methods such as maximum likelihood) to produce other AFC methods that are likely to provide even better accuracy. Thus, we present a new generalizable technique for large-scale tree estimation that is designed to improve scalability for phylogeny estimation methods to ultra-large datasets, and that can be used in a variety of settings (including tree estimation from unaligned sequences, and species tree estimation from gene trees).

Cite as

Qiuyi (Richard) Zhang, Satish Rao, and Tandy Warnow. New Absolute Fast Converging Phylogeny Estimation Methods with Improved Scalability and Accuracy. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 8:1-8:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{zhang_et_al:LIPIcs.WABI.2018.8,
  author =	{Zhang, Qiuyi (Richard) and Rao, Satish and Warnow, Tandy},
  title =	{{New Absolute Fast Converging Phylogeny Estimation Methods with Improved Scalability and Accuracy}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{8:1--8:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.8},
  URN =		{urn:nbn:de:0030-drops-93108},
  doi =		{10.4230/LIPIcs.WABI.2018.8},
  annote =	{Keywords: phylogeny estimation, short quartets, sample complexity, absolute fast converging methods, neighbor joining, maximum likelihood}
}

Document

DOI: 10.4230/LIPIcs.WABI.2017.2

Gene Tree Parsimony for Incomplete Gene Trees

Authors: Md. Shamsuzzoha Bayzid and Tandy Warnow

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Abstract

Species tree estimation from gene trees can be complicated by gene duplication and loss, and "gene tree parsimony" (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i.e., gene trees lacking one or more of the species). We present new theory for GTP considering whether the incompleteness is due to gene birth and death (i.e., true biological loss) or taxon sampling, and present dynamic programming algorithms that can be used for an exact but exponential time solution for small numbers of taxa, or as a heuristic for larger numbers of taxa. We also prove that the "standard" calculations for duplications and losses exactly solve GTP when incompleteness results from taxon sampling, although they can be incorrect when incompleteness results from true biological loss. The software for the DP algorithm is freely available as open source code at https://github.com/shamsbayzid/DynaDup.

Cite as

Md. Shamsuzzoha Bayzid and Tandy Warnow. Gene Tree Parsimony for Incomplete Gene Trees. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 2:1-2:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{bayzid_et_al:LIPIcs.WABI.2017.2,
  author =	{Bayzid, Md. Shamsuzzoha and Warnow, Tandy},
  title =	{{Gene Tree Parsimony for Incomplete Gene Trees}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{2:1--2:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.2},
  URN =		{urn:nbn:de:0030-drops-76495},
  doi =		{10.4230/LIPIcs.WABI.2017.2},
  annote =	{Keywords: Gene duplication and loss, gene tree parsimony, deep coalescence}
}

Document

DOI: 10.4230/LIPIcs.WABI.2017.27

Optimal Completion of Incomplete Gene Trees in Polynomial Time Using OCTAL

Authors: Sarah Christensen, Erin K. Molloy, Pranjal Vachaspati, and Tandy Warnow

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)

Abstract

Here we introduce the Optimal Tree Completion Problem, a general optimization problem that involves completing an unrooted binary tree (i.e., adding missing leaves) so as to minimize its distance from a reference tree on a superset of the leaves. More formally, given a pair of unrooted binary trees (T,t) where T has leaf set S and t has leaf set R, a subset of S, we wish to add all the leaves from S \ R to t so as to produce a new tree t' on leaf set S that has the minimum distance to T. We show that when the distance is defined by the Robinson-Foulds (RF) distance, an optimal solution can be found in polynomial time. We also present OCTAL, an algorithm that solves this RF Optimal Tree Completion Problem exactly in quadratic time. We report on a simulation study where we complete estimated gene trees using a reference tree that is based on a species tree estimated from a multi-locus dataset. OCTAL produces completed gene trees that are closer to the true gene trees than an existing heuristic approach, but the accuracy of the completed gene trees computed by OCTAL depends on how topologically similar the estimated species tree is to the true gene tree. Hence, under conditions with relatively low gene tree heterogeneity, OCTAL can be used to provide highly accurate completions of estimated gene trees. We close with a discussion of future research.

Cite as

Sarah Christensen, Erin K. Molloy, Pranjal Vachaspati, and Tandy Warnow. Optimal Completion of Incomplete Gene Trees in Polynomial Time Using OCTAL. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 27:1-27:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@InProceedings{christensen_et_al:LIPIcs.WABI.2017.27,
  author =	{Christensen, Sarah and Molloy, Erin K. and Vachaspati, Pranjal and Warnow, Tandy},
  title =	{{Optimal Completion of Incomplete Gene Trees in Polynomial Time Using OCTAL}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{27:1--27:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.27},
  URN =		{urn:nbn:de:0030-drops-76392},
  doi =		{10.4230/LIPIcs.WABI.2017.27},
  annote =	{Keywords: phylogenomics, missing data, coalescent-based species tree estimation, gene trees}
}

Document

DOI: 10.4230/DagRep.6.8.91

Next Generation Sequencing (Dagstuhl Seminar 16351)

Authors: Gene Myers, Mihai Pop, Knut Reinert, and Tandy Warnow

Published in: Dagstuhl Reports, Volume 6, Issue 8 (2017)

Abstract

Next Generation Sequencing (NGS) data have begun to appear in many applications that are clinically relevant, such as resequencing of cancer patients, disease-gene discovery and diagnostics for rare diseases, microbiome analyses, and gene expression profiling. The analysis of sequencing data is demanding because of the enormous data volume and the need for fast turnaround time, accuracy, reproducibility, and data security. This Dagstuhl Seminar aimed at a free and deep exchange of ideas and needs between the communities of algorithmicists and theoreticians and practitioners from the biomedical field. It identified several relevant fields such as data structures and algorithms for large data sets, hardware acceleration, new problems in the upcoming age of genomes, etc. which were discussed in breakout groups.

Cite as

Gene Myers, Mihai Pop, Knut Reinert, and Tandy Warnow. Next Generation Sequencing (Dagstuhl Seminar 16351). In Dagstuhl Reports, Volume 6, Issue 8, pp. 91-130, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

Copy BibTex To Clipboard

@Article{myers_et_al:DagRep.6.8.91,
  author =	{Myers, Gene and Pop, Mihai and Reinert, Knut and Warnow, Tandy},
  title =	{{Next Generation Sequencing (Dagstuhl Seminar 16351)}},
  pages =	{91--130},
  journal =	{Dagstuhl Reports},
  ISSN =	{2192-5283},
  year =	{2017},
  volume =	{6},
  number =	{8},
  editor =	{Myers, Gene and Pop, Mihai and Reinert, Knut and Warnow, Tandy},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/DagRep.6.8.91},
  URN =		{urn:nbn:de:0030-drops-68395},
  doi =		{10.4230/DagRep.6.8.91},
  annote =	{Keywords: Cancer, DNA Sequence Assembly, Expression Profiles, Next Generation Sequencing, Sequence analysis, Software Engineering (Tools \& Libraries)}
}

15 Search Results for "Warnow, Tandy"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Thanks for your feedback!

Could not send message