DROPS

Volume

LIPIcs, Volume 201

21st International Workshop on Algorithms in Bioinformatics (WABI 2021)

WABI 2021, August 2-4, 2021, Virtual Conference

Editors: Alessandra Carbone and Mohammed El-Kebir

Document

DOI: 10.4230/LIPIcs.WABI.2023.9

Inferring Temporally Consistent Migration Histories

Authors: Mrinmoy Saha Roddur, Sagi Snir, and Mohammed El-Kebir

Published in: LIPIcs, Volume 273, 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)

Abstract

Not only do many biological populations undergo evolution, but population members may also migrate from one location to another. For example, tumor cells may migrate from the primary tumor and seed a new metastasis, and pathogens may migrate from one host to another. One may represent a population’s migration history by labeling the vertices of a given phylogeny T with locations such that an edge incident to vertices with distinct locations represents a migration. Additionally, in some biological populations, taxa from distinct lineages may comigrate from one location to another in a single event, a phenomenon known as a comigration. Here, we show that a previous problem statement for inferring migration histories that are parsimonious in terms of migrations and comigrations may lead to temporally inconsistent solutions. To remedy this deficiency, we introduce precise definitions of temporal consistency of comigrations in a phylogeny, leading to three successive problems. First, we formulate the Temporally Consistent Comigrations (TCC) problem to check if a set of comigrations is temporally consistent and provide a linear time algorithm for solving this problem. Second, we formulate the Parsimonious Consistent Comigration (PCC) problem, which aims to find comigrations given a location labeling of a phylogeny. We show that PCC is NP-hard. Third, we formulate the Parsimonious Consistent Comigration History (PCCH) problem, which infers the migration history given a phylogeny and locations of its extant vertices only. We show that PCCH is NP-hard as well. On the positive side, we propose integer linear programming models to solve the PCC and PCCH problems. We apply our approach to real and simulated data.

Cite as

Mrinmoy Saha Roddur, Sagi Snir, and Mohammed El-Kebir. Inferring Temporally Consistent Migration Histories. In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 9:1-9:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{roddur_et_al:LIPIcs.WABI.2023.9,
  author =	{Roddur, Mrinmoy Saha and Snir, Sagi and El-Kebir, Mohammed},
  title =	{{Inferring Temporally Consistent Migration Histories}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{9:1--9:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2023.9},
  URN =		{urn:nbn:de:0030-drops-186357},
  doi =		{10.4230/LIPIcs.WABI.2023.9},
  annote =	{Keywords: Metastasis, Migration, Integer Linear Programming, Maximum parsimony}
}

Document

DOI: 10.4230/LIPIcs.WABI.2023.21

Balancing Minimum Free Energy and Codon Adaptation Index for Pareto Optimal RNA Design

Authors: Xinyu Gu, Yuanyuan Qi, and Mohammed El-Kebir

Published in: LIPIcs, Volume 273, 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)

Abstract

The problem of designing an RNA sequence v that encodes for a given target protein w plays an important role in messenger RNA (mRNA) vaccine design. Due to codon degeneracy, there exist exponentially many RNA sequences for a single target protein. These candidate RNA sequences may adopt different secondary structure conformations with varying minimum free energy (MFE), affecting their thermodynamic stability and consequently mRNA half-life. In addition, species-specific codon usage bias, as measured by the codon adaptation index (CAI), also plays an essential role in translation efficiency. While previous works have focused on optimizing either MFE or CAI, more recent works have shown the merits of optimizing both objectives. Importantly, there is a trade-off between MFE and CAI, i.e. optimizing one objective is at the expense of the other. Here, we formulate the Pareto Optimal RNA Design problem, seeking the set of Pareto optimal solutions for which no other solution exists that is better in terms of both MFE and CAI. We introduce DERNA (DEsign RNA), which uses the weighted sum method to enumerate the Pareto front by optimizing convex combinations of both objectives. DERNA uses dynamic programming to solve each convex combination in O(|w|³) time and O(|w|²) space. Compared to a previous approach that only optimizes MFE, we show on a benchmark dataset that DERNA obtains solutions with identical MFE but superior CAI. Additionally, we show that DERNA matches the performance in terms of solution quality of LinearDesign, a recent approach that similarly seeks to balance MFE and CAI. Finally, we demonstrate our method’s potential for mRNA vaccine design using SARS-CoV-2 spike as the target protein.

Cite as

Xinyu Gu, Yuanyuan Qi, and Mohammed El-Kebir. Balancing Minimum Free Energy and Codon Adaptation Index for Pareto Optimal RNA Design. In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 21:1-21:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{gu_et_al:LIPIcs.WABI.2023.21,
  author =	{Gu, Xinyu and Qi, Yuanyuan and El-Kebir, Mohammed},
  title =	{{Balancing Minimum Free Energy and Codon Adaptation Index for Pareto Optimal RNA Design}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{21:1--21:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2023.21},
  URN =		{urn:nbn:de:0030-drops-186479},
  doi =		{10.4230/LIPIcs.WABI.2023.21},
  annote =	{Keywords: Multi-objective optimization, dynamic programming, RNA sequence design, reverse translation, mRNA vaccine design}
}

Document

DOI: 10.4230/OASIcs.EVCS.2023.5

Spoofax at Oracle: Domain-Specific Language Engineering for Large-Scale Graph Analytics

Authors: Houda Boukham, Guido Wachsmuth, Toine Hartman, Hamza Boucherit, Oskar van Rest, Hassan Chafi, Sungpack Hong, Martijn Dwars, Arnaud Delamare, and Dalila Chiadmi

Published in: OASIcs, Volume 109, Eelco Visser Commemorative Symposium (EVCS 2023)

Abstract

For the last decade, teams at Oracle relied on the Spoofax language workbench to develop a family of domain-specific languages for graph analytics in research projects and in product development. In this paper, we analyze the requirements for integrating language processors into large-scale graph analytics toolkits and for the development of these language processors as part of a larger product development process. We discuss how Spoofax helps to meet these requirements and point out the need for future improvements.

Cite as

Houda Boukham, Guido Wachsmuth, Toine Hartman, Hamza Boucherit, Oskar van Rest, Hassan Chafi, Sungpack Hong, Martijn Dwars, Arnaud Delamare, and Dalila Chiadmi. Spoofax at Oracle: Domain-Specific Language Engineering for Large-Scale Graph Analytics. In Eelco Visser Commemorative Symposium (EVCS 2023). Open Access Series in Informatics (OASIcs), Volume 109, pp. 5:1-5:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

Copy BibTex To Clipboard

@InProceedings{boukham_et_al:OASIcs.EVCS.2023.5,
  author =	{Boukham, Houda and Wachsmuth, Guido and Hartman, Toine and Boucherit, Hamza and van Rest, Oskar and Chafi, Hassan and Hong, Sungpack and Dwars, Martijn and Delamare, Arnaud and Chiadmi, Dalila},
  title =	{{Spoofax at Oracle: Domain-Specific Language Engineering for Large-Scale Graph Analytics}},
  booktitle =	{Eelco Visser Commemorative Symposium (EVCS 2023)},
  pages =	{5:1--5:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-267-9},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{109},
  editor =	{L\"{a}mmel, Ralf and Mosses, Peter D. and Steimann, Friedrich},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.EVCS.2023.5},
  URN =		{urn:nbn:de:0030-drops-177756},
  doi =		{10.4230/OASIcs.EVCS.2023.5},
  annote =	{Keywords: language workbench, domain-specific language}
}

@InProceedings{boukham_et_al:OASIcs.EVCS.2023.5,
  author =	{Boukham, Houda and Wachsmuth, Guido and Hartman, Toine and Boucherit, Hamza and van Rest, Oskar and Chafi, Hassan and Hong, Sungpack and Dwars, Martijn and Delamare, Arnaud and Chiadmi, Dalila},
  title =	{{Spoofax at Oracle: Domain-Specific Language Engineering for Large-Scale Graph Analytics}},
  booktitle =	{Eelco Visser Commemorative Symposium (EVCS 2023)},
  pages =	{5:1--5:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-267-9},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{109},
  editor =	{L\"{a}mmel, Ralf and Mosses, Peter D. and Steimann, Friedrich},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.EVCS.2023.5},
  URN =		{urn:nbn:de:0030-drops-177756},
  doi =		{10.4230/OASIcs.EVCS.2023.5},
  annote =	{Keywords: language workbench, domain-specific language}
}

Document

Complete Volume

DOI: 10.4230/LIPIcs.WABI.2021

LIPIcs, Volume 201, WABI 2021, Complete Volume

Authors: Alessandra Carbone and Mohammed El-Kebir

Published in: LIPIcs, Volume 201, 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)

Abstract

LIPIcs, Volume 201, WABI 2021, Complete Volume

Cite as

21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 1-400, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@Proceedings{carbone_et_al:LIPIcs.WABI.2021,
  title =	{{LIPIcs, Volume 201, WABI 2021, Complete Volume}},
  booktitle =	{21st International Workshop on Algorithms in Bioinformatics (WABI 2021)},
  pages =	{1--400},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-200-6},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{201},
  editor =	{Carbone, Alessandra and El-Kebir, Mohammed},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2021},
  URN =		{urn:nbn:de:0030-drops-143520},
  doi =		{10.4230/LIPIcs.WABI.2021},
  annote =	{Keywords: LIPIcs, Volume 201, WABI 2021, Complete Volume}
}

Document

Front Matter

DOI: 10.4230/LIPIcs.WABI.2021.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Alessandra Carbone and Mohammed El-Kebir

Published in: LIPIcs, Volume 201, 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 0:i-0:x, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{carbone_et_al:LIPIcs.WABI.2021.0,
  author =	{Carbone, Alessandra and El-Kebir, Mohammed},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{21st International Workshop on Algorithms in Bioinformatics (WABI 2021)},
  pages =	{0:i--0:x},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-200-6},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{201},
  editor =	{Carbone, Alessandra and El-Kebir, Mohammed},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2021.0},
  URN =		{urn:nbn:de:0030-drops-143531},
  doi =		{10.4230/LIPIcs.WABI.2021.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

DOI: 10.4230/LIPIcs.WABI.2021.1

The Most Parsimonious Reconciliation Problem in the Presence of Incomplete Lineage Sorting and Hybridization Is NP-Hard

Authors: Matthew LeMay, Yi-Chieh Wu, and Ran Libeskind-Hadas

Published in: LIPIcs, Volume 201, 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)

Abstract

The maximum parsimony phylogenetic reconciliation problem seeks to explain incongruity between a gene phylogeny and a species phylogeny with respect to a set of evolutionary events. While the reconciliation problem is well-studied for species and gene trees subject to events such as duplication, transfer, loss, and deep coalescence, recent work has examined species phylogenies that incorporate hybridization and are thus represented by networks rather than trees. In this paper, we show that the problem of computing a maximum parsimony reconciliation for a gene tree and species network is NP-hard even when only considering deep coalescence. This result suggests that future work on maximum parsimony reconciliation for species networks should explore approximation algorithms and heuristics.

Cite as

Matthew LeMay, Yi-Chieh Wu, and Ran Libeskind-Hadas. The Most Parsimonious Reconciliation Problem in the Presence of Incomplete Lineage Sorting and Hybridization Is NP-Hard. In 21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 1:1-1:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{lemay_et_al:LIPIcs.WABI.2021.1,
  author =	{LeMay, Matthew and Wu, Yi-Chieh and Libeskind-Hadas, Ran},
  title =	{{The Most Parsimonious Reconciliation Problem in the Presence of Incomplete Lineage Sorting and Hybridization Is NP-Hard}},
  booktitle =	{21st International Workshop on Algorithms in Bioinformatics (WABI 2021)},
  pages =	{1:1--1:10},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-200-6},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{201},
  editor =	{Carbone, Alessandra and El-Kebir, Mohammed},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2021.1},
  URN =		{urn:nbn:de:0030-drops-143546},
  doi =		{10.4230/LIPIcs.WABI.2021.1},
  annote =	{Keywords: phylogenetics, reconciliation, deep coalescence, hybridization, NP-hardness}
}

Document

DOI: 10.4230/LIPIcs.WABI.2021.2

Efficient Privacy-Preserving Variable-Length Substring Match for Genome Sequence

Authors: Yoshiki Nakagawa, Satsuya Ohata, and Kana Shimizu

Published in: LIPIcs, Volume 201, 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)

Abstract

Finding a similar substring that commonly appears in query and database sequences is an essential task for genome data analysis. This study proposes a secure two-party variable-length string search protocol based on secret sharing. The unique feature of our protocol is that time, communication, and round complexities are not dependent on the database length N, after the query input. This property brings dramatic performance improvements in search time, since N is usually quite large in an actual genome database, and the same database is repeatedly used for many queries. Our concept hinges on a technique that efficiently applies the compressed full-text index (FOCS 2000) for a secret-sharing scheme. We conducted an experiment using a human genomic sequence with the length of 10 million as the database and a query with the length of 100 and found that the query response time of our protocol was at least three orders of magnitude faster than a well-designed baseline protocol under the realistic computation/network environment.

Cite as

Yoshiki Nakagawa, Satsuya Ohata, and Kana Shimizu. Efficient Privacy-Preserving Variable-Length Substring Match for Genome Sequence. In 21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 2:1-2:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{nakagawa_et_al:LIPIcs.WABI.2021.2,
  author =	{Nakagawa, Yoshiki and Ohata, Satsuya and Shimizu, Kana},
  title =	{{Efficient Privacy-Preserving Variable-Length Substring Match for Genome Sequence}},
  booktitle =	{21st International Workshop on Algorithms in Bioinformatics (WABI 2021)},
  pages =	{2:1--2:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-200-6},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{201},
  editor =	{Carbone, Alessandra and El-Kebir, Mohammed},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2021.2},
  URN =		{urn:nbn:de:0030-drops-143552},
  doi =		{10.4230/LIPIcs.WABI.2021.2},
  annote =	{Keywords: Private Genome Sequence Search, Secure Multiparty Computation, Secret Sharing, FM-index, Suffix Tree, Maximal Exact Match}
}

Document

DOI: 10.4230/LIPIcs.WABI.2021.3

Making Sense of a Cophylogeny Output: Efficient Listing of Representative Reconciliations

Authors: Yishu Wang, Arnaud Mary, Marie-France Sagot, and Blerina Sinaimeri

Published in: LIPIcs, Volume 201, 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)

Abstract

Cophylogeny reconciliation is a powerful method for analyzing host-parasite (or host-symbiont) co-evolution. It models co-evolution as an optimization problem where the set of all optimal solutions may represent different biological scenarios which thus need to be analyzed separately. Despite the significant research done in the area, few approaches have addressed the problem of helping the biologist deal with the often huge space of optimal solutions. In this paper, we propose a new approach to tackle this problem. We introduce three different criteria under which two solutions may be considered biologically equivalent, and then we propose polynomial-delay algorithms that enumerate only one representative per equivalence class (without listing all the solutions). Our results are of both theoretical and practical importance. Indeed, as shown by the experiments, we are able to significantly reduce the space of optimal solutions while still maintaining important biological information about the whole space.

Cite as

Yishu Wang, Arnaud Mary, Marie-France Sagot, and Blerina Sinaimeri. Making Sense of a Cophylogeny Output: Efficient Listing of Representative Reconciliations. In 21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 3:1-3:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{wang_et_al:LIPIcs.WABI.2021.3,
  author =	{Wang, Yishu and Mary, Arnaud and Sagot, Marie-France and Sinaimeri, Blerina},
  title =	{{Making Sense of a Cophylogeny Output: Efficient Listing of Representative Reconciliations}},
  booktitle =	{21st International Workshop on Algorithms in Bioinformatics (WABI 2021)},
  pages =	{3:1--3:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-200-6},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{201},
  editor =	{Carbone, Alessandra and El-Kebir, Mohammed},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2021.3},
  URN =		{urn:nbn:de:0030-drops-143564},
  doi =		{10.4230/LIPIcs.WABI.2021.3},
  annote =	{Keywords: Cophylogeny, Enumeration, Equivalence relation, Dynamic programming}
}

Document

DOI: 10.4230/LIPIcs.WABI.2021.4

Perplexity: Evaluating Transcript Abundance Estimation in the Absence of Ground Truth

Authors: Jason Fan, Skylar Chan, and Rob Patro

Published in: LIPIcs, Volume 201, 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)

Abstract

There has been rapid development of probabilistic models and inference methods for transcript abundance estimation from RNA-seq data. These models aim to accurately estimate transcript-level abundances, to account for different biases in the measurement process, and even to assess uncertainty in resulting estimates that can be propagated to subsequent analyses. The assumed accuracy of the estimates inferred by such methods underpin gene expression based analysis routinely carried out in the lab. Although hyperparameter selection is known to affect the distributions of inferred abundances (e.g. producing smooth versus sparse estimates), strategies for performing model selection in experimental data have been addressed informally at best. Thus, we derive perplexity for evaluating abundance estimates on fragment sets directly. We adapt perplexity from the analogous metric used to evaluate language and topic models and extend the metric to carefully account for corner cases unique to RNA-seq. In experimental data, estimates with the best perplexity also best correlate with qPCR measurements. In simulated data, perplexity is well behaved and concordant with genome-wide measurements against ground truth and differential expression analysis. To our knowledge, our study is the first to make possible model selection for transcript abundance estimation on experimental data in the absence of ground truth.

Cite as

Jason Fan, Skylar Chan, and Rob Patro. Perplexity: Evaluating Transcript Abundance Estimation in the Absence of Ground Truth. In 21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 4:1-4:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{fan_et_al:LIPIcs.WABI.2021.4,
  author =	{Fan, Jason and Chan, Skylar and Patro, Rob},
  title =	{{Perplexity: Evaluating Transcript Abundance Estimation in the Absence of Ground Truth}},
  booktitle =	{21st International Workshop on Algorithms in Bioinformatics (WABI 2021)},
  pages =	{4:1--4:22},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-200-6},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{201},
  editor =	{Carbone, Alessandra and El-Kebir, Mohammed},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2021.4},
  URN =		{urn:nbn:de:0030-drops-143578},
  doi =		{10.4230/LIPIcs.WABI.2021.4},
  annote =	{Keywords: RNA-seq, transcript abundance estimation, model selection}
}

Document

DOI: 10.4230/LIPIcs.WABI.2021.5

The Maximum Duo-Preservation String Mapping Problem with Bounded Alphabet

Authors: Nicolas Boria, Laurent Gourvès, Vangelis Th. Paschos, and Jérôme Monnot

Published in: LIPIcs, Volume 201, 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)

Abstract

Given two strings A and B such that B is a permutation of A, the max duo-preservation string mapping (MPSM) problem asks to find a mapping π between them so as to preserve a maximum number of duos. A duo is any pair of consecutive characters in a string and it is preserved by π if its two consecutive characters in A are mapped to same two consecutive characters in B. This problem has received a growing attention in recent years, partly as an alternative way to produce approximation algorithms for its minimization counterpart, min common string partition, a widely studied problem due its applications in comparative genomics. Considering this favored field of application with short alphabet, it is surprising that MPSM^𝓁, the variant of MPSM with bounded alphabet, has received so little attention, with a single yet impressive work that provides a 2.67-approximation achieved in O(n) [Brubach, 2018], where n = |A| = |B|. Our work focuses on MPSM^𝓁, and our main contribution is the demonstration that this problem admits a Polynomial Time Approximation Scheme (PTAS) when 𝓁 = O(1). We also provide an alternate, somewhat simpler, proof of NP-hardness for this problem compared with the NP-hardness proof presented in [Haitao Jiang et al., 2012].

Cite as

Nicolas Boria, Laurent Gourvès, Vangelis Th. Paschos, and Jérôme Monnot. The Maximum Duo-Preservation String Mapping Problem with Bounded Alphabet. In 21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 5:1-5:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{boria_et_al:LIPIcs.WABI.2021.5,
  author =	{Boria, Nicolas and Gourv\`{e}s, Laurent and Paschos, Vangelis Th. and Monnot, J\'{e}r\^{o}me},
  title =	{{The Maximum Duo-Preservation String Mapping Problem with Bounded Alphabet}},
  booktitle =	{21st International Workshop on Algorithms in Bioinformatics (WABI 2021)},
  pages =	{5:1--5:12},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-200-6},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{201},
  editor =	{Carbone, Alessandra and El-Kebir, Mohammed},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2021.5},
  URN =		{urn:nbn:de:0030-drops-143586},
  doi =		{10.4230/LIPIcs.WABI.2021.5},
  annote =	{Keywords: Maximum-Duo Preservation String Mapping, Bounded alphabet, Polynomial Time Approximation Scheme}
}

Document

DOI: 10.4230/LIPIcs.WABI.2021.6

Treewidth-Based Algorithms for the Small Parsimony Problem on Networks

Authors: Celine Scornavacca and Mathias Weller

Published in: LIPIcs, Volume 201, 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)

Abstract

Phylogenetic reconstruction is one of the paramount challenges of contemporary bioinformatics. A subtask of existing tree reconstruction algorithms is modeled by the Small Parsimony problem: given a tree T and an assignment of character-states to its leaves, assign states to the internal nodes of T such as to minimize the parsimony score, that is, the number of edges of T connecting nodes with different states. While this problem is polynomial-time solvable on trees, the matter is more complicated if T contains reticulate events such as hybridizations or recombinations, i.e. when T is a network. Indeed, three different versions of the parsimony score on networks have been proposed and each of them is NP-hard to decide. Existing parameterized algorithms focus on combining the number of possible character-states with the number of reticulate events (per biconnected component). Here, we consider the treewidth of the undirected graph underlying the input network as parameter, presenting dynamic programming algorithms for (slight generalizations of) all three versions of the parsimony problem on networks. Our algorithms use a formulation of the treewidth that may facilitate formalizing treewidth-based dynamic programming algorithms on phylogenetic networks for other problems.

Cite as

Celine Scornavacca and Mathias Weller. Treewidth-Based Algorithms for the Small Parsimony Problem on Networks. In 21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 6:1-6:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{scornavacca_et_al:LIPIcs.WABI.2021.6,
  author =	{Scornavacca, Celine and Weller, Mathias},
  title =	{{Treewidth-Based Algorithms for the Small Parsimony Problem on Networks}},
  booktitle =	{21st International Workshop on Algorithms in Bioinformatics (WABI 2021)},
  pages =	{6:1--6:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-200-6},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{201},
  editor =	{Carbone, Alessandra and El-Kebir, Mohammed},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2021.6},
  URN =		{urn:nbn:de:0030-drops-143591},
  doi =		{10.4230/LIPIcs.WABI.2021.6},
  annote =	{Keywords: Phylogenetics, parsimony, phylogenetic networks, parameterized complexity, dynamic programming, treewidth}
}

Document

DOI: 10.4230/LIPIcs.WABI.2021.7

Tree Diet: Reducing the Treewidth to Unlock FPT Algorithms in RNA Bioinformatics

Authors: Bertrand Marchand, Yann Ponty, and Laurent Bulteau

Published in: LIPIcs, Volume 201, 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)

Abstract

Hard graph problems are ubiquitous in Bioinformatics, inspiring the design of specialized Fixed-Parameter Tractable algorithms, many of which rely on a combination of tree-decomposition and dynamic programming. The time/space complexities of such approaches hinge critically on low values for the treewidth tw of the input graph. In order to extend their scope of applicability, we introduce the Tree-Diet problem, i.e. the removal of a minimal set of edges such that a given tree-decomposition can be slimmed down to a prescribed treewidth tw'. Our rationale is that the time gained thanks to a smaller treewidth in a parameterized algorithm compensates the extra post-processing needed to take deleted edges into account. Our core result is an FPT dynamic programming algorithm for Tree-Diet, using 2^{O(tw)}n time and space. We complement this result with parameterized complexity lower-bounds for stronger variants (e.g., NP-hardness when tw' or tw-tw' is constant). We propose a prototype implementation for our approach which we apply on difficult instances of selected RNA-based problems: RNA design, sequence-structure alignment, and search of pseudoknotted RNAs in genomes, revealing very encouraging results. This work paves the way for a wider adoption of tree-decomposition-based algorithms in Bioinformatics.

Cite as

Bertrand Marchand, Yann Ponty, and Laurent Bulteau. Tree Diet: Reducing the Treewidth to Unlock FPT Algorithms in RNA Bioinformatics. In 21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 7:1-7:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{marchand_et_al:LIPIcs.WABI.2021.7,
  author =	{Marchand, Bertrand and Ponty, Yann and Bulteau, Laurent},
  title =	{{Tree Diet: Reducing the Treewidth to Unlock FPT Algorithms in RNA Bioinformatics}},
  booktitle =	{21st International Workshop on Algorithms in Bioinformatics (WABI 2021)},
  pages =	{7:1--7:23},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-200-6},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{201},
  editor =	{Carbone, Alessandra and El-Kebir, Mohammed},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2021.7},
  URN =		{urn:nbn:de:0030-drops-143604},
  doi =		{10.4230/LIPIcs.WABI.2021.7},
  annote =	{Keywords: RNA, treewidth, FPT algorithms, RNA design, structure-sequence alignment}
}

Document

DOI: 10.4230/LIPIcs.WABI.2021.8

Space-Efficient Representation of Genomic k-Mer Count Tables

Authors: Yoshihiro Shibuya, Djamal Belazzougui, and Gregory Kucherov

Published in: LIPIcs, Volume 201, 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)

Abstract

Motivation. k-mer counting is a common task in bioinformatic pipelines, with many dedicated tools available. Output formats could rely on quotienting to reduce the space of k-mers in hash tables, however counts are not usually stored in space-efficient formats. Overall, k-mer count tables for genomic data take a considerable space, easily reaching tens of GB. Furthermore, such tables do not support efficient random-access queries in general. Results. In this work, we design an efficient representation of k-mer count tables supporting fast random-access queries. We propose to apply Compressed Static Functions (CSFs), with space proportional to the empirical zero-order entropy of the counts. For very skewed distributions, like those of k-mer counts in whole genomes, the only currently available implementation of CSFs does not provide a compact enough representation. By adding a Bloom Filter to a CSF we obtain a Bloom-enhanced CSF (BCSF) effectively overcoming this limitation. Furthermore, by combining BCSFs with minimizer-based bucketing of k-mers, we build even smaller representations breaking the empirical entropy lower bound, for large enough k. We also extend these representations to the approximate case, gaining additional space. We experimentally validate these techniques on k-mer count tables of whole genomes (E.Coli and C.Elegans) as well as on k-mer document frequency tables for 29 E.Coli genomes. In the case of exact counts, our representation takes about a half of the space of the empirical entropy, for large enough k’s.

Cite as

Yoshihiro Shibuya, Djamal Belazzougui, and Gregory Kucherov. Space-Efficient Representation of Genomic k-Mer Count Tables. In 21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 8:1-8:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{shibuya_et_al:LIPIcs.WABI.2021.8,
  author =	{Shibuya, Yoshihiro and Belazzougui, Djamal and Kucherov, Gregory},
  title =	{{Space-Efficient Representation of Genomic k-Mer Count Tables}},
  booktitle =	{21st International Workshop on Algorithms in Bioinformatics (WABI 2021)},
  pages =	{8:1--8:19},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-200-6},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{201},
  editor =	{Carbone, Alessandra and El-Kebir, Mohammed},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2021.8},
  URN =		{urn:nbn:de:0030-drops-143619},
  doi =		{10.4230/LIPIcs.WABI.2021.8},
  annote =	{Keywords: k-mer counting, data structures, compression, minimizers, compressed static function, Bloom filter, empirical entropy}
}

Document

DOI: 10.4230/LIPIcs.WABI.2021.9

Parsimonious Clone Tree Reconciliation in Cancer

Authors: Palash Sashittal, Simone Zaccaria, and Mohammed El-Kebir

Published in: LIPIcs, Volume 201, 21st International Workshop on Algorithms in Bioinformatics (WABI 2021)

Abstract

Every tumor is composed of heterogeneous clones, each corresponding to a distinct subpopulation of cells that accumulated different types of somatic mutations, ranging from single-nucleotide variants (SNVs) to copy-number aberrations (CNAs). As the analysis of this intra-tumor heterogeneity has important clinical applications, several computational methods have been introduced to identify clones from DNA sequencing data. However, due to technological and methodological limitations, current analyses are restricted to identifying tumor clones only based on either SNVs or CNAs, preventing a comprehensive characterization of a tumor’s clonal composition. To overcome these challenges, we formulate the identification of clones in terms of both SNVs and CNAs as a reconciliation problem while accounting for uncertainty in the input SNV and CNA proportions. We thus characterize the computational complexity of this problem and we introduce a mixed integer linear programming formulation to solve it exactly. On simulated data, we show that tumor clones can be identified reliably, especially when further taking into account the ancestral relationships that can be inferred from the input SNVs and CNAs. On 49 tumor samples from 10 prostate cancer patients, our reconciliation approach provides a higher resolution view of tumor evolution than previous studies.

Cite as

Palash Sashittal, Simone Zaccaria, and Mohammed El-Kebir. Parsimonious Clone Tree Reconciliation in Cancer. In 21st International Workshop on Algorithms in Bioinformatics (WABI 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 201, pp. 9:1-9:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

Copy BibTex To Clipboard

@InProceedings{sashittal_et_al:LIPIcs.WABI.2021.9,
  author =	{Sashittal, Palash and Zaccaria, Simone and El-Kebir, Mohammed},
  title =	{{Parsimonious Clone Tree Reconciliation in Cancer}},
  booktitle =	{21st International Workshop on Algorithms in Bioinformatics (WABI 2021)},
  pages =	{9:1--9:21},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-200-6},
  ISSN =	{1868-8969},
  year =	{2021},
  volume =	{201},
  editor =	{Carbone, Alessandra and El-Kebir, Mohammed},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2021.9},
  URN =		{urn:nbn:de:0030-drops-143624},
  doi =		{10.4230/LIPIcs.WABI.2021.9},
  annote =	{Keywords: Intra-tumor heterogeneity, phylogenetics, mixed integer linear programming}
}

29 Search Results for "El-Kebir, Mohammed"

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as

Abstract

Cite as