DROPS

Document

DOI: 10.4230/LIPIcs.WABI.2022.13

A Linear Time Algorithm for an Extended Version of the Breakpoint Double Distance

Authors: Marília D. V. Braga, Leonie R. Brockmann, Katharina Klerx, and Jens Stoye

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)

Abstract

Two genomes over the same set of gene families form a canonical pair when each of them has exactly one gene from each family. A genome is circular when it contains only circular chromosomes. Different distances of canonical circular genomes can be derived from a structure called breakpoint graph, which represents the relation between the two given genomes as a collection of cycles of even length. Then, the breakpoint distance is equal to n-c_2, where n is the number of genes and c_2 is the number of cycles of length 2. Similarly, when the considered rearrangements are those modeled by the double-cut-and-join (DCJ) operation, the rearrangement distance is n-c, where c is the total number of cycles. The distance problem is a basic unit for several other combinatorial problems related to genome evolution and ancestral reconstruction, such as median or double distance. Interestingly, both median and double distance problems can be solved in polynomial time for the breakpoint distance, while they are NP-hard for the rearrangement distance. One way of exploring the complexity space between these two extremes is to consider a σ_k distance, defined to be n-(c_2+c_4+…+c_k), and increasingly investigate the complexities of median and double distance for the σ₄ distance, then the σ₆ distance, and so on. While for the median much effort was done in our and in other research groups but no progress was obtained even for the σ₄ distance, for solving the double distance under σ₄ and σ₆ distances we could devise linear time algorithms, which we present here.

Cite as

Marília D. V. Braga, Leonie R. Brockmann, Katharina Klerx, and Jens Stoye. A Linear Time Algorithm for an Extended Version of the Breakpoint Double Distance. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 13:1-13:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

Copy BibTex To Clipboard

@InProceedings{braga_et_al:LIPIcs.WABI.2022.13,
  author =	{Braga, Mar{\'\i}lia D. V. and Brockmann, Leonie R. and Klerx, Katharina and Stoye, Jens},
  title =	{{A Linear Time Algorithm for an Extended Version of the Breakpoint Double Distance}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{13:1--13:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.13},
  URN =		{urn:nbn:de:0030-drops-170472},
  doi =		{10.4230/LIPIcs.WABI.2022.13},
  annote =	{Keywords: Comparative genomics, genome rearrangement, breakpoint distance, double-cut-and-join (DCJ) distance, double distance}
}

Document

DOI: 10.4230/LIPIcs.WABI.2020.2

An Interpretable Classification Method for Predicting Drug Resistance in M. Tuberculosis

Authors: Hooman Zabeti, Nick Dexter, Amir Hosein Safari, Nafiseh Sedaghat, Maxwell Libbrecht, and Leonid Chindelevitch

Published in: LIPIcs, Volume 172, 20th International Workshop on Algorithms in Bioinformatics (WABI 2020)

Abstract

Motivation: The prediction of drug resistance and the identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Modern methods based on testing against a catalogue of previously identified mutations often yield poor predictive performance. On the other hand, machine learning techniques have demonstrated high predictive accuracy, but many of them lack interpretability to aid in identifying specific mutations which lead to resistance. We propose a novel technique, inspired by the group testing problem and Boolean compressed sensing, which yields highly accurate predictions and interpretable results at the same time. Results: We develop a modified version of the Boolean compressed sensing problem for identifying drug resistance, and implement its formulation as an integer linear program. This allows us to characterize the predictive accuracy of the technique and select an appropriate metric to optimize. A simple adaptation of the problem also allows us to quantify the sensitivity-specificity trade-off of our model under different regimes. We test the predictive accuracy of our approach on a variety of commonly used antibiotics in treating tuberculosis and find that it has accuracy comparable to that of standard machine learning models and points to several genes with previously identified association to drug resistance.

Cite as

Hooman Zabeti, Nick Dexter, Amir Hosein Safari, Nafiseh Sedaghat, Maxwell Libbrecht, and Leonid Chindelevitch. An Interpretable Classification Method for Predicting Drug Resistance in M. Tuberculosis. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 2:1-2:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{zabeti_et_al:LIPIcs.WABI.2020.2,
  author =	{Zabeti, Hooman and Dexter, Nick and Safari, Amir Hosein and Sedaghat, Nafiseh and Libbrecht, Maxwell and Chindelevitch, Leonid},
  title =	{{An Interpretable Classification Method for Predicting Drug Resistance in M. Tuberculosis}},
  booktitle =	{20th International Workshop on Algorithms in Bioinformatics (WABI 2020)},
  pages =	{2:1--2:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-161-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{172},
  editor =	{Kingsford, Carl and Pisanti, Nadia},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2020.2},
  URN =		{urn:nbn:de:0030-drops-127911},
  doi =		{10.4230/LIPIcs.WABI.2020.2},
  annote =	{Keywords: Drug resistance, whole-genome sequencing, interpretable machine learning, integer linear programming, rule-based learning}
}

@InProceedings{zabeti_et_al:LIPIcs.WABI.2020.2,
  author =	{Zabeti, Hooman and Dexter, Nick and Safari, Amir Hosein and Sedaghat, Nafiseh and Libbrecht, Maxwell and Chindelevitch, Leonid},
  title =	{{An Interpretable Classification Method for Predicting Drug Resistance in M. Tuberculosis}},
  booktitle =	{20th International Workshop on Algorithms in Bioinformatics (WABI 2020)},
  pages =	{2:1--2:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-161-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{172},
  editor =	{Kingsford, Carl and Pisanti, Nadia},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2020.2},
  URN =		{urn:nbn:de:0030-drops-127911},
  doi =		{10.4230/LIPIcs.WABI.2020.2},
  annote =	{Keywords: Drug resistance, whole-genome sequencing, interpretable machine learning, integer linear programming, rule-based learning}
}

Document

DOI: 10.4230/LIPIcs.WABI.2020.11

A Graph-Theoretic Barcode Ordering Model for Linked-Reads

Authors: Yoann Dufresne, Chen Sun, Pierre Marijon, Dominique Lavenier, Cedric Chauve, and Rayan Chikhi

Published in: LIPIcs, Volume 172, 20th International Workshop on Algorithms in Bioinformatics (WABI 2020)

Abstract

Considering a set of intervals on the real line, an interval graph records these intervals as nodes and their intersections as edges. Identifying (i.e. merging) pairs of nodes in an interval graph results in a multiple-interval graph. Given only the nodes and the edges of the multiple-interval graph without knowing the underlying intervals, we are interested in the following questions. Can one determine how many intervals correspond to each node? Can one compute a walk over the multiple-interval graph nodes that reflects the ordering of the original intervals? These questions are closely related to linked-read DNA sequencing, where barcodes are assigned to long molecules whose intersection graph forms an interval graph. Each barcode may correspond to multiple molecules, which complicates downstream analysis, and corresponds to the identification of nodes of the corresponding interval graph. Resolving the above graph-theoretic problems would facilitate analyses of linked-reads sequencing data, through enabling the conceptual separation of barcodes into molecules and providing, through the molecules order, a skeleton for accurately assembling the genome. Here, we propose a framework that takes as input an arbitrary intersection graph (such as an overlap graph of barcodes) and constructs a heuristic approximation of the ordering of the original intervals.

Cite as

Yoann Dufresne, Chen Sun, Pierre Marijon, Dominique Lavenier, Cedric Chauve, and Rayan Chikhi. A Graph-Theoretic Barcode Ordering Model for Linked-Reads. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 11:1-11:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{dufresne_et_al:LIPIcs.WABI.2020.11,
  author =	{Dufresne, Yoann and Sun, Chen and Marijon, Pierre and Lavenier, Dominique and Chauve, Cedric and Chikhi, Rayan},
  title =	{{A Graph-Theoretic Barcode Ordering Model for Linked-Reads}},
  booktitle =	{20th International Workshop on Algorithms in Bioinformatics (WABI 2020)},
  pages =	{11:1--11:17},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-161-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{172},
  editor =	{Kingsford, Carl and Pisanti, Nadia},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2020.11},
  URN =		{urn:nbn:de:0030-drops-128001},
  doi =		{10.4230/LIPIcs.WABI.2020.11},
  annote =	{Keywords: DNA sequencing, graph algorithms, linked-reads, interval graphs, cliques}
}

Document

DOI: 10.4230/LIPIcs.WABI.2018.20

PRINCE: Accurate Approximation of the Copy Number of Tandem Repeats

Authors: Mehrdad Mansouri, Julian Booth, Margaryta Vityaz, Cedric Chauve, and Leonid Chindelevitch

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)

Abstract

Variable-Number Tandem Repeats (VNTR) are genomic regions where a short sequence of DNA is repeated with no space in between repeats. While a fixed set of VNTRs is typically identified for a given species, the copy number at each VNTR varies between individuals within a species. Although VNTRs are found in both prokaryotic and eukaryotic genomes, the methodology called multi-locus VNTR analysis (MLVA) is widely used to distinguish different strains of bacteria, as well as cluster strains that might be epidemiologically related and investigate evolutionary rates. We propose PRINCE (Processing Reads to Infer the Number of Copies via Estimation), an algorithm that is able to accurately estimate the copy number of a VNTR given the sequence of a single repeat unit and a set of short reads from a whole-genome sequence (WGS) experiment. This is a challenging problem, especially in the cases when the repeat region is longer than the expected read length. Our proposed method computes a statistical approximation of the local coverage inside the repeat region. This approximation is then mapped to the copy number using a linear function whose parameters are fitted to simulated data. We test PRINCE on the genomes of three datasets of Mycobacterium tuberculosis strains and show that it is more than twice as accurate as a previous method. An implementation of PRINCE in the Python language is freely available at https://github.com/WGS-TB/PythonPRINCE.

Cite as

Mehrdad Mansouri, Julian Booth, Margaryta Vityaz, Cedric Chauve, and Leonid Chindelevitch. PRINCE: Accurate Approximation of the Copy Number of Tandem Repeats. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 20:1-20:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{mansouri_et_al:LIPIcs.WABI.2018.20,
  author =	{Mansouri, Mehrdad and Booth, Julian and Vityaz, Margaryta and Chauve, Cedric and Chindelevitch, Leonid},
  title =	{{PRINCE: Accurate Approximation of the Copy Number of Tandem Repeats}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{20:1--20:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.20},
  URN =		{urn:nbn:de:0030-drops-93227},
  doi =		{10.4230/LIPIcs.WABI.2018.20},
  annote =	{Keywords: Variable-Number Tandem Repeats, Copy number, Bacterial genomics}
}

4 Search Results for "Chauve, Cedric"

A Linear Time Algorithm for an Extended Version of the Breakpoint Double Distance

Abstract

Cite as

An Interpretable Classification Method for Predicting Drug Resistance in M. Tuberculosis

Abstract

Cite as

A Graph-Theoretic Barcode Ordering Model for Linked-Reads

Abstract

Cite as

PRINCE: Accurate Approximation of the Copy Number of Tandem Repeats

Abstract

Cite as

Thanks for your feedback!

Could not send message