Document

**Published in:** LIPIcs, Volume 172, 20th International Workshop on Algorithms in Bioinformatics (WABI 2020)

Motivation: The prediction of drug resistance and the identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Modern methods based on testing against a catalogue of previously identified mutations often yield poor predictive performance. On the other hand, machine learning techniques have demonstrated high predictive accuracy, but many of them lack interpretability to aid in identifying specific mutations which lead to resistance. We propose a novel technique, inspired by the group testing problem and Boolean compressed sensing, which yields highly accurate predictions and interpretable results at the same time.
Results: We develop a modified version of the Boolean compressed sensing problem for identifying drug resistance, and implement its formulation as an integer linear program. This allows us to characterize the predictive accuracy of the technique and select an appropriate metric to optimize. A simple adaptation of the problem also allows us to quantify the sensitivity-specificity trade-off of our model under different regimes. We test the predictive accuracy of our approach on a variety of commonly used antibiotics in treating tuberculosis and find that it has accuracy comparable to that of standard machine learning models and points to several genes with previously identified association to drug resistance.

Hooman Zabeti, Nick Dexter, Amir Hosein Safari, Nafiseh Sedaghat, Maxwell Libbrecht, and Leonid Chindelevitch. An Interpretable Classification Method for Predicting Drug Resistance in M. Tuberculosis. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 2:1-2:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{zabeti_et_al:LIPIcs.WABI.2020.2, author = {Zabeti, Hooman and Dexter, Nick and Safari, Amir Hosein and Sedaghat, Nafiseh and Libbrecht, Maxwell and Chindelevitch, Leonid}, title = {{An Interpretable Classification Method for Predicting Drug Resistance in M. Tuberculosis}}, booktitle = {20th International Workshop on Algorithms in Bioinformatics (WABI 2020)}, pages = {2:1--2:18}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-161-0}, ISSN = {1868-8969}, year = {2020}, volume = {172}, editor = {Kingsford, Carl and Pisanti, Nadia}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2020.2}, URN = {urn:nbn:de:0030-drops-127911}, doi = {10.4230/LIPIcs.WABI.2020.2}, annote = {Keywords: Drug resistance, whole-genome sequencing, interpretable machine learning, integer linear programming, rule-based learning} }

Document

**Published in:** LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)

Elemental balance, the property of having the same number of each type of atom on both sides of the equation, is a fundamental feature of chemical reactions. In metabolic network models, this property is typically verified on a reaction-by-reaction basis. In this paper we show how violations of elemental balance can be efficiently detected in an entire network, without the need for specifying the chemical formula of each of the metabolites, which enhances a modeler's ability to automatically verify that their model satisfies elemental balance.
Our method makes use of duality theory, linear programming, and mixed integer linear programming, and runs efficiently on genome-scale metabolic networks (GSMNs). We detect elemental balance violations in 40 out of 84 metabolic network models in the BiGG database. We also identify a short list of reactions that are candidates for being elementally imbalanced. Out of these candidates, nearly half turn out to be truly imbalanced reactions, and the rest can be seen as witnesses of elemental balance violations elsewhere in the network. The majority of these violations involve a proton imbalance, a known challenge of metabolic network reconstruction.
Our approach is efficient, easy to use and powerful. It can be helpful to metabolic network modelers during model verification. Our methods are fully integrated into the MONGOOSE software suite and are available at https://github.com/WGS-TB/MongooseGUI3.

Hooman Zabeti, Tamon Stephen, Bonnie Berger, and Leonid Chindelevitch. A Duality-Based Method for Identifying Elemental Balance Violations in Metabolic Network Models. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 1:1-1:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{zabeti_et_al:LIPIcs.WABI.2018.1, author = {Zabeti, Hooman and Stephen, Tamon and Berger, Bonnie and Chindelevitch, Leonid}, title = {{A Duality-Based Method for Identifying Elemental Balance Violations in Metabolic Network Models}}, booktitle = {18th International Workshop on Algorithms in Bioinformatics (WABI 2018)}, pages = {1:1--1:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-082-8}, ISSN = {1868-8969}, year = {2018}, volume = {113}, editor = {Parida, Laxmi and Ukkonen, Esko}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.1}, URN = {urn:nbn:de:0030-drops-93034}, doi = {10.4230/LIPIcs.WABI.2018.1}, annote = {Keywords: Metabolic network analysis, elemental imbalance, linear programming, model verification} }

Document

**Published in:** LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)

Variable-Number Tandem Repeats (VNTR) are genomic regions where a short sequence of DNA is repeated with no space in between repeats. While a fixed set of VNTRs is typically identified for a given species, the copy number at each VNTR varies between individuals within a species. Although VNTRs are found in both prokaryotic and eukaryotic genomes, the methodology called multi-locus VNTR analysis (MLVA) is widely used to distinguish different strains of bacteria, as well as cluster strains that might be epidemiologically related and investigate evolutionary rates.
We propose PRINCE (Processing Reads to Infer the Number of Copies via Estimation), an algorithm that is able to accurately estimate the copy number of a VNTR given the sequence of a single repeat unit and a set of short reads from a whole-genome sequence (WGS) experiment. This is a challenging problem, especially in the cases when the repeat region is longer than the expected read length. Our proposed method computes a statistical approximation of the local coverage inside the repeat region. This approximation is then mapped to the copy number using a linear function whose parameters are fitted to simulated data. We test PRINCE on the genomes of three datasets of Mycobacterium tuberculosis strains and show that it is more than twice as accurate as a previous method.
An implementation of PRINCE in the Python language is freely available at https://github.com/WGS-TB/PythonPRINCE.

Mehrdad Mansouri, Julian Booth, Margaryta Vityaz, Cedric Chauve, and Leonid Chindelevitch. PRINCE: Accurate Approximation of the Copy Number of Tandem Repeats. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 20:1-20:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{mansouri_et_al:LIPIcs.WABI.2018.20, author = {Mansouri, Mehrdad and Booth, Julian and Vityaz, Margaryta and Chauve, Cedric and Chindelevitch, Leonid}, title = {{PRINCE: Accurate Approximation of the Copy Number of Tandem Repeats}}, booktitle = {18th International Workshop on Algorithms in Bioinformatics (WABI 2018)}, pages = {20:1--20:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-082-8}, ISSN = {1868-8969}, year = {2018}, volume = {113}, editor = {Parida, Laxmi and Ukkonen, Esko}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.20}, URN = {urn:nbn:de:0030-drops-93227}, doi = {10.4230/LIPIcs.WABI.2018.20}, annote = {Keywords: Variable-Number Tandem Repeats, Copy number, Bacterial genomics} }

Document

**Published in:** LIPIcs, Volume 103, 17th International Symposium on Experimental Algorithms (SEA 2018)

The problem of computing the dual of a monotone Boolean function f is a fundamental problem in theoretical computer science with numerous applications. The related problem of duality testing (given two monotone Boolean functions f and g, declare that they are dual or provide a certificate that shows they are not) has a complexity that is not yet known. However, two quasi-polynomial time algorithms for it, often referred to as FK-A and FK-B, were proposed by Fredman and Khachiyan in 1996, with the latter having a better complexity guarantee. These can be naturally used as a subroutine in computing the dual of f.
In this paper, we investigate this use of the FK-B algorithm for the computation of the dual of a monotone Boolean function, and present practical improvements to its performance. First, we show how FK-B can be modified to produce multiple certificates (Boolean vectors on which the functions defined by the original f and the current dual g do not provide outputs consistent with duality). Second, we show how the number of redundancy tests - one of the more costly and time-consuming steps of FK-B - can be substantially reduced in this context. Lastly, we describe a simple memoization technique that avoids the solution of multiple identical subproblems.
We test our approach on a number of inputs coming from computational biology as well as combinatorics. These modifications provide a substantial speed-up, as much as an order of magnitude, for FK-B dualization relative to a naive implementation. Although other methods may end up being faster in practice, our work paves the way for a principled optimization process for the generation of monotone Boolean functions and their duals from an oracle.

Nafiseh Sedaghat, Tamon Stephen, and Leonid Chindelevitch. Speeding up Dualization in the Fredman-Khachiyan Algorithm B. In 17th International Symposium on Experimental Algorithms (SEA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 103, pp. 6:1-6:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{sedaghat_et_al:LIPIcs.SEA.2018.6, author = {Sedaghat, Nafiseh and Stephen, Tamon and Chindelevitch, Leonid}, title = {{Speeding up Dualization in the Fredman-Khachiyan Algorithm B}}, booktitle = {17th International Symposium on Experimental Algorithms (SEA 2018)}, pages = {6:1--6:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-070-5}, ISSN = {1868-8969}, year = {2018}, volume = {103}, editor = {D'Angelo, Gianlorenzo}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.SEA.2018.6}, URN = {urn:nbn:de:0030-drops-89413}, doi = {10.4230/LIPIcs.SEA.2018.6}, annote = {Keywords: Monotone boolean functions, dualization, Fredman-Khachiyan algorithm, algorithm engineering, metabolic networks} }