An Interpretable Classification Method for Predicting Drug Resistance in M. Tuberculosis

Authors: Hooman Zabeti, Nick Dexter, Amir Hosein Safari, Nafiseh Sedaghat, Maxwell Libbrecht, Leonid Chindelevitch


Author Details

Hooman Zabeti
  • School of Computing Science, Simon Fraser University, Burnaby, Canada
Nick Dexter
  • Department of Mathematics, Simon Fraser University, Burnaby, Canada
Amir Hosein Safari
  • School of Computing Science, Simon Fraser University, Burnaby, Canada
Nafiseh Sedaghat
  • School of Computing Science, Simon Fraser University, Burnaby, Canada
Maxwell Libbrecht
  • School of Computing Science, Simon Fraser University, Burnaby, Canada
Leonid Chindelevitch
  • School of Computing Science, Simon Fraser University, Burnaby, Canada


The authors would like to thank Dr. Cedric Chauve, Dr. Ben Adcock and Matthew Nguyen for helpful discussions.

Cite As

Hooman Zabeti, Nick Dexter, Amir Hosein Safari, Nafiseh Sedaghat, Maxwell Libbrecht, and Leonid Chindelevitch. An Interpretable Classification Method for Predicting Drug Resistance in M. Tuberculosis. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 2:1-2:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Motivation: Predicting drug resistance and identifying its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, are challenging problems. Modern methods based on testing against a catalogue of previously identified mutations often yield poor predictive performance. Machine learning techniques, on the other hand, have demonstrated high predictive accuracy, but many of them lack the interpretability needed to identify the specific mutations that lead to resistance. We propose a novel technique, inspired by the group testing problem and Boolean compressed sensing, which yields highly accurate predictions and interpretable results at the same time.

Results: We develop a modified version of the Boolean compressed sensing problem for identifying drug resistance, and implement its formulation as an integer linear program. This allows us to characterize the predictive accuracy of the technique and select an appropriate metric to optimize. A simple adaptation of the problem also allows us to quantify the sensitivity-specificity trade-off of our model under different regimes. We test the predictive accuracy of our approach on a variety of antibiotics commonly used in treating tuberculosis and find that it has accuracy comparable to that of standard machine learning models and points to several genes with previously identified associations with drug resistance.
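To make the rule-learning idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a simplified objective (rule size plus a penalty `lam` per misclassified isolate) and swaps the integer linear program for an exhaustive search, which is only feasible on tiny inputs. The data, the function names `predict` and `fit_rule`, and the parameters `lam` and `max_size` are all hypothetical.

```python
from itertools import combinations


def predict(row, rule):
    """Predict resistant (1) if the isolate carries any SNP in the rule (Boolean OR)."""
    return int(any(row[j] for j in rule))


def fit_rule(A, y, lam=2.0, max_size=3):
    """Search for the SNP subset minimising |rule| + lam * (misclassified isolates).

    Assumption: exhaustive search stands in for an ILP solver here.
    """
    n_features = len(A[0])
    # The empty rule predicts every isolate susceptible, so it misses all resistant ones.
    best_rule, best_cost = (), lam * sum(y)
    for k in range(1, max_size + 1):
        for rule in combinations(range(n_features), k):
            errors = sum(predict(row, rule) != label for row, label in zip(A, y))
            cost = k + lam * errors
            if cost < best_cost:
                best_rule, best_cost = rule, cost
    return best_rule, best_cost


# Made-up toy data: rows are isolates, columns mark presence of 4 hypothetical SNPs.
A = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
y = [1, 1, 0, 0, 1, 0]  # 1 = resistant, 0 = susceptible

rule, cost = fit_rule(A, y)
print(rule, cost)  # (0, 1) 2.0
```

The learned rule reads directly as "resistant if SNP 0 or SNP 1 is present", which is the sense in which such a classifier is interpretable. One way to explore a sensitivity-specificity trade-off in a sketch like this would be to penalise false positives and false negatives with separate weights; that is an illustrative assumption, not a description of the paper's adaptation.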

Subject Classification

ACM Subject Classification
  • Applied computing
  • Applied computing → Bioinformatics
  • Applied computing → Molecular sequence analysis
Keywords
  • Drug resistance
  • whole-genome sequencing
  • interpretable machine learning
  • integer linear programming
  • rule-based learning



