3 Search Results for "Zabeti, Hooman"


Document
Cosine Similarity Estimation Using FracMinHash: Theoretical Analysis, Safety Conditions, and Implementation

Authors: Mahmudur Rahman Hera and David Koslicki

Published in: LIPIcs, Volume 312, 24th International Workshop on Algorithms in Bioinformatics (WABI 2024)


Abstract
Motivation. The increasing number and volume of genomic and metagenomic data necessitates scalable and robust computational models for precise analysis. Sketching techniques utilizing k-mers from a biological sample have proven to be useful for large-scale analyses. In recent years, FracMinHash has emerged as a popular sketching technique and has been used in several useful applications. Recent studies on FracMinHash proved unbiased estimators for the containment and Jaccard indices. However, theoretical investigations for other metrics, such as the cosine similarity, are still lacking. Theoretical contributions. In this paper, we present a theoretical framework for estimating cosine similarity from FracMinHash sketches. We establish conditions under which this estimation is sound, and recommend a minimum scale factor s for accurate results. Experimental evidence supports our theoretical findings. Practical contributions. We also present frac-kmc, a fast and efficient FracMinHash sketch generator program. frac-kmc is the fastest known FracMinHash sketch generator, delivering accurate and precise results for cosine similarity estimation on real data. We show that by computing FracMinHash sketches using frac-kmc, we can estimate pairwise cosine similarity speedily and accurately on real data. frac-kmc is freely available here: https://github.com/KoslickiLab/frac-kmc/.

Cite as

Mahmudur Rahman Hera and David Koslicki. Cosine Similarity Estimation Using FracMinHash: Theoretical Analysis, Safety Conditions, and Implementation. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 6:1-6:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Copy BibTex To Clipboard

@InProceedings{hera_et_al:LIPIcs.WABI.2024.6,
  author =	{Hera, Mahmudur Rahman and Koslicki, David},
  title =	{{Cosine Similarity Estimation Using FracMinHash: Theoretical Analysis, Safety Conditions, and Implementation}},
  booktitle =	{24th International Workshop on Algorithms in Bioinformatics (WABI 2024)},
  pages =	{6:1--6:16},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-340-9},
  ISSN =	{1868-8969},
  year =	{2024},
  volume =	{312},
  editor =	{Pissis, Solon P. and Sung, Wing-Kin},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2024.6},
  URN =		{urn:nbn:de:0030-drops-206502},
  doi =		{10.4230/LIPIcs.WABI.2024.6},
  annote =	{Keywords: Hashing, sketching, FracMinHash, Min-Hash, k-mer, similarity, theory}
}
Document
An Interpretable Classification Method for Predicting Drug Resistance in M. Tuberculosis

Authors: Hooman Zabeti, Nick Dexter, Amir Hosein Safari, Nafiseh Sedaghat, Maxwell Libbrecht, and Leonid Chindelevitch

Published in: LIPIcs, Volume 172, 20th International Workshop on Algorithms in Bioinformatics (WABI 2020)


Abstract
Motivation: The prediction of drug resistance and the identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Modern methods based on testing against a catalogue of previously identified mutations often yield poor predictive performance. On the other hand, machine learning techniques have demonstrated high predictive accuracy, but many of them lack interpretability to aid in identifying specific mutations which lead to resistance. We propose a novel technique, inspired by the group testing problem and Boolean compressed sensing, which yields highly accurate predictions and interpretable results at the same time. Results: We develop a modified version of the Boolean compressed sensing problem for identifying drug resistance, and implement its formulation as an integer linear program. This allows us to characterize the predictive accuracy of the technique and select an appropriate metric to optimize. A simple adaptation of the problem also allows us to quantify the sensitivity-specificity trade-off of our model under different regimes. We test the predictive accuracy of our approach on a variety of commonly used antibiotics in treating tuberculosis and find that it has accuracy comparable to that of standard machine learning models and points to several genes with previously identified association to drug resistance.

Cite as

Hooman Zabeti, Nick Dexter, Amir Hosein Safari, Nafiseh Sedaghat, Maxwell Libbrecht, and Leonid Chindelevitch. An Interpretable Classification Method for Predicting Drug Resistance in M. Tuberculosis. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 2:1-2:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)


Copy BibTex To Clipboard

@InProceedings{zabeti_et_al:LIPIcs.WABI.2020.2,
  author =	{Zabeti, Hooman and Dexter, Nick and Safari, Amir Hosein and Sedaghat, Nafiseh and Libbrecht, Maxwell and Chindelevitch, Leonid},
  title =	{{An Interpretable Classification Method for Predicting Drug Resistance in M. Tuberculosis}},
  booktitle =	{20th International Workshop on Algorithms in Bioinformatics (WABI 2020)},
  pages =	{2:1--2:18},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-161-0},
  ISSN =	{1868-8969},
  year =	{2020},
  volume =	{172},
  editor =	{Kingsford, Carl and Pisanti, Nadia},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2020.2},
  URN =		{urn:nbn:de:0030-drops-127911},
  doi =		{10.4230/LIPIcs.WABI.2020.2},
  annote =	{Keywords: Drug resistance, whole-genome sequencing, interpretable machine learning, integer linear programming, rule-based learning}
}
Document
A Duality-Based Method for Identifying Elemental Balance Violations in Metabolic Network Models

Authors: Hooman Zabeti, Tamon Stephen, Bonnie Berger, and Leonid Chindelevitch

Published in: LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)


Abstract
Elemental balance, the property of having the same number of each type of atom on both sides of the equation, is a fundamental feature of chemical reactions. In metabolic network models, this property is typically verified on a reaction-by-reaction basis. In this paper we show how violations of elemental balance can be efficiently detected in an entire network, without the need for specifying the chemical formula of each of the metabolites, which enhances a modeler's ability to automatically verify that their model satisfies elemental balance. Our method makes use of duality theory, linear programming, and mixed integer linear programming, and runs efficiently on genome-scale metabolic networks (GSMNs). We detect elemental balance violations in 40 out of 84 metabolic network models in the BiGG database. We also identify a short list of reactions that are candidates for being elementally imbalanced. Out of these candidates, nearly half turn out to be truly imbalanced reactions, and the rest can be seen as witnesses of elemental balance violations elsewhere in the network. The majority of these violations involve a proton imbalance, a known challenge of metabolic network reconstruction. Our approach is efficient, easy to use and powerful. It can be helpful to metabolic network modelers during model verification. Our methods are fully integrated into the MONGOOSE software suite and are available at https://github.com/WGS-TB/MongooseGUI3.

Cite as

Hooman Zabeti, Tamon Stephen, Bonnie Berger, and Leonid Chindelevitch. A Duality-Based Method for Identifying Elemental Balance Violations in Metabolic Network Models. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 1:1-1:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Copy BibTex To Clipboard

@InProceedings{zabeti_et_al:LIPIcs.WABI.2018.1,
  author =	{Zabeti, Hooman and Stephen, Tamon and Berger, Bonnie and Chindelevitch, Leonid},
  title =	{{A Duality-Based Method for Identifying Elemental Balance Violations in Metabolic Network Models}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{1:1--1:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Parida, Laxmi and Ukkonen, Esko},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.1},
  URN =		{urn:nbn:de:0030-drops-93034},
  doi =		{10.4230/LIPIcs.WABI.2018.1},
  annote =	{Keywords: Metabolic network analysis, elemental imbalance, linear programming, model verification}
}
  • Refine by Author
  • 2 Chindelevitch, Leonid
  • 2 Zabeti, Hooman
  • 1 Berger, Bonnie
  • 1 Dexter, Nick
  • 1 Hera, Mahmudur Rahman
  • Show More...

  • Refine by Classification
  • 1 Applied computing
  • 1 Applied computing → Bioinformatics
  • 1 Applied computing → Biological networks
  • 1 Applied computing → Computational biology
  • 1 Applied computing → Molecular sequence analysis

  • Refine by Keyword
  • 1 Drug resistance
  • 1 FracMinHash
  • 1 Hashing
  • 1 Metabolic network analysis
  • 1 Min-Hash
  • Show More...

  • Refine by Type
  • 3 document

  • Refine by Publication Year
  • 1 2018
  • 1 2020
  • 1 2024

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail