3 Search Results for "Mirarab, Siavash"


Document
A k-mer-Based Estimator of the Substitution Rate Between Repetitive Sequences

Authors: Haonan Wu, Antonio Blanca, and Paul Medvedev

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)


Abstract
K-mer-based analysis of genomic data is ubiquitous, but the presence of repetitive k-mers continues to pose problems for the accuracy of many methods. For example, the Mash tool (Ondov et al. 2016) can accurately estimate the substitution rate between two low-repetitive sequences from their k-mer sketches; however, it is inaccurate on repetitive sequences such as the centromere of a human chromosome. Follow-up work by Blanca et al. (2021) has attempted to model how mutations affect k-mer sets based on strong assumptions that the sequence is non-repetitive and that mutations do not create spurious k-mer matches. However, the theoretical foundations for extending an estimator like Mash to work in the presence of repeat sequences have been lacking. In this work, we relax the non-repetitive assumption and propose a novel estimator for the mutation rate. We derive theoretical bounds on our estimator’s bias. Our experiments show that it remains accurate for repetitive genomic sequences, such as the alpha satellite higher order repeats in centromeres. We demonstrate our estimator’s robustness across diverse datasets and various ranges of the substitution rate and k-mer size. Finally, we show how sketching can be used to avoid dealing with large k-mer sets while retaining accuracy. Our software is available at https://github.com/medvedevgroup/Repeat-Aware_Substitution_Rate_Estimator.

Cite as

Haonan Wu, Antonio Blanca, and Paul Medvedev. A k-mer-Based Estimator of the Substitution Rate Between Repetitive Sequences. In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 20:1-20:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{wu_et_al:LIPIcs.WABI.2025.20,
  author =	{Wu, Haonan and Blanca, Antonio and Medvedev, Paul},
  title =	{{A k-mer-Based Estimator of the Substitution Rate Between Repetitive Sequences}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{20:1--20:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.20},
  URN =		{urn:nbn:de:0030-drops-239465},
  doi =		{10.4230/LIPIcs.WABI.2025.20},
  annote =	{Keywords: k-mers, sketching, mutation rates}
}
Document
Estimation of Substitution and Indel Rates via k-mer Statistics

Authors: Mahmudur Rahman Hera, Paul Medvedev, David Koslicki, and Antonio Blanca

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)


Abstract
Methods utilizing k-mers are widely used in bioinformatics, yet our understanding of their statistical properties under realistic mutation models remains incomplete. Previously, substitution-only mutation models have been considered to derive precise expectations and variances for mutated k-mers and intervals of mutated and non-mutated sequences. In this work, we consider a mutation model that incorporates insertions and deletions in addition to single-nucleotide substitutions. Within this framework, we derive closed-form k-mer-based estimators for the three fundamental mutation parameters: substitution, deletion rate, and insertion rates. We provide theoretical guarantees in the form of concentration inequalities, ensuring accuracy of our estimators under reasonable model assumptions. Empirical evaluations on simulated evolution of genomic sequences confirm our theoretical findings, demonstrating that accounting for insertions and deletions signals allows for accurate estimation of mutation rates and improves upon the results obtained by considering a substitution-only model. An implementation of estimating the mutation parameters from a pair of fasta files is available here: https://github.com/KoslickiLab/estimate_rates_using_mutation_model.git. The results presented in this manuscript can be reproduced using the code available here: https://github.com/KoslickiLab/est_rates_experiments.git.

Cite as

Mahmudur Rahman Hera, Paul Medvedev, David Koslicki, and Antonio Blanca. Estimation of Substitution and Indel Rates via k-mer Statistics. In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 16:1-16:15, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


Copy BibTex To Clipboard

@InProceedings{rahmanhera_et_al:LIPIcs.WABI.2025.16,
  author =	{Rahman Hera, Mahmudur and Medvedev, Paul and Koslicki, David and Blanca, Antonio},
  title =	{{Estimation of Substitution and Indel Rates via k-mer Statistics}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{16:1--16:15},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.16},
  URN =		{urn:nbn:de:0030-drops-239422},
  doi =		{10.4230/LIPIcs.WABI.2025.16},
  annote =	{Keywords: k-mers, mutation rate, indel, alignment-free, estimation, substitution, insertion, deletion}
}
Document
Optimal Subtree Prune and Regraft for Quartet Score in Sub-Quadratic Time

Authors: Shayesteh Arasti and Siavash Mirarab

Published in: LIPIcs, Volume 273, 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)


Abstract
Finding a tree with the minimum total distance to a given set of trees (the median tree) is increasingly needed in phylogenetics. Defining tree distance as the number of induced four-taxon unrooted (i.e., quartet) trees with different topologies, the median of a set of gene trees is a statistically consistent estimator of the species tree under several models of gene tree species tree discordance. Because of this, median trees defined with quartet distance are widely used in practice for species tree inference. Nevertheless, the problem is NP-Hard and the widely-used solutions are heuristics. In this paper, we pave the way for a new type of heuristic solution to this problem. We show that the optimal place to add a subtree of size m onto a tree with n leaves can be found in time that grows quasi-linearly with n and is nearly independent of m. This algorithm can be used to perform subtree prune and regraft (SPR) moves efficiently, which in turn enables the hill-climbing heuristic search for the optimal tree. In exploratory experiments, we show that our algorithm can improve the quartet score of trees obtained using the existing widely-used methods.

Cite as

Shayesteh Arasti and Siavash Mirarab. Optimal Subtree Prune and Regraft for Quartet Score in Sub-Quadratic Time. In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 4:1-4:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)


Copy BibTex To Clipboard

@InProceedings{arasti_et_al:LIPIcs.WABI.2023.4,
  author =	{Arasti, Shayesteh and Mirarab, Siavash},
  title =	{{Optimal Subtree Prune and Regraft for Quartet Score in Sub-Quadratic Time}},
  booktitle =	{23rd International Workshop on Algorithms in Bioinformatics (WABI 2023)},
  pages =	{4:1--4:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-294-5},
  ISSN =	{1868-8969},
  year =	{2023},
  volume =	{273},
  editor =	{Belazzougui, Djamal and Ouangraoua, A\"{i}da},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2023.4},
  URN =		{urn:nbn:de:0030-drops-186309},
  doi =		{10.4230/LIPIcs.WABI.2023.4},
  annote =	{Keywords: Phylogenetics, Gene tree discordance, Quartet score, Quartet distance, Subtree prune and regraft, Tree search, ASTRAL}
}
  • Refine by Type
  • 3 Document/PDF
  • 2 Document/HTML

  • Refine by Publication Year
  • 2 2025
  • 1 2023

  • Refine by Author
  • 2 Blanca, Antonio
  • 2 Medvedev, Paul
  • 1 Arasti, Shayesteh
  • 1 Koslicki, David
  • 1 Mirarab, Siavash
  • Show More...

  • Refine by Series/Journal
  • 3 LIPIcs

  • Refine by Classification
  • 2 Applied computing → Bioinformatics
  • 2 Applied computing → Computational biology
  • 1 Mathematics of computing → Probabilistic inference problems
  • 1 Theory of computation → Theory and algorithms for application domains

  • Refine by Keyword
  • 2 k-mers
  • 1 ASTRAL
  • 1 Gene tree discordance
  • 1 Phylogenetics
  • 1 Quartet distance
  • Show More...

Any Issues?
X

Feedback on the Current Page

CAPTCHA

Thanks for your feedback!

Feedback submitted to Dagstuhl Publishing

Could not send message

Please try again later or send an E-mail