2 Search Results for "Horwege, Sebastian"


Document
A k-mer-Based Estimator of the Substitution Rate Between Repetitive Sequences

Authors: Haonan Wu, Antonio Blanca, and Paul Medvedev

Published in: LIPIcs, Volume 344, 25th International Conference on Algorithms for Bioinformatics (WABI 2025)


Abstract
K-mer-based analysis of genomic data is ubiquitous, but the presence of repetitive k-mers continues to pose problems for the accuracy of many methods. For example, the Mash tool (Ondov et al. 2016) can accurately estimate the substitution rate between two low-repetitive sequences from their k-mer sketches; however, it is inaccurate on repetitive sequences such as the centromere of a human chromosome. Follow-up work by Blanca et al. (2021) has attempted to model how mutations affect k-mer sets based on strong assumptions that the sequence is non-repetitive and that mutations do not create spurious k-mer matches. However, the theoretical foundations for extending an estimator like Mash to work in the presence of repeat sequences have been lacking. In this work, we relax the non-repetitive assumption and propose a novel estimator for the mutation rate. We derive theoretical bounds on our estimator’s bias. Our experiments show that it remains accurate for repetitive genomic sequences, such as the alpha satellite higher order repeats in centromeres. We demonstrate our estimator’s robustness across diverse datasets and various ranges of the substitution rate and k-mer size. Finally, we show how sketching can be used to avoid dealing with large k-mer sets while retaining accuracy. Our software is available at https://github.com/medvedevgroup/Repeat-Aware_Substitution_Rate_Estimator.
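For orientation, the Mash-style baseline mentioned in the abstract can be sketched as below. This is a minimal illustration that uses exact k-mer sets rather than MinHash sketches, and it shows the Mash distance of Ondov et al. (2016), not the repeat-aware estimator proposed in this paper. The function names `kmer_set` and `mash_distance` are hypothetical, and returning 1.0 when no k-mers are shared is a convention assumed here.

```python
import math


def kmer_set(seq: str, k: int) -> set:
    """Return the set of distinct contiguous k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_distance(a: str, b: str, k: int) -> float:
    """Mash-style distance D = -(1/k) * ln(2j / (1 + j)) from the Jaccard index j.

    Assumption: exact k-mer sets stand in for sketches, and a Jaccard
    index of 0 is mapped to a distance of 1.0.
    """
    set_a, set_b = kmer_set(a, k), kmer_set(b, k)
    union = set_a | set_b
    if not union:
        raise ValueError("both sequences are shorter than k")
    j = len(set_a & set_b) / len(union)
    if j == 0:
        return 1.0
    # max() guards against returning -0.0 for identical k-mer sets.
    return max(0.0, -math.log(2 * j / (1 + j)) / k)
```

Because the computation only looks at set membership, a k-mer that occurs many times counts once, which is one way to see why repetitive sequences are problematic for this kind of estimate.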

Cite as

Haonan Wu, Antonio Blanca, and Paul Medvedev. A k-mer-Based Estimator of the Substitution Rate Between Repetitive Sequences. In 25th International Conference on Algorithms for Bioinformatics (WABI 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 344, pp. 20:1-20:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025)


@InProceedings{wu_et_al:LIPIcs.WABI.2025.20,
  author =	{Wu, Haonan and Blanca, Antonio and Medvedev, Paul},
  title =	{{A k-mer-Based Estimator of the Substitution Rate Between Repetitive Sequences}},
  booktitle =	{25th International Conference on Algorithms for Bioinformatics (WABI 2025)},
  pages =	{20:1--20:20},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-386-7},
  ISSN =	{1868-8969},
  year =	{2025},
  volume =	{344},
  editor =	{Brejov\'{a}, Bro\v{n}a and Patro, Rob},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2025.20},
  URN =		{urn:nbn:de:0030-drops-239465},
  doi =		{10.4230/LIPIcs.WABI.2025.20},
  annote =	{Keywords: k-mers, sketching, mutation rates}
}

Document
Alignment-free sequence comparison with spaced k-mers

Authors: Marcus Boden, Martin Schöneich, Sebastian Horwege, Sebastian Lindner, Chris Leimeister, and Burkhard Morgenstern

Published in: OASIcs, Volume 34, German Conference on Bioinformatics 2013


Abstract
Alignment-free methods are increasingly used for genome analysis and phylogeny reconstruction since they circumvent various difficulties of traditional approaches that rely on multiple sequence alignments. In particular, they are much faster than alignment-based methods. Most alignment-free approaches work by analyzing the k-mer composition of sequences. In this paper, we propose to use 'spaced k-mers', i.e. patterns of deterministic and 'don't care' positions instead of contiguous k-mers. Using simulated and real-world sequence data, we demonstrate that this approach produces better phylogenetic trees than alignment-free methods that rely on contiguous k-mers. In addition, distances calculated with spaced k-mers appear to be statistically more stable than distances based on contiguous k-mers.
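The notion of a spaced k-mer can be illustrated with a short sketch. It assumes a binary pattern string in which `1` marks a deterministic (match) position and `0` a "don't care" position, and it simply drops the don't-care characters from each window. The function name `spaced_kmers` and this pattern encoding are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter


def spaced_kmers(seq: str, pattern: str) -> Counter:
    """Count spaced k-mers of seq under a binary pattern.

    Assumption: '1' marks a position that is kept, '0' a don't-care
    position that is skipped.
    """
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be a non-empty string of 0s and 1s")
    care = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    counts = Counter()
    # Slide a window of the pattern's full length along the sequence.
    for start in range(len(seq) - span + 1):
        window = seq[start:start + span]
        counts["".join(window[i] for i in care)] += 1
    return counts
```

With the pattern `111` this reduces to ordinary contiguous 3-mers, so the same function can be used to compare both kinds of word composition on the same input.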

Cite as

Marcus Boden, Martin Schöneich, Sebastian Horwege, Sebastian Lindner, Chris Leimeister, and Burkhard Morgenstern. Alignment-free sequence comparison with spaced k-mers. In German Conference on Bioinformatics 2013. Open Access Series in Informatics (OASIcs), Volume 34, pp. 24-34, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)


@InProceedings{boden_et_al:OASIcs.GCB.2013.24,
  author =	{Boden, Marcus and Sch\"{o}neich, Martin and Horwege, Sebastian and Lindner, Sebastian and Leimeister, Chris and Morgenstern, Burkhard},
  title =	{{Alignment-free sequence comparison with spaced k-mers}},
  booktitle =	{German Conference on Bioinformatics 2013},
  pages =	{24--34},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-59-0},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{34},
  editor =	{Bei{\ss}barth, Tim and Kollmar, Martin and Leha, Andreas and Morgenstern, Burkhard and Schultz, Anne-Kathrin and Waack, Stephan and Wingender, Edgar},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/OASIcs.GCB.2013.24},
  URN =		{urn:nbn:de:0030-drops-42334},
  doi =		{10.4230/OASIcs.GCB.2013.24},
  annote =	{Keywords: Alignment-free sequence comparison, phylogeny reconstruction}
}