2 Search Results for "Waterman, Michael S."


Document
Invited Talk
Efficient Solutions to Biological Problems Using de Bruijn Graphs (Invited Talk)

Authors: Leena Salmela

Published in: LIPIcs, Volume 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)


Abstract
The de Bruijn graph has become a standard method in the analysis of sequencing reads in computational biology due to its ability to represent the information contained in large read sets in small space. A de Bruijn graph represents a set of sequencing reads by its k-mers, i.e. the set of substrings of length k that occur in the reads. In the classical definition, the k-mers are the edges of the graph and the nodes are the k-1 bases long prefixes and suffixes of the k-mers. Usually only k-mers occurring several times in the read set are kept to filter out noise in the data. De Bruijn graphs have been used to solve many problems in computational biology including genome assembly [Ramana M. Idury and Michael S. Waterman, 1995; Pavel A. Pevzner et al., 2001; Anton Bankevich et al., 2012; Yu Peng et al., 2010], sequencing error correction [Leena Salmela and Eric Rivals, 2014; Giles Miclotte et al., 2016; Leena Salmela et al., 2017; Limasset et al., 2019], reference free variant calling [Raluca Uricaru et al., 2015], indexing read sets [Camille Marchet et al., 2021], and so on. Next I will discuss two of these problems in more depth. The de Bruijn graph first emerged in computation biology in the context of genome assembly [Ramana M. Idury and Michael S. Waterman, 1995; Pavel A. Pevzner et al., 2001] where the task is to reconstruct a genome based on sequencing reads. As the de Bruijn graph can represent large read sets compactly, it became the standard approach to assemble short reads [Anton Bankevich et al., 2012; Yu Peng et al., 2010]. In the theoretical framework of de Bruijn graph based genome assembly, a genome is thought to be the Eulerian path in the de Bruijn graph built on the sequencing reads. In practise, the Eulerian path is not unique and thus not useful in the biological context. Therefore, practical implementations report subpaths that are guaranteed to be part of any Eulerian path and thus part of the actual genome. Such models include unitigs, which are nonbranching paths of the de Bruijn graph, and more involved definitions such as omnitigs [Alexandru I. Tomescu and Paul Medvedev, 2017]. In genome assembly the choice of k is a crucial matter. A small k can result in a tangled graph, whereas a too large k will fragment the graph. Furthermore, a different value of k may be optimal for different parts of the genome. Variable order de Bruijn graphs [Christina Boucher et al., 2015; Djamal Belazzougui et al., 2016], which represent de Bruijn graphs of all orders k in a single data structure, have been proposed as a solution but no rigorous definition corresponding to unitigs has been presented. We give the first definition of assembled sequences, i.e. contigs, on such graphs and an algorithm for enumerating them. Another problem that can be solved with de Bruijn graphs is the correction of sequencing errors [Leena Salmela and Eric Rivals, 2014; Giles Miclotte et al., 2016; Leena Salmela et al., 2017; Limasset et al., 2019]. Because each position of a genome is sequenced several times, it is possible to correct sequencing errors in reads if we can identify data originating from the same genomic region. A de Bruijn graph can be used to represent compactly the reliable information and the individual reads can be corrected by aligning them to the graph.

Cite as

Leena Salmela. Efficient Solutions to Biological Problems Using de Bruijn Graphs (Invited Talk). In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 1:1-1:2, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)


Copy BibTex To Clipboard

@InProceedings{salmela:LIPIcs.WABI.2022.1,
  author =	{Salmela, Leena},
  title =	{{Efficient Solutions to Biological Problems Using de Bruijn Graphs}},
  booktitle =	{22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)},
  pages =	{1:1--1:2},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-243-3},
  ISSN =	{1868-8969},
  year =	{2022},
  volume =	{242},
  editor =	{Boucher, Christina and Rahmann, Sven},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2022.1},
  URN =		{urn:nbn:de:0030-drops-170357},
  doi =		{10.4230/LIPIcs.WABI.2022.1},
  annote =	{Keywords: de Bruijn graph, variable order de Bruijn graph, genome assembly, sequencing error correction, k-mers}
}
Document
Molecular Bioinformatics (Dagstuhl Seminar 9237)

Authors: Thomas Lengauer, Dietmar Schomburg, and Michael S. Waterman

Published in: Dagstuhl Seminar Reports. Dagstuhl Seminar Reports, Volume 1 (2021)


Abstract

Cite as

Thomas Lengauer, Dietmar Schomburg, and Michael S. Waterman. Molecular Bioinformatics (Dagstuhl Seminar 9237). Dagstuhl Seminar Report 46, pp. 1-28, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (1992)


Copy BibTex To Clipboard

@TechReport{lengauer_et_al:DagSemRep.46,
  author =	{Lengauer, Thomas and Schomburg, Dietmar and Waterman, Michael S.},
  title =	{{Molecular Bioinformatics (Dagstuhl Seminar 9237)}},
  pages =	{1--28},
  ISSN =	{1619-0203},
  year =	{1992},
  type = 	{Dagstuhl Seminar Report},
  number =	{46},
  institution =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/DagSemRep.46},
  URN =		{urn:nbn:de:0030-drops-149349},
  doi =		{10.4230/DagSemRep.46},
}
  • Refine by Author
  • 1 Lengauer, Thomas
  • 1 Salmela, Leena
  • 1 Schomburg, Dietmar
  • 1 Waterman, Michael S.

  • Refine by Classification
  • 1 Applied computing → Sequencing and genotyping technologies
  • 1 Theory of computation → Design and analysis of algorithms

  • Refine by Keyword
  • 1 de Bruijn graph
  • 1 genome assembly
  • 1 k-mers
  • 1 sequencing error correction
  • 1 variable order de Bruijn graph

  • Refine by Type
  • 2 document

  • Refine by Publication Year
  • 1 1992
  • 1 2022

Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail