Documents authored by Toivonen, Jarkko


Seed-driven Learning of Position Probability Matrices from Large Sequence Sets

Authors: Jarkko Toivonen, Jussi Taipale, and Esko Ukkonen

Published in: LIPIcs, Volume 88, 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)


Abstract
We formulate and analyze SeedHam, a novel seed-driven algorithm for learning position probability matrices (PPMs). To learn a PPM of length l, the algorithm takes the most frequent l-mer of the training data as a seed and then restricts learning to a small Hamming neighbourhood of that seed. SeedHam is intended for PPM learning from large sequence sets (up to hundreds of Mbases) containing enriched motif instances. We also introduce a robust variant that decreases contamination from artefact instances of the motif and thereby allows larger Hamming neighbourhoods to be used. To solve the motif orientation problem in two-stranded DNA, we introduce a novel seed-finding rule based on an analysis of the palindromic structure of sequences. We report test experiments that illustrate the relative strengths of the different variants of our methods and show that our algorithms are fast and give stable and accurate results.

Availability and implementation: A C++ implementation of the method is available from https://github.com/jttoivon/seedham/

Contact: jarkko.toivonen@cs.helsinki.fi
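
The seed-and-neighbourhood idea described in the abstract can be illustrated with a short Python sketch. This is not the authors' C++ implementation and does not reproduce SeedHam itself: the function names, the `max_dist` and `pseudocount` parameters, and the plain per-column counting are assumptions made for illustration, and the published method may count or weight neighbourhood instances differently. The robust variant and the palindrome-based seed rule for two-stranded DNA are omitted.

```python
from collections import Counter

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def windows(sequences, l):
    """Yield every l-mer that consists only of A, C, G and T."""
    valid = set(ALPHABET)
    for seq in sequences:
        for i in range(len(seq) - l + 1):
            window = seq[i:i + l]
            if set(window) <= valid:
                yield window


def most_frequent_lmer(sequences, l):
    """Return the most frequent l-mer (the seed) in the training data."""
    counts = Counter(windows(sequences, l))
    if not counts:
        raise ValueError("no valid l-mers of the requested length")
    return counts.most_common(1)[0][0]


def learn_ppm(sequences, l, max_dist=1, pseudocount=0.0):
    """Illustrative seed-driven PPM estimate (assumed counting scheme).

    Every l-mer within Hamming distance max_dist of the seed contributes
    one count to each column; columns are then normalised to sum to 1.
    """
    seed = most_frequent_lmer(sequences, l)
    counts = [dict.fromkeys(ALPHABET, pseudocount) for _ in range(l)]
    for window in windows(sequences, l):
        if hamming(seed, window) <= max_dist:
            for pos, base in enumerate(window):
                counts[pos][base] += 1
    ppm = []
    for col in counts:
        total = sum(col.values())
        ppm.append({base: col[base] / total for base in ALPHABET})
    return seed, ppm


if __name__ == "__main__":
    toy = ["ACGTACGT", "TTACGTTT", "GGACGAGG", "CCACGTCC"]
    seed, ppm = learn_ppm(toy, l=4, max_dist=1)
    print("seed:", seed)
    for pos, col in enumerate(ppm):
        print(pos, col)
```

The sketch scans the data twice, once to find the seed and once to collect its neighbourhood, so its running time grows linearly with the total sequence length for a fixed l. Restricting counts to a small Hamming neighbourhood is what keeps unrelated l-mers out of the matrix.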

Cite as

Jarkko Toivonen, Jussi Taipale, and Esko Ukkonen. Seed-driven Learning of Position Probability Matrices from Large Sequence Sets. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 25:1-25:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


@InProceedings{toivonen_et_al:LIPIcs.WABI.2017.25,
  author =	{Toivonen, Jarkko and Taipale, Jussi and Ukkonen, Esko},
  title =	{{Seed-driven Learning of Position Probability Matrices from Large Sequence Sets}},
  booktitle =	{17th International Workshop on Algorithms in Bioinformatics (WABI 2017)},
  pages =	{25:1--25:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-050-7},
  ISSN =	{1868-8969},
  year =	{2017},
  volume =	{88},
  editor =	{Schwartz, Russell and Reinert, Knut},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2017.25},
  URN =		{urn:nbn:de:0030-drops-76470},
  doi =		{10.4230/LIPIcs.WABI.2017.25},
  annote =	{Keywords: motif finding, transcription factor binding site, sequence analysis, Hamming distance, seed}
}