OASIcs.GCB.2013.101.pdf
- Filesize: 0.49 MB
- 9 pages
The transcription of genes is often regulated not only by transcription factors binding at single sites per promoter, but by the interplay of multiple copies of one or more transcription factors binding at multiple sites forming a cis-regulatory module. The computational recognition of cis-regulatory modules from ChIP-seq or other high-throughput data is crucial in modern life and medical sciences. A common type of cis-regulatory modules are homotypic clusters of binding sites, i.e., clusters of binding sites of one transcription factor. For their recognition the homotypic Sunflower Hidden Markov Model is a promising statistical model. However, this model neglects statistical dependences among nucleotides within binding sites and flanking regions, which makes it not well suited for de-novo motif discovery. Here, we propose an extension of this model that allows statistical dependences within binding sites, their reverse complements, and flanking regions. We study the efficacy of this extended homotypic Sunflower Hidden Markov Model based on ChIP-seq data from the Human ENCODE Project and find that it often outperforms the traditional homotypic Sunflower Hidden Markov Model.
Feedback for Dagstuhl Publishing