ThIEF: Finding Genome-wide Trajectories of Epigenetics Marks

Authors: Anton Polishko, Md. Abid Hasan, Weihua Pan, Evelien M. Bunnik, Karine Le Roch, Stefano Lonardi





Cite As

Anton Polishko, Md. Abid Hasan, Weihua Pan, Evelien M. Bunnik, Karine Le Roch, and Stefano Lonardi. ThIEF: Finding Genome-wide Trajectories of Epigenetics Marks. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 19:1-19:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.WABI.2017.19

Abstract

We address the problem of comparing multiple genome-wide maps representing nucleosome positions or specific histone marks. These maps can originate from the comparative analysis of ChIP-Seq/MNase-Seq/FAIRE-Seq data for different cell types/tissues or multiple time points. The input to the problem is a set of maps, each of which is a list of genomic locations for nucleosomes or histone marks. The output is an alignment of nucleosomes/histone marks across time points (which we call trajectories), allowing small movements and gaps in some of the maps. We present a tool called ThIEF (TrackIng of Epigenetic Features) that can efficiently compute these trajectories. ThIEF comes in two "flavors": ThIEF:Iterative finds the trajectories progressively using bipartite matching, while ThIEF:LP solves a k-partite matching problem on a hypergraph using linear programming. ThIEF:LP is guaranteed to find the optimal solution, but it is slower than ThIEF:Iterative. We demonstrate the utility of ThIEF by applying it to the analysis of temporal nucleosome maps for the human malaria parasite. Remarkably, we show that the output of ThIEF can be used to produce a supervised classifier that can accurately predict the position of stable nucleosomes (i.e., nucleosomes present in all time points) and unstable nucleosomes (i.e., nucleosomes present in at most half of the time points) from the primary DNA sequence. To the best of our knowledge, this is the first result on the prediction of the dynamics of nucleosomes solely based on their DNA binding preference. Software is available at https://github.com/ucrbioinfo/ThIEF.
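
To make the pairwise step described in the abstract concrete, the following is a minimal sketch of matching nucleosome positions between two maps while allowing small shifts and gaps. It is not the ThIEF implementation. It swaps in an order-preserving dynamic-programming alignment as a simple stand-in for bipartite matching, and the function name `match_maps`, the `max_shift` parameter, and the gap cost of `max_shift / 2` are all illustrative assumptions.

```python
def match_maps(a, b, max_shift=50):
    """Pair positions from two sorted maps a and b.

    Returns a list of (i, j) index pairs such that a[i] and b[j] are
    within max_shift of each other. Unpaired positions are gaps.
    Hypothetical helper for illustration; not part of ThIEF.
    """
    # Assumption: a gap costs half the largest allowed shift, so any
    # admissible match is never more expensive than leaving both ends unmatched.
    gap = max_shift / 2.0
    n, m = len(a), len(b)

    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Leave a[i-1] or b[j-1] unmatched.
            best = min(cost[i - 1][j], cost[i][j - 1]) + gap
            # Match the two positions if the shift is small enough.
            d = abs(a[i - 1] - b[j - 1])
            if d <= max_shift:
                best = min(best, cost[i - 1][j - 1] + d)
            cost[i][j] = best

    # Trace back through the table to recover the matched pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        d = abs(a[i - 1] - b[j - 1])
        if d <= max_shift and cost[i][j] == cost[i - 1][j - 1] + d:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif cost[i][j] == cost[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    t1 = [100, 300, 520]   # made-up positions at time point 1
    t2 = [110, 505, 900]   # made-up positions at time point 2
    print(match_maps(t1, t2, max_shift=50))
```

With the made-up positions in the usage example, the nucleosome at 300 has no partner within 50 bp and is left as a gap, as is the one at 900. Extending such pairwise matches across more than two maps to form full trajectories is not shown here.
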
Keywords
  • Nucleosomes
  • Histone Marks
  • Histone Tail Modifications
  • Epigenetics
  • Genomics

