Topological Data Analysis Reveals Principles of Chromosome Structure in Cellular Differentiation

Authors Natalie Sauerwald, Yihang Shen, Carl Kingsford

Author Details

Natalie Sauerwald
  • Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, 15213, USA
Yihang Shen
  • Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, 15213, USA
Carl Kingsford
  • Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, 15213, USA


The authors would like to thank Alessandro Bertero and William S. Noble for useful information about their data, and Guillaume Marçais for comments on the manuscript.

Cite As

Natalie Sauerwald, Yihang Shen, and Carl Kingsford. Topological Data Analysis Reveals Principles of Chromosome Structure in Cellular Differentiation. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 23:1-23:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Abstract

Topological data analysis (TDA) is a mathematically well-founded set of methods for deriving robust information about the structure and topology of data. It has been applied successfully in several biological contexts. Derived primarily from algebraic topology, TDA rigorously identifies persistent features in complex data, making it well suited to characterizing the key features of three-dimensional chromosome structure. Chromosome structure significantly influences many diverse genomic processes and has recently been shown to relate to cellular differentiation. Although many methods study specific substructures of chromosomes, a global view of all geometric features of chromosomes is still missing. By applying TDA to chromosome structure through differentiation across three cell lines, we provide insight into principles of chromosome folding and looping. We identify persistent connected components and one-dimensional topological features of chromosomes and characterize them across cell types and stages of differentiation.

Availability: Scripts to reproduce the results from this study can be found at
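
As a rough illustration of what "persistent connected components" means computationally, the sketch below computes zero-dimensional persistence for a Vietoris-Rips filtration built from a pairwise distance matrix. It is not the authors' pipeline: the function name `h0_persistence`, the toy matrix, and the assumption that chromosome bins come with pairwise distances (for example, derived from Hi-C contact counts) are all hypothetical choices made for this example.

```python
from itertools import combinations


def h0_persistence(dist):
    """Return the death scales of connected components, in increasing order.

    `dist` is a symmetric matrix (list of lists) of pairwise distances.
    Every point is born as its own component at scale 0. Processing edges
    in order of increasing length, a component dies whenever an edge joins
    two previously separate components (Kruskal-style union-find).
    """
    n = len(dist)
    parent = list(range(n))

    def find(x):
        # Find the component representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted((dist[i][j], i, j) for i, j in combinations(range(n), 2))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj   # merge the two components
            deaths.append(d)  # one component dies at scale d
    return deaths  # n - 1 values; the last surviving component never dies


# Hypothetical toy data: four points forming two tight pairs.
toy = [
    [0, 1, 5, 6],
    [1, 0, 4, 5],
    [5, 4, 0, 2],
    [6, 5, 2, 0],
]
print(h0_persistence(toy))  # -> [1, 2, 4]
```

Each returned value is the scale at which one component merges into another, so a large gap between consecutive values suggests well-separated groups of points. One-dimensional features (loops) require a full persistent homology computation and are not covered by this sketch.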

Subject Classification

ACM Subject Classification
  • Applied computing → Computational biology
  • Applied computing

Keywords
  • topological data analysis
  • chromosome structure
  • Hi-C
  • topologically associating domains



