A Graph-Theoretic Barcode Ordering Model for Linked-Reads

Authors: Yoann Dufresne, Chen Sun, Pierre Marijon, Dominique Lavenier, Cedric Chauve, Rayan Chikhi



Author Details

Yoann Dufresne
  • Department of Computational Biology, C3BI USR 3756 CNRS, Institut Pasteur, Paris, France
Chen Sun
  • Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA
Pierre Marijon
  • Center for Bioinformatics, Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
Dominique Lavenier
  • IRISA, Inria, Université de Rennes, France
Cedric Chauve
  • Department of Mathematics, Simon Fraser University, Burnaby, Canada
  • LaBRI, Université de Bordeaux, France
Rayan Chikhi
  • Department of Computational Biology, C3BI USR 3756 CNRS, Institut Pasteur, Paris, France

Acknowledgements

The authors are grateful to Paul Medvedev and Jean-Stéphane Varré for preliminary work and discussions, and to Marthe Bonamy for discussions about interval graph models.

Cite As

Yoann Dufresne, Chen Sun, Pierre Marijon, Dominique Lavenier, Cedric Chauve, and Rayan Chikhi. A Graph-Theoretic Barcode Ordering Model for Linked-Reads. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 11:1-11:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.WABI.2020.11

Abstract

Considering a set of intervals on the real line, an interval graph records these intervals as nodes and their intersections as edges. Identifying (i.e., merging) pairs of nodes in an interval graph results in a multiple-interval graph. Given only the nodes and edges of the multiple-interval graph, without knowledge of the underlying intervals, we are interested in the following questions. Can one determine how many intervals correspond to each node? Can one compute a walk over the nodes of the multiple-interval graph that reflects the ordering of the original intervals? These questions are closely related to linked-read DNA sequencing, where barcodes are assigned to long molecules whose intersection graph forms an interval graph. Each barcode may correspond to multiple molecules, which complicates downstream analysis and corresponds to the identification of nodes in the underlying interval graph. Resolving the above graph-theoretic problems would facilitate analyses of linked-read sequencing data by enabling the conceptual separation of barcodes into molecules and by providing, through the order of the molecules, a skeleton for accurately assembling the genome. Here, we propose a framework that takes as input an arbitrary intersection graph (such as an overlap graph of barcodes) and constructs a heuristic approximation of the ordering of the original intervals.
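
The two definitions in the abstract (interval graph, then node identification) can be illustrated with a minimal sketch. It is not the authors' framework: the function names `interval_graph` and `identify_nodes`, the toy coordinates, and the barcode labels are all hypothetical, chosen only to show how merging molecules that share a barcode changes the graph.

```python
from itertools import combinations


def interval_graph(intervals):
    """Edge set of the interval graph: nodes are interval indices, and two
    nodes are adjacent when their closed intervals intersect."""
    edges = set()
    for (i, (a1, b1)), (j, (a2, b2)) in combinations(enumerate(intervals), 2):
        if a1 <= b2 and a2 <= b1:
            edges.add((i, j))
    return edges


def identify_nodes(edges, label_of):
    """Merge all nodes that share a label (here, a barcode).
    Self-loops and duplicate edges created by the merge are dropped."""
    merged = set()
    for u, v in edges:
        lu, lv = label_of[u], label_of[v]
        if lu != lv:
            merged.add(tuple(sorted((lu, lv))))
    return merged


# Hypothetical toy data: four molecules on a line; molecules 0 and 3
# carry the same barcode "A".
molecules = [(0, 4), (3, 7), (6, 10), (9, 13)]
barcodes = ["A", "B", "C", "A"]

edges = interval_graph(molecules)                # path 0 - 1 - 2 - 3
barcode_edges = identify_nodes(edges, barcodes)  # triangle on A, B, C
print(sorted(edges))
print(sorted(barcode_edges))
```

In this toy case the path 0-1-2-3 collapses to a triangle on A, B, and C. A triangle is also realizable by three mutually overlapping single intervals, so the merged graph alone does not reveal that barcode A covers two molecules, which is the difficulty the first question in the abstract points at.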

Subject Classification

ACM Subject Classification
  • Applied computing → Bioinformatics
Keywords
  • DNA sequencing
  • graph algorithms
  • linked-reads
  • interval graphs
  • cliques
