A Graph-Theoretic Barcode Ordering Model for Linked-Reads

Authors: Yoann Dufresne, Chen Sun, Pierre Marijon, Dominique Lavenier, Cedric Chauve, Rayan Chikhi

Author Details

Yoann Dufresne
  • Department of Computational Biology, C3BI USR 3756 CNRS, Institut Pasteur, Paris, France
Chen Sun
  • Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA
Pierre Marijon
  • Center for Bioinformatics, Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
Dominique Lavenier
  • IRISA, Inria, Université de Rennes, France
Cedric Chauve
  • Department of Mathematics, Simon Fraser University, Burnaby, Canada
  • LaBRI, Université de Bordeaux, France
Rayan Chikhi
  • Department of Computational Biology, C3BI USR 3756 CNRS, Institut Pasteur, Paris, France


The authors are grateful to Paul Medvedev and Jean-Stéphane Varré for preliminary work and discussions, and to Marthe Bonamy for discussions about interval graph models.

Yoann Dufresne, Chen Sun, Pierre Marijon, Dominique Lavenier, Cedric Chauve, and Rayan Chikhi. A Graph-Theoretic Barcode Ordering Model for Linked-Reads. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 11:1-11:17, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2020)


Given a set of intervals on the real line, an interval graph records these intervals as nodes and their intersections as edges. Identifying (i.e., merging) pairs of nodes in an interval graph results in a multiple-interval graph. Given only the nodes and edges of the multiple-interval graph, without knowing the underlying intervals, we are interested in the following questions. Can one determine how many intervals correspond to each node? Can one compute a walk over the nodes of the multiple-interval graph that reflects the ordering of the original intervals? These questions are closely related to linked-read DNA sequencing, where barcodes are assigned to long molecules whose intersection graph forms an interval graph. Each barcode may correspond to multiple molecules, which complicates downstream analysis; in graph terms, this corresponds to identifying nodes of the underlying interval graph. Resolving the above graph-theoretic problems would facilitate the analysis of linked-read sequencing data by enabling the conceptual separation of barcodes into molecules and by providing, through the ordering of the molecules, a skeleton for accurately assembling the genome. Here, we propose a framework that takes as input an arbitrary intersection graph (such as an overlap graph of barcodes) and constructs a heuristic approximation of the ordering of the original intervals.
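To make these definitions concrete, the Python sketch below builds the interval graph of a few hypothetical molecules (closed intervals on a genome coordinate axis), then identifies molecules that share a barcode to obtain a multiple-interval graph. The coordinates, the barcode labels, and the function names `interval_graph`, `identify_nodes` and `is_walk` are illustrative assumptions, not the authors' framework; in real linked-read data only the barcode graph is observed and the molecule intervals are unknown.

```python
from itertools import combinations


def interval_graph(intervals):
    """Edge set of the interval graph: one node per interval, and an
    edge between two nodes whose closed intervals [start, end] intersect."""
    edges = set()
    for (a, (s1, e1)), (b, (s2, e2)) in combinations(intervals.items(), 2):
        if max(s1, s2) <= min(e1, e2):  # closed intervals share a point
            edges.add(frozenset((a, b)))
    return edges


def identify_nodes(edges, label):
    """Merge nodes carrying the same label (here: molecules sharing a
    barcode). An edge between two nodes with the same label would become
    a self-loop after merging, so it is dropped."""
    merged = set()
    for edge in edges:
        u, v = tuple(edge)
        if label[u] != label[v]:
            merged.add(frozenset((label[u], label[v])))
    return merged


def is_walk(graph, sequence):
    """True if every pair of consecutive labels in `sequence` is adjacent in `graph`."""
    return all(frozenset(pair) in graph for pair in zip(sequence, sequence[1:]))


# Hypothetical molecules: name -> (start, end) on the genome.
molecules = {"m1": (0, 10), "m2": (8, 20), "m3": (18, 30),
             "m4": (28, 40), "m5": (38, 50)}
# Hypothetical barcode assignment: barcode "A" tags two distant molecules.
barcode = {"m1": "A", "m2": "B", "m3": "C", "m4": "D", "m5": "A"}

molecule_graph = interval_graph(molecules)  # the path m1-m2-m3-m4-m5
barcode_graph = identify_nodes(molecule_graph, barcode)
print(sorted(tuple(sorted(e)) for e in barcode_graph))
# [('A', 'B'), ('A', 'D'), ('B', 'C'), ('C', 'D')]

# The barcodes of the molecules, read in genome order, form a walk in the
# barcode graph; recovering such a walk is the ordering question above.
order = [barcode[m] for m in sorted(molecules, key=lambda m: molecules[m][0])]
print(order, is_walk(barcode_graph, order))
# ['A', 'B', 'C', 'D', 'A'] True
```

Merging m1 and m5 under barcode A turns the path into a 4-cycle, so the barcode graph alone reveals neither that A stands for two molecules nor where the genome order starts along the cycle: these are precisely the two questions posed above.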

Subject Classification

ACM Subject Classification
  • Applied computing → Bioinformatics

Keywords
  • DNA sequencing
  • graph algorithms
  • linked-reads
  • interval graphs
  • cliques


