Document

**Published in:** LIPIcs, Volume 172, 20th International Workshop on Algorithms in Bioinformatics (WABI 2020)

Considering a set of intervals on the real line, an interval graph records these intervals as nodes and their intersections as edges. Identifying (i.e. merging) pairs of nodes in an interval graph results in a multiple-interval graph. Given only the nodes and the edges of the multiple-interval graph without knowing the underlying intervals, we are interested in the following questions. Can one determine how many intervals correspond to each node? Can one compute a walk over the multiple-interval graph nodes that reflects the ordering of the original intervals? These questions are closely related to linked-read DNA sequencing, where barcodes are assigned to long molecules whose intersection graph forms an interval graph. Each barcode may correspond to multiple molecules, which complicates downstream analysis, and corresponds to the identification of nodes of the corresponding interval graph. Resolving the above graph-theoretic problems would facilitate analyses of linked-reads sequencing data, through enabling the conceptual separation of barcodes into molecules and providing, through the molecules order, a skeleton for accurately assembling the genome. Here, we propose a framework that takes as input an arbitrary intersection graph (such as an overlap graph of barcodes) and constructs a heuristic approximation of the ordering of the original intervals.

Yoann Dufresne, Chen Sun, Pierre Marijon, Dominique Lavenier, Cedric Chauve, and Rayan Chikhi. A Graph-Theoretic Barcode Ordering Model for Linked-Reads. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 11:1-11:17, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

Copy BibTex To Clipboard

@InProceedings{dufresne_et_al:LIPIcs.WABI.2020.11, author = {Dufresne, Yoann and Sun, Chen and Marijon, Pierre and Lavenier, Dominique and Chauve, Cedric and Chikhi, Rayan}, title = {{A Graph-Theoretic Barcode Ordering Model for Linked-Reads}}, booktitle = {20th International Workshop on Algorithms in Bioinformatics (WABI 2020)}, pages = {11:1--11:17}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-161-0}, ISSN = {1868-8969}, year = {2020}, volume = {172}, editor = {Kingsford, Carl and Pisanti, Nadia}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2020.11}, URN = {urn:nbn:de:0030-drops-128001}, doi = {10.4230/LIPIcs.WABI.2020.11}, annote = {Keywords: DNA sequencing, graph algorithms, linked-reads, interval graphs, cliques} }

Document

**Published in:** LIPIcs, Volume 113, 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)

Variable-Number Tandem Repeats (VNTR) are genomic regions where a short sequence of DNA is repeated with no space in between repeats. While a fixed set of VNTRs is typically identified for a given species, the copy number at each VNTR varies between individuals within a species. Although VNTRs are found in both prokaryotic and eukaryotic genomes, the methodology called multi-locus VNTR analysis (MLVA) is widely used to distinguish different strains of bacteria, as well as cluster strains that might be epidemiologically related and investigate evolutionary rates.
We propose PRINCE (Processing Reads to Infer the Number of Copies via Estimation), an algorithm that is able to accurately estimate the copy number of a VNTR given the sequence of a single repeat unit and a set of short reads from a whole-genome sequence (WGS) experiment. This is a challenging problem, especially in the cases when the repeat region is longer than the expected read length. Our proposed method computes a statistical approximation of the local coverage inside the repeat region. This approximation is then mapped to the copy number using a linear function whose parameters are fitted to simulated data. We test PRINCE on the genomes of three datasets of Mycobacterium tuberculosis strains and show that it is more than twice as accurate as a previous method.
An implementation of PRINCE in the Python language is freely available at https://github.com/WGS-TB/PythonPRINCE.

Mehrdad Mansouri, Julian Booth, Margaryta Vityaz, Cedric Chauve, and Leonid Chindelevitch. PRINCE: Accurate Approximation of the Copy Number of Tandem Repeats. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 20:1-20:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)

Copy BibTex To Clipboard

@InProceedings{mansouri_et_al:LIPIcs.WABI.2018.20, author = {Mansouri, Mehrdad and Booth, Julian and Vityaz, Margaryta and Chauve, Cedric and Chindelevitch, Leonid}, title = {{PRINCE: Accurate Approximation of the Copy Number of Tandem Repeats}}, booktitle = {18th International Workshop on Algorithms in Bioinformatics (WABI 2018)}, pages = {20:1--20:13}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-082-8}, ISSN = {1868-8969}, year = {2018}, volume = {113}, editor = {Parida, Laxmi and Ukkonen, Esko}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, address = {Dagstuhl, Germany}, URL = {https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.WABI.2018.20}, URN = {urn:nbn:de:0030-drops-93227}, doi = {10.4230/LIPIcs.WABI.2018.20}, annote = {Keywords: Variable-Number Tandem Repeats, Copy number, Bacterial genomics} }

X

Feedback for Dagstuhl Publishing

Feedback submitted

Please try again later or send an E-mail