The Longest Run Subsequence Problem

Authors Sven Schrinner, Manish Goel, Michael Wulfert, Philipp Spohr, Korbinian Schneeberger, Gunnar W. Klau

Author Details

Sven Schrinner
  • Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Germany
Manish Goel
  • Max Planck Institute for Plant Breeding Research, Cologne, Germany
Michael Wulfert
  • Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Germany
Philipp Spohr
  • Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Germany
Korbinian Schneeberger
  • Max Planck Institute for Plant Breeding Research, Cologne, Germany
  • Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Germany
  • Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany
Gunnar W. Klau
  • Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Germany
  • Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Germany

Cite As

Sven Schrinner, Manish Goel, Michael Wulfert, Philipp Spohr, Korbinian Schneeberger, and Gunnar W. Klau. The Longest Run Subsequence Problem. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 6:1-6:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.WABI.2020.6

Abstract

Genome assembly is one of the most important problems in computational genomics. Here, we suggest addressing the scaffolding phase, in which contigs need to be linked and ordered to obtain larger pseudo-chromosomes, by means of a second incomplete assembly of a related species. The idea is to use alignments of binned regions in one contig to find the most homologous contig in the other assembly. We show that ordering the contigs of the other assembly can be expressed by a new string problem, the longest run subsequence problem (LRS). We show that LRS is NP-hard and present reduction rules and two algorithmic approaches that, together, are able to solve large instances of LRS to provable optimality. In particular, they can solve realistic instances resulting from partial Arabidopsis thaliana assemblies in short computation time. Our source code and all data used in the experiments are freely available.
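
The abstract names the LRS problem without defining it. The sketch below assumes the usual reading: given a string, find a longest subsequence in which every symbol occurs in at most one run, that is, one block of consecutive equal symbols. It is a simple illustrative dynamic program over subsets of the alphabet, not the authors' implementation or their reduction rules, and the function name `longest_run_subsequence_length` is hypothetical. The number of states grows exponentially with the alphabet size, so it only suits small alphabets.

```python
def longest_run_subsequence_length(s):
    """Length of a longest subsequence of s in which each symbol
    forms at most one run (assumed definition of LRS)."""
    symbols = sorted(set(s))
    index = {ch: i for i, ch in enumerate(symbols)}

    # best[(mask, last)] = maximum length of a valid subsequence that
    # uses exactly the symbols in bitmask `mask` and whose final run
    # consists of symbol `last` (-1 means nothing chosen yet).
    best = {(0, -1): 0}

    for ch in s:
        c = index[ch]
        bit = 1 << c
        updated = dict(best)  # skipping ch keeps every old state
        for (mask, last), length in best.items():
            # ch may be appended if it extends the current run,
            # or if its symbol has not been used in any run so far.
            if last == c or not mask & bit:
                key = (mask | bit, c)
                if updated.get(key, -1) < length + 1:
                    updated[key] = length + 1
        best = updated

    return max(best.values())


if __name__ == "__main__":
    # "abab": one optimal choice is "aab" (or "abb"), so the length is 3.
    print(longest_run_subsequence_length("abab"))
```

In the scaffolding setting described above, each symbol would presumably stand for the contig of the other assembly that a binned region aligns to best, so a long run subsequence corresponds to a consistent ordering of those contigs.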

Subject Classification

ACM Subject Classification
  • Theory of computation → Dynamic programming
  • Mathematics of computing → Combinatorial optimization
  • Applied computing → Computational genomics
Keywords
  • alignments
  • assembly
  • string algorithm
  • longest subsequence
