Applying the Safe-And-Complete Framework to Practical Genome Assembly

Authors

Sebastian Schmidt, Santeri Toivonen, Paul Medvedev, Alexandru I. Tomescu

Author Details

Sebastian Schmidt
  • Department of Computer Science, University of Helsinki, Finland
Santeri Toivonen
  • Department of Computer Science, University of Helsinki, Finland
Paul Medvedev
  • Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA
  • Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
  • Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
Alexandru I. Tomescu
  • Department of Computer Science, University of Helsinki, Finland


PM would like to thank John Hutton for early attempts to extend omnitigs to work in practice [Hutton, 2018]. The authors wish to thank the Finnish Computing Competence Infrastructure (FCCI) for supporting this project with computational and data storage resources.

Cite As

Sebastian Schmidt, Santeri Toivonen, Paul Medvedev, and Alexandru I. Tomescu. Applying the Safe-And-Complete Framework to Practical Genome Assembly. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 8:1-8:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Despite the long history of genome assembly research, there remains a large gap between theoretical and practical work. On one hand, there is practical software with little theoretical underpinning of its accuracy; on the other, there are theoretical algorithms that have not been adopted in practice. In this paper we attempt to bridge the gap between theory and practice by showing how the theoretical safe-and-complete framework can be integrated into existing assemblers in order to improve contiguity. The optimal algorithm in this framework, called the omnitig algorithm, has not been used in practice due to its complexity and its lack of robustness to real data. Instead, we pursue a simplified notion of omnitigs (simple omnitigs), giving an efficient algorithm to compute them and demonstrating their safety under certain conditions. We modify two assemblers (wtdbg2 and Flye) by replacing their unitig algorithm with the simple omnitig algorithm. We test our modifications using real HiFi data from the D. melanogaster and C. elegans genomes. Our modified algorithms lead to a substantial improvement in alignment-based contiguity, with negligible additional computational cost and either no increase or a small increase in the number of misassemblies.
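For context on the baseline mentioned above, the sketch below shows one classic way to compute unitigs: as maximal non-branching paths in a node-centric directed graph, where every internal node has in-degree 1 and out-degree 1. Unitigs stop at every branching node, whereas omnitigs may extend through some branching nodes while remaining safe, which is where the contiguity gain comes from. This is a minimal illustration under stated assumptions, not code from wtdbg2 or Flye and not the simple omnitig algorithm; the function name `maximal_non_branching_paths` and the adjacency-dict input format are hypothetical choices made for this example.

```python
from collections import defaultdict


def maximal_non_branching_paths(graph):
    """Return unitigs as maximal non-branching paths (hypothetical helper).

    graph: dict mapping each node to a list of successor nodes; repeated
    successors model parallel edges. Node labels must be sortable.
    """
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    nodes = set(graph)
    for u, succs in graph.items():
        for v in succs:
            outdeg[u] += 1
            indeg[v] += 1
            nodes.add(v)

    def one_in_one_out(v):
        return indeg[v] == 1 and outdeg[v] == 1

    paths = []
    used = set()  # 1-in-1-out nodes already placed inside some path
    # Start a path at every out-edge of a branching (or source) node and
    # extend it while the current node has exactly one in- and one out-edge.
    for v in sorted(nodes):
        if one_in_one_out(v) or outdeg[v] == 0:
            continue
        for w in graph.get(v, []):
            path = [v, w]
            while one_in_one_out(w):
                used.add(w)
                w = graph[w][0]
                path.append(w)
            paths.append(path)

    # Any 1-in-1-out node not yet used lies on an isolated cycle;
    # emit each such cycle once, closing it at its start node.
    for v in sorted(nodes):
        if one_in_one_out(v) and v not in used:
            cycle = [v]
            used.add(v)
            w = graph[v][0]
            while w != v:
                cycle.append(w)
                used.add(w)
                w = graph[w][0]
            cycle.append(v)
            paths.append(cycle)
    return paths


if __name__ == "__main__":
    # Toy graph: a bubble C -> {D, E} -> F plus an isolated cycle X <-> Y.
    g = {
        "A": ["B"], "B": ["C"], "C": ["D", "E"],
        "D": ["F"], "E": ["F"], "F": ["G"], "G": [],
        "X": ["Y"], "Y": ["X"],
    }
    print(maximal_non_branching_paths(g))
    # [['A', 'B', 'C'], ['C', 'D', 'F'], ['C', 'E', 'F'], ['F', 'G'], ['X', 'Y', 'X']]
```

In the toy graph, the unitigs break at the branching nodes C and F. A safe walk that is allowed to pass through such nodes, as omnitigs can, may be longer than any unitig; deciding which extensions remain safe is the part the paper addresses and is not attempted here.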

Subject Classification

ACM Subject Classification
  • Applied computing → Computational biology
  • Mathematics of computing → Paths and connectivity problems
  • Theory of computation → Graph algorithms analysis

Keywords
  • Genome assembly
  • Omnitigs
  • Safe-and-complete framework
  • Graph algorithm
  • HiFi sequencing data
  • Assembly evaluation


