b-move: Faster Bidirectional Character Extensions in a Run-Length Compressed Index

Authors: Lore Depuydt, Luca Renders, Simon Van de Vyver, Lennart Veys, Travis Gagie, Jan Fostier

Lore Depuydt
  • Ghent University - imec, Belgium
Luca Renders
  • Ghent University - imec, Belgium
Simon Van de Vyver
  • Ghent University, Belgium
Lennart Veys
  • Ghent University, Belgium
Travis Gagie
  • Dalhousie University, Halifax, Canada
Jan Fostier
  • Ghent University - imec, Belgium


The authors thank Ben Langmead, Nathaniel Brown, and Mohsen Zakeri for their helpful feedback and suggestions.

Lore Depuydt, Luca Renders, Simon Van de Vyver, Lennart Veys, Travis Gagie, and Jan Fostier. b-move: Faster Bidirectional Character Extensions in a Run-Length Compressed Index. In 24th International Workshop on Algorithms in Bioinformatics (WABI 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 312, pp. 10:1-10:18, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)


Due to the increasing availability of high-quality genome sequences, pan-genomes are gradually replacing single consensus reference genomes in many bioinformatics pipelines to better capture genetic diversity. Traditional bioinformatics tools using the FM-index face memory limitations with such large genome collections. Recent advancements in run-length compressed indices, such as Gagie et al.'s r-index and Nishimoto and Tabei's move structure, alleviate these memory constraints but focus primarily on backward search for MEM-finding. Arakawa et al.'s br-index introduced complete approximate pattern matching using bidirectional search in run-length compressed space, but it incurs significant computational overhead due to complex memory access patterns. We introduce b-move, a novel bidirectional extension of the move structure that enables fast, cache-efficient bidirectional character extensions in run-length compressed space. It performs bidirectional character extensions up to 8 times faster than the br-index, closing the performance gap with FM-index-based alternatives while maintaining the br-index's favorable memory characteristics. For example, all available complete E. coli genomes in NCBI's RefSeq collection can be compiled into a b-move index that fits into the RAM of a typical laptop. Thus, b-move is practical and scalable for pan-genome indexing and querying. We provide a C++ implementation of b-move, supporting efficient lossless approximate pattern matching including locate functionality, available at https://github.com/biointec/b-move under the AGPL-3.0 license.

  • Applied computing → Bioinformatics
  • Pan-genomics
  • FM-index
  • r-index
  • Move Structure
  • Bidirectional Search
  • Approximate Pattern Matching
  • Lossless Alignment
  • Cache Efficiency


