Minimum Common String Partition: Exact Algorithms

Authors: Marek Cygan, Alexander S. Kulikov, Ivan Mihajlin, Maksim Nikolaev, Grigory Reznikov

Marek Cygan
  • University of Warsaw, Poland
Alexander S. Kulikov
  • Steklov Mathematical Institute at St. Petersburg, Russian Academy of Sciences, Russia
  • St. Petersburg State University, Russia
Ivan Mihajlin
  • Steklov Mathematical Institute at St. Petersburg, Russian Academy of Sciences, Russia
Maksim Nikolaev
  • Steklov Mathematical Institute at St. Petersburg, Russian Academy of Sciences, Russia
Grigory Reznikov
  • National Research University Higher School of Economics, St. Petersburg, Russia

Marek Cygan, Alexander S. Kulikov, Ivan Mihajlin, Maksim Nikolaev, and Grigory Reznikov. Minimum Common String Partition: Exact Algorithms. In 29th Annual European Symposium on Algorithms (ESA 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 204, pp. 35:1-35:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)


In the minimum common string partition problem (MCSP), one gets two strings and is asked to find the minimum number of cuts in the first string such that the second string can be obtained by rearranging the resulting pieces. It is a difficult algorithmic problem having applications in computational biology, text processing, and data compression. MCSP has been studied extensively from various algorithmic angles: there are many papers studying approximation, heuristic, and parameterized algorithms. At the same time, almost nothing is known about its exact complexity. In this paper, we present new results in this direction. We improve the known 2ⁿ upper bound (where n is the length of the input strings) to ϕⁿ, where ϕ ≈ 1.618 is the golden ratio. The algorithm uses Fibonacci numbers to encode subsets as monomials of a certain implicit polynomial and extracts one of its coefficients using the fast Fourier transform. Then, we show that the case of a constant-size alphabet can be solved in subexponential time 2^{O(n log log n / log n)} by a hybrid strategy: enumerate all long pieces and use dynamic programming over histograms of short pieces. Finally, we prove almost matching lower bounds assuming the Exponential Time Hypothesis.
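To make the problem statement concrete, here is a minimal brute-force sketch of MCSP in Python. It is not the algorithm from the paper (it uses neither the Fibonacci encoding nor the fast Fourier transform) and it is only practical for very short strings. The function names `can_tile` and `min_common_string_partition` are hypothetical, chosen for illustration only.

```python
from collections import Counter
from itertools import combinations


def can_tile(t, pieces):
    """Return True if t is a concatenation of exactly the multiset `pieces`.

    `pieces` is a Counter mapping each piece to how many copies remain.
    Plain backtracking: try every remaining piece that is a prefix of t.
    """
    if not t:
        return sum(pieces.values()) == 0
    for p in list(pieces):
        if pieces[p] > 0 and t.startswith(p):
            pieces[p] -= 1
            ok = can_tile(t[len(p):], pieces)
            pieces[p] += 1  # restore the counter before returning or retrying
            if ok:
                return True
    return False


def min_common_string_partition(s, t):
    """Return the minimum number of cuts in s, or None if no partition exists.

    Illustrative brute force (an assumption of this sketch, not the paper's
    method): try 0 cuts, then 1 cut, and so on, over all cut positions.
    """
    if Counter(s) != Counter(t):
        return None  # different letter multisets: t is not a rearrangement
    n = len(s)
    if n == 0:
        return 0
    for k in range(n):  # k = number of cuts, at most n - 1
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            pieces = Counter(s[a:b] for a, b in zip(bounds, bounds[1:]))
            if can_tile(t, pieces):
                return k
    return None


print(min_common_string_partition("ababcab", "abcabab"))  # 1: ab|abcab -> abcab|ab
print(min_common_string_partition("abc", "cba"))          # 2: a|b|c -> c|b|a
```

The sketch tries cut sets in order of increasing size, so the first cut set whose pieces can be rearranged into the second string is optimal. It examines up to 2ⁿ⁻¹ cut sets and runs a backtracking check for each one, which is why it serves only as an executable restatement of the definition.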

ACM Subject Classification
  • Theory of computation → Parameterized complexity and exact algorithms
  • Theory of computation → Algorithm design techniques

Keywords
  • similarity measure
  • string distance
  • exact algorithms
  • upper bounds
  • lower bounds


