Collapsing Superstring Conjecture

Authors Alexander Golovnev, Alexander S. Kulikov, Alexander Logunov, Ivan Mihajlin, Maksim Nikolaev

Thumbnail PDF


  • Filesize: 0.63 MB
  • 23 pages

Document Identifiers

Author Details

Alexander Golovnev
  • Harvard University, Cambridge, MA, USA
Alexander S. Kulikov
  • Steklov Institute of Mathematics at St. Petersburg, Russian Academy of Sciences, Russia
Alexander Logunov
  • St. Petersburg State University, Russia
Ivan Mihajlin
  • University of California, San Diego, CA, USA
Maksim Nikolaev
  • St. Petersburg State University, Russia

Cite AsGet BibTex

Alexander Golovnev, Alexander S. Kulikov, Alexander Logunov, Ivan Mihajlin, and Maksim Nikolaev. Collapsing Superstring Conjecture. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 145, pp. 26:1-26:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


In the Shortest Common Superstring (SCS) problem, one is given a collection of strings, and needs to find a shortest string containing each of them as a substring. SCS admits 2 11/23-approximation in polynomial time (Mucha, SODA'13). While this algorithm and its analysis are technically involved, the 30 years old Greedy Conjecture claims that the trivial and efficient Greedy Algorithm gives a 2-approximation for SCS. We develop a graph-theoretic framework for studying approximation algorithms for SCS. The framework is reminiscent of the classical 2-approximation for Traveling Salesman: take two copies of an optimal solution, apply a trivial edge-collapsing procedure, and get an approximate solution. In this framework, we observe two surprising properties of SCS solutions, and we conjecture that they hold for all input instances. The first conjecture, that we call Collapsing Superstring conjecture, claims that there is an elementary way to transform any solution repeated twice into the same graph G. This conjecture would give an elementary 2-approximate algorithm for SCS. The second conjecture claims that not only the resulting graph G is the same for all solutions, but that G can be computed by an elementary greedy procedure called Greedy Hierarchical Algorithm. While the second conjecture clearly implies the first one, perhaps surprisingly we prove their equivalence. We support these equivalent conjectures by giving a proof for the special case where all input strings have length at most 3 (which until recently had been the only case where the Greedy Conjecture was proven). We also tested our conjectures on millions of instances of SCS. We prove that the standard Greedy Conjecture implies Greedy Hierarchical Conjecture, while the latter is sufficient for an efficient greedy 2-approximate approximation of SCS. Except for its (conjectured) good approximation ratio, the Greedy Hierarchical Algorithm provably finds a 3.5-approximation, and finds exact solutions for the special cases where we know polynomial time (not greedy) exact algorithms: (1) when the input strings form a spectrum of a string (2) when all input strings have length at most 2.

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
  • Theory of computation → Approximation algorithms analysis
  • superstring
  • shortest common superstring
  • approximation
  • greedy algorithms
  • greedy conjecture


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads


  1. Eric Bax and Joel Franklin. A Finite-Difference Sieve to Count Paths and Cycles by Length. Inf. Process. Lett., 60:171-176, 1996. Google Scholar
  2. Richard Bellman. Dynamic Programming Treatment of the Travelling Salesman Problem. J. ACM, 9:61-63, 1962. Google Scholar
  3. Avrim Blum, Tao Jiang, Ming Li, John Tromp, and Mihalis Yannakakis. Linear approximation of shortest superstrings. In STOC 1991, pages 328-336. ACM, 1991. Google Scholar
  4. Bastien Cazaux, Samuel Juhel, and Eric Rivals. Practical lower and upper bounds for the Shortest Linear Superstring. In SEA 2018, volume 103, pages 18:1-18:14. LIPIcs, 2018. Google Scholar
  5. Bastien Cazaux and Eric Rivals. A linear time algorithm for Shortest Cyclic Cover of Strings. J. Discrete Algorithms, 37:56-67, 2016. Google Scholar
  6. Bastien Cazaux and Eric Rivals. Hierarchical Overlap Graph. arXiv preprint, 2018. URL:
  7. Bastien Cazaux and Eric Rivals. Relationship between superstring and compression measures: New insights on the greedy conjecture. Discrete Appl. Math., 245:59-64, 2018. Google Scholar
  8. John Gallant. String compression algorithms. PhD thesis, Princeton, 1982. Google Scholar
  9. John Gallant, David Maier, and James A. Storer. On finding minimal length superstrings. J. Comput. Syst. Sci., 20(1):50-58, 1980. Google Scholar
  10. Theodoros P. Gevezes and Leonidas S. Pitsoulis. The shortest superstring problem, pages 189-227. Springer, 2014. Google Scholar
  11. Collapsing Superstring Conjecture. GitHub repository., 2018.
  12. Alexander Golovnev, Alexander S. Kulikov, and Ivan Mihajlin. Approximating shortest superstring problem using de Bruijn graphs. In CPM 2013, pages 120-129. Springer, 2013. Google Scholar
  13. Alexander Golovnev, Alexander S. Kulikov, and Ivan Mihajlin. Solving SCS for bounded length strings in fewer than 2ⁿ steps. Inf. Process. Lett., 114(8):421-425, 2014. Google Scholar
  14. Michael Held and Richard M. Karp. The Traveling-Salesman Problem and Minimum Spanning Trees. Math. Program., 1:6-25, 1971. Google Scholar
  15. Haim Kaplan and Nira Shafrir. The greedy algorithm for shortest superstrings. Inf. Process. Lett., 93(1):13-17, 2005. Google Scholar
  16. Richard M. Karp. Dynamic Programming Meets the Principle of Inclusion and Exclusion. Oper. Res. Lett., 1(2):49-51, 1982. Google Scholar
  17. Samuel Kohn, Allan Gottlieb, and Meryle Kohn. A Generating Function Approach to the Traveling Salesman Problem. In ACN 1977, pages 294-300, 1977. Google Scholar
  18. Alexander S. Kulikov, Sergey Savinov, and Evgeniy Sluzhaev. Greedy conjecture for strings of length 4. In CPM 2015, pages 307-315. Springer, 2015. Google Scholar
  19. Uli Laube and Maik Weinard. Conditional inequalities and the shortest common superstring problem. Int. J. Found. Comput. Sci., 16(06):1219-1230, 2005. Google Scholar
  20. Gaspard Monge. Mémoire sur la théorie des déblais et des remblais. Histoire de l'Académie Royale des Sciences de Paris, 1781. Google Scholar
  21. Marcin Mucha. A tutorial on shortest superstring approximation, 2007. Google Scholar
  22. Marcin Mucha. Lyndon Words and Short Superstrings. In SODA 2013, pages 958-972. SIAM, 2013. Google Scholar
  23. Katarzyna Paluch. Better approximation algorithms for maximum asymmetric traveling salesman and shortest superstring. arXiv preprint, 2014. URL:
  24. Pavel A. Pevzner, Haixu Tang, and Michael S. Waterman. An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. U.S.A., 98(17):9748-9753, 2001. Google Scholar
  25. Eric Rivals and Bastien Cazaux. Superstrings with multiplicities. In CPM 2018, volume 105, pages 21:1-21:16, 2018. Google Scholar
  26. Heidi J. Romero, Carlos A. Brizuela, and Andrei Tchernykh. An experimental comparison of two approximation algorithms for the common superstring problem. In ENC 2004, pages 27-34. IEEE, 2004. Google Scholar
  27. Sartaj Sahni and Teofilo Gonzalez. P-Complete Approximation Problems. J. ACM, 23:555-565, 1976. Google Scholar
  28. James A. Storer. Data compression: methods and theory. Computer Science Press, Inc., 1987. Google Scholar
  29. Ola Svensson, Jakub Tarnawski, and László A. Végh. A constant-factor approximation algorithm for the asymmetric traveling salesman problem. In STOC 2018, pages 204-213. ACM, 2018. Google Scholar
  30. Jorma Tarhio and Esko Ukkonen. A greedy approximation algorithm for constructing shortest common superstrings. Theor. Comput. Sci., 57(1):131-145, 1988. Google Scholar
  31. Jonathan S. Turner. Approximation algorithms for the shortest common superstring problem. Inf. Comput., 83(1):1-20, 1989. Google Scholar
  32. Esko Ukkonen. A linear-time algorithm for finding approximate shortest common superstrings. Algorithmica, 5(1-4):313-323, 1990. Google Scholar
  33. Michael S. Waterman. Introduction to computational biology: maps, sequences and genomes. CRC Press, 1995. Google Scholar
  34. Collapsing Superstring Conjecture. Webpage., 2018.
  35. Maik Weinard and Georg Schnitger. On the greedy superstring conjecture. SIAM J. Discrete Math., 20(2):502-522, 2006. Google Scholar