Document

# Collapsing Superstring Conjecture

## File

LIPIcs.APPROX-RANDOM.2019.26.pdf
• Filesize: 0.63 MB
• 23 pages

## Cite As

Alexander Golovnev, Alexander S. Kulikov, Alexander Logunov, Ivan Mihajlin, and Maksim Nikolaev. Collapsing Superstring Conjecture. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 145, pp. 26:1-26:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2019.26

## Abstract

In the Shortest Common Superstring (SCS) problem, one is given a collection of strings, and needs to find a shortest string containing each of them as a substring. SCS admits 2 11/23-approximation in polynomial time (Mucha, SODA'13). While this algorithm and its analysis are technically involved, the 30 years old Greedy Conjecture claims that the trivial and efficient Greedy Algorithm gives a 2-approximation for SCS. We develop a graph-theoretic framework for studying approximation algorithms for SCS. The framework is reminiscent of the classical 2-approximation for Traveling Salesman: take two copies of an optimal solution, apply a trivial edge-collapsing procedure, and get an approximate solution. In this framework, we observe two surprising properties of SCS solutions, and we conjecture that they hold for all input instances. The first conjecture, that we call Collapsing Superstring conjecture, claims that there is an elementary way to transform any solution repeated twice into the same graph G. This conjecture would give an elementary 2-approximate algorithm for SCS. The second conjecture claims that not only the resulting graph G is the same for all solutions, but that G can be computed by an elementary greedy procedure called Greedy Hierarchical Algorithm. While the second conjecture clearly implies the first one, perhaps surprisingly we prove their equivalence. We support these equivalent conjectures by giving a proof for the special case where all input strings have length at most 3 (which until recently had been the only case where the Greedy Conjecture was proven). We also tested our conjectures on millions of instances of SCS. We prove that the standard Greedy Conjecture implies Greedy Hierarchical Conjecture, while the latter is sufficient for an efficient greedy 2-approximate approximation of SCS. Except for its (conjectured) good approximation ratio, the Greedy Hierarchical Algorithm provably finds a 3.5-approximation, and finds exact solutions for the special cases where we know polynomial time (not greedy) exact algorithms: (1) when the input strings form a spectrum of a string (2) when all input strings have length at most 2.

## Subject Classification

##### ACM Subject Classification
• Theory of computation → Design and analysis of algorithms
• Theory of computation → Approximation algorithms analysis
##### Keywords
• superstring
• shortest common superstring
• approximation
• greedy algorithms
• greedy conjecture

## Metrics

• Access Statistics
• Total Accesses (updated on a weekly basis)
0

## References

1. Eric Bax and Joel Franklin. A Finite-Difference Sieve to Count Paths and Cycles by Length. Inf. Process. Lett., 60:171-176, 1996.
2. Richard Bellman. Dynamic Programming Treatment of the Travelling Salesman Problem. J. ACM, 9:61-63, 1962.
3. Avrim Blum, Tao Jiang, Ming Li, John Tromp, and Mihalis Yannakakis. Linear approximation of shortest superstrings. In STOC 1991, pages 328-336. ACM, 1991.
4. Bastien Cazaux, Samuel Juhel, and Eric Rivals. Practical lower and upper bounds for the Shortest Linear Superstring. In SEA 2018, volume 103, pages 18:1-18:14. LIPIcs, 2018.
5. Bastien Cazaux and Eric Rivals. A linear time algorithm for Shortest Cyclic Cover of Strings. J. Discrete Algorithms, 37:56-67, 2016.
6. Bastien Cazaux and Eric Rivals. Hierarchical Overlap Graph. arXiv preprint, 2018. URL: http://arxiv.org/abs/1802.04632.
7. Bastien Cazaux and Eric Rivals. Relationship between superstring and compression measures: New insights on the greedy conjecture. Discrete Appl. Math., 245:59-64, 2018.
8. John Gallant. String compression algorithms. PhD thesis, Princeton, 1982.
9. John Gallant, David Maier, and James A. Storer. On finding minimal length superstrings. J. Comput. Syst. Sci., 20(1):50-58, 1980.
10. Theodoros P. Gevezes and Leonidas S. Pitsoulis. The shortest superstring problem, pages 189-227. Springer, 2014.
11. Collapsing Superstring Conjecture. GitHub repository. https://github.com/alexanderskulikov/greedy-superstring-conjecture, 2018.
12. Alexander Golovnev, Alexander S. Kulikov, and Ivan Mihajlin. Approximating shortest superstring problem using de Bruijn graphs. In CPM 2013, pages 120-129. Springer, 2013.
13. Alexander Golovnev, Alexander S. Kulikov, and Ivan Mihajlin. Solving SCS for bounded length strings in fewer than 2ⁿ steps. Inf. Process. Lett., 114(8):421-425, 2014.
14. Michael Held and Richard M. Karp. The Traveling-Salesman Problem and Minimum Spanning Trees. Math. Program., 1:6-25, 1971.
15. Haim Kaplan and Nira Shafrir. The greedy algorithm for shortest superstrings. Inf. Process. Lett., 93(1):13-17, 2005.
16. Richard M. Karp. Dynamic Programming Meets the Principle of Inclusion and Exclusion. Oper. Res. Lett., 1(2):49-51, 1982.
17. Samuel Kohn, Allan Gottlieb, and Meryle Kohn. A Generating Function Approach to the Traveling Salesman Problem. In ACN 1977, pages 294-300, 1977.
18. Alexander S. Kulikov, Sergey Savinov, and Evgeniy Sluzhaev. Greedy conjecture for strings of length 4. In CPM 2015, pages 307-315. Springer, 2015.
19. Uli Laube and Maik Weinard. Conditional inequalities and the shortest common superstring problem. Int. J. Found. Comput. Sci., 16(06):1219-1230, 2005.
20. Gaspard Monge. Mémoire sur la théorie des déblais et des remblais. Histoire de l'Académie Royale des Sciences de Paris, 1781.
21. Marcin Mucha. A tutorial on shortest superstring approximation, 2007.
22. Marcin Mucha. Lyndon Words and Short Superstrings. In SODA 2013, pages 958-972. SIAM, 2013.
23. Katarzyna Paluch. Better approximation algorithms for maximum asymmetric traveling salesman and shortest superstring. arXiv preprint, 2014. URL: http://arxiv.org/abs/1401.3670.
24. Pavel A. Pevzner, Haixu Tang, and Michael S. Waterman. An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci. U.S.A., 98(17):9748-9753, 2001.
25. Eric Rivals and Bastien Cazaux. Superstrings with multiplicities. In CPM 2018, volume 105, pages 21:1-21:16, 2018.
26. Heidi J. Romero, Carlos A. Brizuela, and Andrei Tchernykh. An experimental comparison of two approximation algorithms for the common superstring problem. In ENC 2004, pages 27-34. IEEE, 2004.
27. Sartaj Sahni and Teofilo Gonzalez. P-Complete Approximation Problems. J. ACM, 23:555-565, 1976.
28. James A. Storer. Data compression: methods and theory. Computer Science Press, Inc., 1987.
29. Ola Svensson, Jakub Tarnawski, and László A. Végh. A constant-factor approximation algorithm for the asymmetric traveling salesman problem. In STOC 2018, pages 204-213. ACM, 2018.
30. Jorma Tarhio and Esko Ukkonen. A greedy approximation algorithm for constructing shortest common superstrings. Theor. Comput. Sci., 57(1):131-145, 1988.
31. Jonathan S. Turner. Approximation algorithms for the shortest common superstring problem. Inf. Comput., 83(1):1-20, 1989.
32. Esko Ukkonen. A linear-time algorithm for finding approximate shortest common superstrings. Algorithmica, 5(1-4):313-323, 1990.
33. Michael S. Waterman. Introduction to computational biology: maps, sequences and genomes. CRC Press, 1995.
34. Collapsing Superstring Conjecture. Webpage. http://compsciclub.ru/scs/, 2018.
35. Maik Weinard and Georg Schnitger. On the greedy superstring conjecture. SIAM J. Discrete Math., 20(2):502-522, 2006.