Practical lower and upper bounds for the Shortest Linear Superstring

Authors Bastien Cazaux, Samuel Juhel, Eric Rivals



PDF
Thumbnail PDF

File

LIPIcs.SEA.2018.18.pdf
  • Filesize: 0.61 MB
  • 14 pages

Document Identifiers

Author Details

Bastien Cazaux
  • Department of Computer Science, University of Helsinki, Helsinki, Finland
  • , L.I.R.M.M., CNRS, Université Montpellier, Montpellier, France , Institute of Computational Biology, Montpellier, France
Samuel Juhel
  • L.I.R.M.M., CNRS, Université Montpellier, Montpellier, France , Institute of Computational Biology, Montpellier, France
Eric Rivals
  • L.I.R.M.M., CNRS, Université Montpellier, Montpellier, France, Institute of Computational Biology, Montpellier, France

Cite As Get BibTex

Bastien Cazaux, Samuel Juhel, and Eric Rivals. Practical lower and upper bounds for the Shortest Linear Superstring. In 17th International Symposium on Experimental Algorithms (SEA 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 103, pp. 18:1-18:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/LIPIcs.SEA.2018.18

Abstract

Given a set P of words, the Shortest Linear Superstring (SLS) problem is an optimisation problem that asks for a superstring of P of minimal length. SLS has applications in data compression, where a superstring is a compact representation of P, and in bioinformatics where it models the first step of genome assembly. Unfortunately SLS is hard to solve (NP-hard) and to closely approximate (MAX-SNP-hard). If numerous polynomial time approximation algorithms have been devised, few articles report on their practical performance. We lack knowledge about how closely an approximate superstring can be from an optimal one in practice. Here, we exhibit a linear time algorithm that reports an upper and a lower bound on the length of an optimal superstring. The upper bound is the length of an approximate superstring. This algorithm can be used to evaluate beforehand whether one can get an approximate superstring whose length is close to the optimum for a given instance. Experimental results suggest that its approximation performance is orders of magnitude better than previously reported practical values. Moreover, the proposed algorithm remainso efficient even on large instances and can serve to explore in practice the approximability of SLS.

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Combinatorial optimization
  • Theory of computation → Discrete optimization
Keywords
  • greedy
  • approximation
  • overlap
  • Concat-Cycles
  • cyclic cover
  • linear time
  • text compression

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alfred V. Aho and Margaret J. Corasick. Efficient string matching: An aid to bibliographic search. Communincations of ACM, 18(6):333-340, 1975. Google Scholar
  2. Avrim Blum, Tao Jiang, Ming Li, John Tromp, and Mihalis Yannakakis. Linear approximation of shortest superstrings. Journal of the ACM, 41(4):630-647, 1994. URL: http://dx.doi.org/10.1145/179812.179818.
  3. Rodrigo Cánovas, Bastien Cazaux, and Eric Rivals. The Compressed Overlap Index. ArXiv e-prints, abs/1707.05613, 2017. URL: http://arxiv.org/abs/1707.05613.
  4. B. Cazaux, R. Cánovas, and E. Rivals. Shortest DNA cyclic cover in compressed space. In Data Compression Conference DCC, pages 536-545. IEEE Computer Society Press, 2016. Google Scholar
  5. B. Cazaux and E. Rivals. Hierarchical Overlap Graph. ArXiv e-prints, 1802.04632v2, 2018. URL: https://arxiv.org/abs/1802.04632v2.
  6. Bastien Cazaux and Eric Rivals. A linear time algorithm for Shortest Cyclic Cover of Strings. J. of Discrete Algorithms, 37:56-67, 2016. http://dx.doi.org/10.1016/j.jda.2016.05.001. Google Scholar
  7. Bastien Cazaux and Eric Rivals. Superstrings with multiplicities. In David Sankoff Gonzalo Navarro and Binhai Zhu, editors, 29th Annual Symposium on Combinatorial Pattern Matching, CPM 2018, July 2-4, 2018, Qingdao, China, volume 105 of LIPIcs, page in press, 2018. URL: http://dx.doi.org/10.4230/LIPIcs.CPM.2017.5.
  8. Maxime Crochemore, Marek Cygan, Costas S. Iliopoulos, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. Algorithms for three versions of the shortest common superstring problem. In Proc. of 21st Annual Symp. Combinatorial Pattern Matching (CPM), volume 6129 of Lecture Notes in Computer Science, pages 299-309. Springer, 2010. URL: http://dx.doi.org/10.1007/978-3-642-13509-5_27.
  9. Gabriele Fici, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, and Tomasz Walen. On the greedy algorithm for the Shortest Common Superstring problem with reversals. Information Processing Letters, 116(3):245-251, 2016. Google Scholar
  10. Theodoros P. Gevezes and Leonidas S. Pitsoulis. Optimization in Science and Engineering: In Honor of the 60th Birthday of Panos M. Pardalos, chapter The Shortest Superstring Problem, pages 189-227. Springer New York, New York, NY, 2014. URL: http://dx.doi.org/10.1007/978-1-4939-0808-0_10.
  11. A. Golovnev, A.S. Kulikov, and I Mihajlin. Approximating Shortest Superstring Problem Using de Bruijn Graphs. In Combinatorial Pattern Matching, volume 7922 of Lecture Notes in Computer Science, pages 120-129. Springer Verlag, 2013. Google Scholar
  12. D. Gusfield. Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, 1997. Google Scholar
  13. Tao Jiang, Ming Li, and Ding-Zhu Du. A note on shortest superstrings with flipping. Information Processing Letters, 44(4):195-199, 1992. Google Scholar
  14. John D. Kececioglu and Eugene W. Myers. Combinatiorial algorithms for DNA sequence assembly. Algorithmica, 13(1/2):7-51, 1995. Google Scholar
  15. Bin Ma. Why greed works for shortest common superstring problem. Theoretical Computer Science, 410(51):5374-5381, 2009. Google Scholar
  16. Marcin Mucha. Lyndon words and short superstrings. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 958-972, 2013. Google Scholar
  17. Katarzyna E. Paluch. Better Approximation Algorithms for Maximum Asymmetric Traveling Salesman and Shortest Superstring. CoRR, abs/1401.3670, 2014. URL: http://arxiv.org/abs/1401.3670.
  18. Christos H. Papadimitriou and Kenneth Steiglitz. Combinatorial optimization : algorithms and complexity. Dover Publications, Inc., 2nd edition, 1998. 496 p. Google Scholar
  19. Heidi J. Romero, Carlos A. Brizuela, and Andrei Tchernykh. An Experimental Comparison of Approximation Algorithms for the Shortest Common Superstring Problem. In 5th Mexican International Conference on Computer Science, pages 27-34, 2004. Google Scholar
  20. Jorma Tarhio and Esko Ukkonen. A greedy approximation algorithm for constructing shortest common superstrings. Theoretical Computer Science, 57:131-145, 1988. URL: http://dx.doi.org/10.1016/0304-3975(88)90167-3.
  21. Esko Ukkonen. A Linear-Time Algorithm for Finding Approximate Shortest Common Superstrings. Algorithmica, 5:313-323, 1990. Google Scholar
  22. Virginia Vassilevska. Explicit inapproximability bounds for the shortest superstring problem. In 30th Int. Symp. on Mathematical Foundations of Computer Science (MFCS), volume 3618 of Lecture Notes in Computer Science, pages 793-800. Springer, 2005. Google Scholar
  23. Maik Weinard and Georg Schnitger. On the greedy superstring conjecture. SIAM Journal on Discrete Mathematics, 20(2):502-522, 2006. URL: http://dx.doi.org/10.1137/040619144.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail