A Linear Time Algorithm for Constructing Hierarchical Overlap Graphs

Authors: Sangsoo Park, Sung Gwan Park, Bastien Cazaux, Kunsoo Park, Eric Rivals


Author Details

Sangsoo Park
  • Samsung Electronics, Seoul, Korea
Sung Gwan Park
  • Samsung Electronics, Seoul, Korea
Bastien Cazaux
  • LIRMM, Université Montpellier, CNRS, Montpellier, France
Kunsoo Park
  • Seoul National University, Seoul, Korea
Eric Rivals
  • LIRMM, Université Montpellier, CNRS, Montpellier, France

Cite As

Sangsoo Park, Sung Gwan Park, Bastien Cazaux, Kunsoo Park, and Eric Rivals. A Linear Time Algorithm for Constructing Hierarchical Overlap Graphs. In 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 191, pp. 22:1-22:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021) https://doi.org/10.4230/LIPIcs.CPM.2021.22

Abstract

The hierarchical overlap graph (HOG) is a graph that encodes the overlaps among a given set P of n strings, as the overlap graph does. The best known algorithm constructs the HOG in O(||P|| log n) time and O(||P||) space, where ||P|| is the sum of the lengths of the strings in P. In this paper we present a new algorithm that constructs the HOG in O(||P||) time and space. Hence, the construction time and space of the HOG are better than those of the overlap graph, which are O(||P|| + n²).
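To make the notion of overlap concrete, the following is a minimal sketch of the classical overlap graph, not of the HOG and not of the algorithm presented in the paper. It assumes the usual definition of the overlap from u to v as the longest proper suffix of u that is also a proper prefix of v, and computes it with a border array (the KMP failure function). The function names `border_array`, `max_overlap`, and `overlap_graph` are illustrative choices. This naive version runs in O(n · ||P||) time and stores all n² pairs, which shows where the n² term in the overlap graph bound comes from.

```python
def border_array(seq):
    """border[i] = length of the longest proper border of seq[:i + 1]."""
    border = [0] * len(seq)
    k = 0
    for i in range(1, len(seq)):
        # Fall back along the border chain until the next symbol matches.
        while k > 0 and seq[i] != seq[k]:
            k = border[k - 1]
        if seq[i] == seq[k]:
            k += 1
        border[i] = k
    return border


def max_overlap(u, v):
    """Length of the longest proper suffix of u that is a proper prefix of v."""
    # None acts as a separator that never equals a character, so every
    # border of v + [None] + u is a prefix of v and a suffix of u.
    combined = list(v) + [None] + list(u)
    border = border_array(combined)
    k = border[-1]
    # Discard overlaps that cover all of u or all of v (not proper).
    while k > 0 and k >= min(len(u), len(v)):
        k = border[k - 1]
    return k


def overlap_graph(strings):
    """Map every ordered pair (i, j) to the overlap length from strings[i] to strings[j]."""
    n = len(strings)
    return {
        (i, j): max_overlap(strings[i], strings[j])
        for i in range(n)
        for j in range(n)
    }


if __name__ == "__main__":
    P = ["aba", "bab", "aab"]
    for (i, j), length in sorted(overlap_graph(P).items()):
        print(f"ov({P[i]}, {P[j]}) = {length}")
```

The dictionary returned by `overlap_graph` always has n² entries, regardless of how short the strings are. The HOG described in the abstract avoids this explicit quadratic representation.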

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
Keywords
  • overlap graph
  • hierarchical overlap graph
  • shortest superstring problem
  • border array
