Chaining with Overlaps Revisited

Authors: Veli Mäkinen, Kristoffer Sahlin




Author Details

Veli Mäkinen
  • Department of Computer Science, University of Helsinki, Finland
Kristoffer Sahlin
  • Department of Mathematics, Science for Life Laboratory, Stockholm University, Sweden

Acknowledgements

We wish to thank Manuel Cáceres for spotting a mistake in our original coverage definition regarding nested anchors, and the anonymous reviewers for useful suggestions to improve readability.

Cite As

Veli Mäkinen and Kristoffer Sahlin. Chaining with Overlaps Revisited. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 161, pp. 25:1-25:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.CPM.2020.25

Abstract

Chaining algorithms aim to form a semi-global alignment of two sequences based on a set of anchoring local alignments as input. Depending on the optimization criteria and the exact definition of a chain, there are several O(n log n) time algorithms to solve this problem optimally, where n is the number of input anchors. In this paper, we focus on a formulation allowing the anchors to overlap in a chain. This formulation was studied by Shibuya and Kurochkin (WABI 2003), but their algorithm comes with no proof of correctness. We revisit and modify their algorithm to consider a strict definition of the precedence relation on anchors, and add the derivation needed to establish the correctness of the resulting algorithm, which runs in O(n log² n) time on anchors formed by exact matches. With the more relaxed definition of the precedence relation considered by Shibuya and Kurochkin, or when anchors are non-nested, such as matches of uniform length (k-mers), the algorithm takes O(n log n) time. We also establish a connection between chaining with overlaps and the widely studied longest common subsequence problem.
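
To make the problem statement concrete, the following is a minimal sketch of chaining with overlaps as a plain quadratic-time dynamic program. It is not the O(n log² n) algorithm described in the abstract, and its definitions are assumptions made for illustration only: an anchor is a triple (x, y, length) denoting an exact match, anchor a precedes anchor b when both the start and the end of a lie strictly before those of b in both sequences, and appending b after a contributes the length of b minus the larger of the two overlaps. The names `precedes`, `gain`, and `best_chain_score` are hypothetical.

```python
from typing import List, Tuple

# Assumed anchor representation: (x, y, length), meaning that
# A[x : x + length] == B[y : y + length] is an exact match.
Anchor = Tuple[int, int, int]


def precedes(a: Anchor, b: Anchor) -> bool:
    """Assumed strict precedence: starts and ends of a lie strictly
    before those of b in both sequences (so nested anchors never chain)."""
    ax, ay, al = a
    bx, by, bl = b
    return ax < bx and ay < by and ax + al < bx + bl and ay + al < by + bl


def gain(a: Anchor, b: Anchor) -> int:
    """Score added by b when it directly follows a: the length of b
    minus the larger overlap with a in either sequence."""
    ax, ay, al = a
    bx, by, bl = b
    overlap = max(0, ax + al - bx, ay + al - by)
    return bl - overlap


def best_chain_score(anchors: List[Anchor]) -> int:
    """Quadratic-time DP over anchors sorted by start position.
    score[j] is the best score of a chain that ends with anchor j."""
    order = sorted(anchors)
    score = [a[2] for a in order]  # a chain consisting of one anchor
    for j, b in enumerate(order):
        for i in range(j):
            a = order[i]
            if precedes(a, b):
                score[j] = max(score[j], score[i] + gain(a, b))
    return max(score, default=0)
```

For example, under these assumptions the anchors (0, 0, 5) and (3, 4, 5) overlap by 2 positions in the first sequence and by 1 in the second, so `best_chain_score([(0, 0, 5), (3, 4, 5)])` returns 5 + (5 - 2) = 8. Because the larger overlap is trimmed, the kept parts of the anchors do not overlap in either sequence and therefore spell a common subsequence of that length, which gives some intuition for the link to the longest common subsequence problem mentioned above.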

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
  • Theory of computation → Dynamic programming
  • Applied computing → Genomics
Keywords
  • Sparse Dynamic Programming
  • Chaining
  • Maximal Exact Matches
  • Longest Common Subsequence


References

  1. Mohamed Ibrahim Abouelhoda and Enno Ohlebusch. Multiple genome alignment: Chaining algorithms revisited. In Ricardo A. Baeza-Yates, Edgar Chávez, and Maxime Crochemore, editors, Combinatorial Pattern Matching, 14th Annual Symposium, CPM 2003, Morelia, Michoacán, Mexico, June 25-27, 2003, Proceedings, volume 2676 of Lecture Notes in Computer Science, pages 1-16. Springer, 2003. URL: https://doi.org/10.1007/3-540-44888-8_1.
  2. Mohamed Ibrahim Abouelhoda and Enno Ohlebusch. Chaining algorithms for multiple genome comparison. J. Discrete Algorithms, 3(2-4):321-341, 2005. URL: https://doi.org/10.1016/j.jda.2004.08.011.
  3. Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In Rocco A. Servedio and Ronitt Rubinfeld, editors, Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, June 14-17, 2015, pages 51-58. ACM, 2015. URL: https://doi.org/10.1145/2746539.2746612.
  4. Mark de Berg, Otfried Cheong, Marc J. van Kreveld, and Mark H. Overmars. Computational geometry: algorithms and applications, 3rd Edition. Springer, 2008. URL: http://www.worldcat.org/oclc/227584184.
  5. David Eppstein, Zvi Galil, Raffaele Giancarlo, and Giuseppe F. Italiano. Sparse dynamic programming I: linear cost functions. J. ACM, 39(3):519-545, 1992. URL: https://doi.org/10.1145/146637.146650.
  6. Stefan Felsner, Rudolf Müller, and Lorenz Wernisch. Trapezoid graphs and generalizations, geometry and algorithms. Discrete Applied Mathematics, 74(1):13-32, 1997. URL: https://doi.org/10.1016/S0166-218X(96)00013-3.
  7. Dan Gusfield. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press, 1997. URL: https://doi.org/10.1017/cbo9780511574931.
  8. Veli Mäkinen, Gonzalo Navarro, and Esko Ukkonen. Transposition invariant string matching. J. Algorithms, 56(2):124-153, 2005.
  9. Veli Mäkinen, Leena Salmela, and Johannes Ylinen. Normalized N50 assembly metric using gap-restricted co-linear chaining. BMC Bioinformatics, 13:255, 2012. URL: https://doi.org/10.1186/1471-2105-13-255.
  10. Veli Mäkinen, Alexandru I. Tomescu, Anna Kuosmanen, Topi Paavilainen, Travis Gagie, and Rayan Chikhi. Sparse dynamic programming on DAGs with small width. ACM Trans. Algorithms, 15(2):29:1-29:21, 2019. URL: https://doi.org/10.1145/3301312.
  11. Gene Myers and Webb Miller. Chaining multiple-alignment fragments in sub-quadratic time. In Kenneth L. Clarkson, editor, Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, 22-24 January 1995, San Francisco, California, USA, pages 38-47. ACM/SIAM, 1995. URL: http://dl.acm.org/citation.cfm?id=313651.313661.
  12. Tetsuo Shibuya and Igor Kurochkin. Match Chaining Algorithms for cDNA Mapping. In Gary Benson and Roderic D. M. Page, editors, Algorithms in Bioinformatics, pages 462-475, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg.
  13. Raluca Uricaru, Alban Mancheron, and Eric Rivals. Novel definition and algorithm for chaining fragments with proportional overlaps. Journal of Computational Biology, 18(9):1141-1154, 2011. URL: https://doi.org/10.1089/cmb.2011.0126.