Bounded-Length Smith-Waterman Alignment

Author Alexander Tiskin



PDF
Thumbnail PDF

File

LIPIcs.WABI.2019.16.pdf
  • Filesize: 440 kB
  • 12 pages

Document Identifiers

Author Details

Alexander Tiskin
  • Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom

Cite AsGet BibTex

Alexander Tiskin. Bounded-Length Smith-Waterman Alignment. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 16:1-16:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)
https://doi.org/10.4230/LIPIcs.WABI.2019.16

Abstract

Given a fixed alignment scoring scheme, the bounded length (respectively, bounded total length) Smith-Waterman alignment problem on a pair of strings of lengths m, n, asks for the maximum alignment score across all substring pairs, such that the first substring’s length (respectively, the sum of the two substrings' lengths) is above the given threshold w. The latter problem was introduced by Arslan and Eğecioğlu under the name "local alignment with length threshold". They proposed a dynamic programming algorithm solving the problem in time O(mn^2), and also an approximation algorithm running in time O(rmn), where r is a parameter controlling the accuracy of approximation. We show that both these problems can be solved exactly in time O(mn), assuming a rational scoring scheme; furthermore, this solution can be used to obtain an exact algorithm for the normalised bounded total length Smith - Waterman alignment problem, running in time O(mn log n). Our algorithms rely on the techniques of fast window-substring alignment and implicit unit-Monge matrix searching, developed previously by the author and others.

Subject Classification

ACM Subject Classification
  • Theory of computation → Pattern matching
  • Theory of computation → Divide and conquer
  • Theory of computation → Dynamic programming
  • Applied computing → Molecular sequence analysis
  • Applied computing → Bioinformatics
Keywords
  • sequence alignment
  • local alignment
  • Smith
  • Waterman alignment
  • matrix searching

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Abdullah N. Arslan and Ömer Eğecioğlu. Dynamic Programming Based Approximation Algorithms for Sequence Alignment with Constraints. INFORMS Journal on Computing, 16(4):441-458, 2004. Google Scholar
  2. Abdullah N. Arslan and Ömer Eğecioğlu. Dynamic and fractional programming-based approximation algorithms for sequence alignment with constraints. In Handbook of Approximation Algorithms and Metaheuristics, Chapman and Hall/CRC Computer and Information Science Series, chapter 76, pages 76-1-76-16. Chapman and Hall/CRC, 2007. Google Scholar
  3. Abdullah N. Arslan, Ömer Eǧecioǧlu, and Pavel A. Pevzner. A new approach to sequence comparison: Normalized sequence alignment. Bioinformatics, 17(4):327-337, 2001. Google Scholar
  4. Wolfgang W. Bein and Pramod K. Pathak. A characterization of the Monge property and its connection to statistics. Demonstratio Mathematica, 29(2):451-457, 1996. Google Scholar
  5. Philip Bille and Martin Farach-Colton. Fast and compact regular expression matching. Theoretical Computer Science, 409(3):486-496, 2008. Google Scholar
  6. V. Y. Burdyuk and V. N. Trofimov. Generalization of the results of Gilmore and Gomory on the solution of the traveling salesman problem. Engineering Cybernetics, 14:12-18, 1976. Google Scholar
  7. Rainer E. Burkard. Monge properties, discrete convexity and applications. European Journal of Operational Research, 176(1):1-14, 2007. Google Scholar
  8. Rainer E. Burkard, Bettina Klinz, and Rüdiger Rudolf. Perspectives of Monge properties in optimization. Discrete Applied Mathematics, 70(2):95-161, 1996. Google Scholar
  9. Maxime Crochemore, Gad M. Landau, and Michal Ziv-Ukelson. A Subquadratic Sequence Alignment Algorithm for Unrestricted Scoring Matrices. SIAM Journal on Computing, 32(6):1654-1673, 2003. Google Scholar
  10. Paweł Gawrychowski. Faster algorithm for computing the edit distance between SLP-compressed strings. In Lecture Notes in Computer Science, volume 7608 of Lecture Notes in Computer Science, pages 229-236, 2012. Google Scholar
  11. Osamu Gotoh. An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162(3):705-708, 1982. Google Scholar
  12. D. Gusfield, K. Balasubramanian, and D. Naor. Parametric optimization of sequence alignment. Algorithmica, 12(4-5):312-326, 1994. Google Scholar
  13. Dan Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge, 1997. Google Scholar
  14. Benjamin Jackson and Srinivas Aluru. Pairwise Sequence Alignment. In Handbook of Computational Molecular Biology, Chapman and Hall/CRC Computer and Information Science Series, chapter 1, pages 1-1-1-32. Chapman and Hall/CRC, 2010. Google Scholar
  15. N. C. Jones and P. A. Pevzner. An Introduction to Bioinformatics Algorithms. Computational Molecular Biology. The MIT Press, 2004. Google Scholar
  16. William J. Masek and Michael S. Paterson. A faster algorithm computing string edit distances. Journal of Computer and System Sciences, 20(1):18-31, 1980. Google Scholar
  17. Saul B. Needleman and Christian D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443-453, 1970. Google Scholar
  18. S. V. Rice, H. Bunke, and T. A. Nartker. Classes of Cost Functions for String Edit Distance. Algorithmica, 18(2):271-280, 1997. Google Scholar
  19. Peter H. Sellers. The theory and computation of evolutionary distances: Pattern recognition. Journal of Algorithms, 1(4):359-373, 1980. Google Scholar
  20. T.F. Smith and M.S. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195-197, 1981. Google Scholar
  21. A. Tiskin. Faster subsequence recognition in compressed strings. Journal of Mathematical Sciences, 158(5):759-769, 2009. Google Scholar
  22. Alexander Tiskin. Semi-local longest common subsequences in subquadratic time. Journal of Discrete Algorithms, 6(4):570-581, 2008. Google Scholar
  23. Alexander Tiskin. Semi-local string comparison: Algorithmic techniques and applications. Mathematics in Computer Science, 1(4):571-603, 2008. Google Scholar
  24. Alexander Tiskin. Fast Distance Multiplication of Unit-Monge Matrices. Algorithmica, 71(4):859-888, 2015. Google Scholar
  25. Robert A. Wagner and Michael J. Fischer. The String-to-String Correction Problem. Journal of the ACM, 21(1):168-173, 1974. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail