Bounded-Length Smith-Waterman Alignment

Tiskin, Alexander

doi:10.4230/LIPIcs.WABI.2019.16

File

Subject Classification

ACM Subject Classification

Theory of computation → Pattern matching
Theory of computation → Divide and conquer
Theory of computation → Dynamic programming
Applied computing → Molecular sequence analysis
Applied computing → Bioinformatics

Keywords

sequence alignment
local alignment
Smith
Waterman alignment
matrix searching

Metrics

Access Statistics
Total Accesses (updated on a weekly basis)

0

PDF Downloads

0

Metadata Views

Abstract

Given a fixed alignment scoring scheme, the bounded length (respectively, bounded total length) Smith-Waterman alignment problem on a pair of strings of lengths m, n, asks for the maximum alignment score across all substring pairs, such that the first substring’s length (respectively, the sum of the two substrings' lengths) is above the given threshold w. The latter problem was introduced by Arslan and Eğecioğlu under the name "local alignment with length threshold". They proposed a dynamic programming algorithm solving the problem in time O(mn^2), and also an approximation algorithm running in time O(rmn), where r is a parameter controlling the accuracy of approximation. We show that both these problems can be solved exactly in time O(mn), assuming a rational scoring scheme; furthermore, this solution can be used to obtain an exact algorithm for the normalised bounded total length Smith - Waterman alignment problem, running in time O(mn log n). Our algorithms rely on the techniques of fast window-substring alignment and implicit unit-Monge matrix searching, developed previously by the author and others.

Cite As Get BibTex

Alexander Tiskin. Bounded-Length Smith-Waterman Alignment. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 16:1-16:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019) https://doi.org/10.4230/LIPIcs.WABI.2019.16

Author Details

Alexander Tiskin

Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom

References

Abdullah N. Arslan and Ömer Eğecioğlu. Dynamic Programming Based Approximation Algorithms for Sequence Alignment with Constraints. INFORMS Journal on Computing, 16(4):441-458, 2004.
Abdullah N. Arslan and Ömer Eğecioğlu. Dynamic and fractional programming-based approximation algorithms for sequence alignment with constraints. In Handbook of Approximation Algorithms and Metaheuristics, Chapman and Hall/CRC Computer and Information Science Series, chapter 76, pages 76-1-76-16. Chapman and Hall/CRC, 2007.
Abdullah N. Arslan, Ömer Eǧecioǧlu, and Pavel A. Pevzner. A new approach to sequence comparison: Normalized sequence alignment. Bioinformatics, 17(4):327-337, 2001.
Wolfgang W. Bein and Pramod K. Pathak. A characterization of the Monge property and its connection to statistics. Demonstratio Mathematica, 29(2):451-457, 1996.
Philip Bille and Martin Farach-Colton. Fast and compact regular expression matching. Theoretical Computer Science, 409(3):486-496, 2008.
V. Y. Burdyuk and V. N. Trofimov. Generalization of the results of Gilmore and Gomory on the solution of the traveling salesman problem. Engineering Cybernetics, 14:12-18, 1976.
Rainer E. Burkard. Monge properties, discrete convexity and applications. European Journal of Operational Research, 176(1):1-14, 2007.
Rainer E. Burkard, Bettina Klinz, and Rüdiger Rudolf. Perspectives of Monge properties in optimization. Discrete Applied Mathematics, 70(2):95-161, 1996.
Maxime Crochemore, Gad M. Landau, and Michal Ziv-Ukelson. A Subquadratic Sequence Alignment Algorithm for Unrestricted Scoring Matrices. SIAM Journal on Computing, 32(6):1654-1673, 2003.
Paweł Gawrychowski. Faster algorithm for computing the edit distance between SLP-compressed strings. In Lecture Notes in Computer Science, volume 7608 of Lecture Notes in Computer Science, pages 229-236, 2012.
Osamu Gotoh. An improved algorithm for matching biological sequences. Journal of Molecular Biology, 162(3):705-708, 1982.
D. Gusfield, K. Balasubramanian, and D. Naor. Parametric optimization of sequence alignment. Algorithmica, 12(4-5):312-326, 1994.
Dan Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge, 1997.
Benjamin Jackson and Srinivas Aluru. Pairwise Sequence Alignment. In Handbook of Computational Molecular Biology, Chapman and Hall/CRC Computer and Information Science Series, chapter 1, pages 1-1-1-32. Chapman and Hall/CRC, 2010.
N. C. Jones and P. A. Pevzner. An Introduction to Bioinformatics Algorithms. Computational Molecular Biology. The MIT Press, 2004.
William J. Masek and Michael S. Paterson. A faster algorithm computing string edit distances. Journal of Computer and System Sciences, 20(1):18-31, 1980.
Saul B. Needleman and Christian D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443-453, 1970.
S. V. Rice, H. Bunke, and T. A. Nartker. Classes of Cost Functions for String Edit Distance. Algorithmica, 18(2):271-280, 1997.
Peter H. Sellers. The theory and computation of evolutionary distances: Pattern recognition. Journal of Algorithms, 1(4):359-373, 1980.
T.F. Smith and M.S. Waterman. Identification of common molecular subsequences. Journal of Molecular Biology, 147(1):195-197, 1981.
A. Tiskin. Faster subsequence recognition in compressed strings. Journal of Mathematical Sciences, 158(5):759-769, 2009.
Alexander Tiskin. Semi-local longest common subsequences in subquadratic time. Journal of Discrete Algorithms, 6(4):570-581, 2008.
Alexander Tiskin. Semi-local string comparison: Algorithmic techniques and applications. Mathematics in Computer Science, 1(4):571-603, 2008.
Alexander Tiskin. Fast Distance Multiplication of Unit-Monge Matrices. Algorithmica, 71(4):859-888, 2015.
Robert A. Wagner and Michael J. Fischer. The String-to-String Correction Problem. Journal of the ACM, 21(1):168-173, 1974.

Bounded-Length Smith-Waterman Alignment

Author Alexander Tiskin

File

Document Identifiers

Subject Classification

ACM Subject Classification

Keywords

Metrics

Abstract

Cite As Get BibTex

Author Details

References

Thanks for your feedback!

Could not send message