Algorithms for Normalized Multiple Sequence Alignments

Authors: Eloi Araujo, Luiz C. Rozante, Diego P. Rubert, Fábio V. Martinez



File

LIPIcs.ISAAC.2021.40.pdf
  • Filesize: 0.95 MB
  • 16 pages

Author Details

Eloi Araujo
  • Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Campo Grande, MS, Brasil
Luiz C. Rozante
  • Centro de Matemática, Computação e Cognição, Universidade Federal do ABC, Santo André, SP, Brasil
Diego P. Rubert
  • Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Campo Grande, MS, Brasil
Fábio V. Martinez
  • Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Campo Grande, MS, Brasil

Acknowledgements

We thank the three anonymous reviewers for their valuable comments and suggestions on our manuscript.

Cite As

Eloi Araujo, Luiz C. Rozante, Diego P. Rubert, and Fábio V. Martinez. Algorithms for Normalized Multiple Sequence Alignments. In 32nd International Symposium on Algorithms and Computation (ISAAC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 212, pp. 40:1-40:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.ISAAC.2021.40

Abstract

Sequence alignment supports numerous tasks in bioinformatics, natural language processing, pattern recognition, the social sciences, and other fields. While two sequences can often be aligned quickly, the simultaneous alignment of multiple sequences is inherently more intricate. Although most multiple sequence alignment (MSA) formulations are NP-hard, several approaches have been developed, since they can outperform pairwise alignment methods or are required by some applications. Taking into account not only similarities but also the lengths of the compared sequences (i.e., normalization) can yield better alignments than either unnormalized or post-normalized approaches. While some normalized methods have been developed for pairwise sequence alignment, none have been proposed for MSA. This work is a first effort towards the development of normalized methods for MSA. We discuss multiple aspects of normalized multiple sequence alignment (NMSA). We define three new criteria for computing normalized scores when aligning multiple sequences, show that NMSA is NP-hard under these criteria, and give exact algorithms for solving NMSA under them. In addition, we provide approximation algorithms for MSA and NMSA for some classes of scoring matrices.
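The abstract contrasts normalized scoring with unnormalized and post-normalized scoring. The Python sketch below illustrates that contrast in the pairwise case only. It computes a normalized edit distance in the sense of Marzal and Vidal (1993), which appears in the reference list: the minimum, over all alignments, of the total cost divided by the number of alignment columns. This is not one of the paper's three NMSA criteria, because the abstract does not define them. Several choices here are assumptions made for the example: the function names, the unit integer costs, and the use of the plain edit distance divided by the longer sequence length as the "post-normalized" baseline.

```python
from fractions import Fraction

INF = float("inf")


def normalized_edit_distance(a: str, b: str, sub_cost: int = 1, indel_cost: int = 1) -> Fraction:
    """Minimum over all alignments of (total cost / number of alignment columns).

    Straightforward, unoptimized DP (illustrative sketch): best[i][j][k] is the
    cheapest cost of aligning a[:i] with b[:j] using exactly k columns, and INF
    marks an impossible (i, j, k) combination. Costs are assumed to be integers.
    """
    m, n = len(a), len(b)
    if m == 0 and n == 0:
        return Fraction(0)
    max_len = m + n  # longest possible alignment: every column is an indel
    best = [[[INF] * (max_len + 1) for _ in range(n + 1)] for _ in range(m + 1)]
    best[0][0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            for k in range(1, i + j + 1):
                options = []
                if i > 0 and j > 0:  # column pairs a[i-1] with b[j-1]
                    cost = 0 if a[i - 1] == b[j - 1] else sub_cost
                    options.append(best[i - 1][j - 1][k - 1] + cost)
                if i > 0:  # column deletes a[i-1] (gap in b)
                    options.append(best[i - 1][j][k - 1] + indel_cost)
                if j > 0:  # column inserts b[j-1] (gap in a)
                    options.append(best[i][j - 1][k - 1] + indel_cost)
                best[i][j][k] = min(options)
    # Divide each achievable cost by its own alignment length, then minimize.
    return min(Fraction(best[m][n][k], k) for k in range(1, max_len + 1) if best[m][n][k] < INF)


def edit_distance(a: str, b: str, sub_cost: int = 1, indel_cost: int = 1) -> int:
    """Classic unnormalized edit distance, computed row by row."""
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * indel_cost] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else sub_cost
            cur[j] = min(prev[j - 1] + cost, prev[j] + indel_cost, cur[j - 1] + indel_cost)
        prev = cur
    return prev[-1]


def post_normalized_edit_distance(a: str, b: str) -> Fraction:
    """Assumed baseline: optimal unnormalized cost, divided afterwards by the longer length."""
    longest = max(len(a), len(b))
    return Fraction(edit_distance(a, b), longest) if longest else Fraction(0)


print(normalized_edit_distance("ab", "ba"))       # 2/3
print(post_normalized_edit_distance("ab", "ba"))  # 1
```

For "ab" versus "ba", the normalized score is attained by the three-column alignment `ab-` over `-ba`, which has cost 2 and aligns the two b's. The post-normalized baseline divides the optimal unnormalized cost of 2 by the length 2 and gives 1. The two notions can therefore favor different alignments, which is the distinction the abstract draws.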

Subject Classification

ACM Subject Classification
  • Theory of computation → Problems, reductions and completeness
  • Theory of computation → Approximation algorithms analysis
  • Applied computing → Molecular sequence analysis
Keywords
  • Multiple sequence alignment
  • Normalized multiple sequence alignment
  • Algorithms and complexity


References

  1. A. Abbott and A. Tsay. Sequence analysis and optimal matching methods in sociology: Review and prospect. Sociol Method Res, 29(1):3-33, 2000. URL: https://doi.org/10.1177/0049124100029001001.
  2. A. Andoni, R. Krauthgamer, and K. Onak. Polylogarithmic approximation for edit distance and the asymmetric query complexity. In Proc. of FOCS, pages 377-386. IEEE, 2010. URL: https://doi.org/10.1109/FOCS.2010.43.
  3. A. Apostolico and Z. Galil. Pattern Matching Algorithms. Oxford University Press, 1997.
  4. E. Araujo, L. C. Rozante, D. P. Rubert, and F. V. Martinez. Algorithms for normalized multiple sequence alignments, 2021. URL: http://arxiv.org/abs/2107.01607.
  5. E. Araujo and J. Soares. Scoring matrices that induce metrics on sequences. In Proc. of LATIN, pages 68-79, 2006. URL: https://doi.org/10.1007/11682462_11.
  6. A. N. Arslan and Ö. Egecioglu. An efficient uniform-cost normalized edit distance algorithm. In Proc. of SPIRE, pages 9-15. IEEE, 1999. URL: https://doi.org/10.1109/SPIRE.1999.796572.
  7. A. Backurs and P. Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). SIAM J Comput, 47(3):1087-1097, 2018. URL: https://doi.org/10.1137/15M1053128.
  8. R. Barzilay and L. Lee. Bootstrapping lexical choice via multiple-sequence alignment. In Proc. of EMNLP, pages 164-171. ACL, 2002. URL: https://doi.org/10.3115/1118693.1118715.
  9. H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM J Appl Math, 48(5):1073-1082, 1988. URL: https://doi.org/10.1137/0148063.
  10. D. Chakraborty, D. Das, E. Goldenberg, M. Koucký, and M. Saks. Approximating edit distance within constant factor in truly sub-quadratic time. J ACM, 67(6):1-22, 2020. URL: https://doi.org/10.1145/3422823.
  11. M. Crochemore, G. M. Landau, and M. Ziv-Ukelson. A sub-quadratic sequence alignment algorithm for unrestricted cost matrices. In Proc. of SODA, pages 679-688. SIAM, 2002. URL: https://doi.org/10.5555/545381.545472.
  12. J. A. Cuff and G. J. Barton. Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins, 34(4):508-519, 1999. URL: https://doi.org/10.1002/(SICI)1097-0134(19990301)34:4<508::AID-PROT10>3.0.CO;2-4.
  13. I. Elias. Settling the intractability of multiple alignment. J Comput Biol, 13(7):1323-1339, 2006. URL: https://doi.org/10.1089/cmb.2006.13.1323.
  14. D.-F. Feng and R. F. Doolittle. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J Mol Evol, 25(4):351-360, 1987. URL: https://doi.org/10.1007/BF02603120.
  15. D. Gusfield. Efficient methods for multiple sequence alignment with guaranteed error bounds. Bull Math Biol, 55(1):141-154, 1993. URL: https://doi.org/10.1007/BF02460299.
  16. W. Haque, A. Aravind, and B. Reddy. Pairwise sequence alignment algorithms: A survey. In Proc. of ISTA, pages 96-103. ACM, 2009. URL: https://doi.org/10.1145/1551950.1551980.
  17. M. Hirosawa, Y. Totoki, M. Hoshida, and M. Ishikawa. Comprehensive study on iterative algorithms of multiple sequence alignment. Bioinformatics, 11(1):13-18, 1995. URL: https://doi.org/10.1093/bioinformatics/11.1.13.
  18. A. Marzal and E. Vidal. Computation of normalized edit distance and applications. IEEE T Pattern Anal, 15(9):926-932, 1993. URL: https://doi.org/10.1109/34.232078.
  19. W. J. Masek and M. S. Paterson. A faster algorithm computing string edit distances. J Comput Syst Sci, 20(1):18-31, 1980. URL: https://doi.org/10.1016/0022-0000(80)90002-1.
  20. S. B. Needleman and C. D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol, 48(3):443-453, 1970. URL: https://doi.org/10.1016/0022-2836(70)90057-4.
  21. T. H. Ogden and M. S. Rosenberg. Multiple sequence alignment accuracy and phylogenetic inference. Syst Biol, 55(2):314-328, 2006. URL: https://doi.org/10.1080/10635150500541730.
  22. P. H. Sellers. On the theory and computation of evolutionary distances. SIAM J Appl Math, 26(4):787-793, 1974. URL: https://doi.org/10.1137/0126070.
  23. F. Sievers and D. G. Higgins. Clustal Omega. Curr Protoc Bioinfo, 48(1):3.13.1-3.13.16, 2014. URL: https://doi.org/10.1002/0471250953.bi0313s48.
  24. F. Sievers, A. Wilm, D. Dineen, T. J. Gibson, K. Karplus, W. Li, R. Lopez, H. McWilliam, M. Remmert, J. Söding, J. D. Thompson, and D. G. Higgins. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol, 7(1):539, 2011. URL: https://doi.org/10.1038/msb.2011.75.
  25. J. D. Thompson, F. Plewniak, and O. Poch. A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res, 27(13):2682-2690, 1999. URL: https://doi.org/10.1093/nar/27.13.2682.
  26. E. Vidal, A. Marzal, and P. Aibar. Fast computation of normalized edit distances. IEEE T Pattern Anal, 17(9):899-902, 1995. URL: https://doi.org/10.1109/34.406656.
  27. I. M. Wallace, O. O'Sullivan, D. G. Higgins, and C. Notredame. M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res, 34(6):1692-1699, 2006. URL: https://doi.org/10.1093/nar/gkl091.
  28. X.-D. Wang, J.-X. Liu, Y. Xu, and J. Zhang. A survey of multiple sequence alignment techniques. In Proc. of ICIC, pages 529-538. Springer, 2015. URL: https://doi.org/10.1007/978-3-319-22180-9_52.