A Succinct Solution to Rmap Alignment

Authors: Martin D. Muggli, Simon J. Puglisi, Christina Boucher



File

LIPIcs.WABI.2018.12.pdf
  • Filesize: 0.59 MB
  • 16 pages

Author Details

Martin D. Muggli
  • Department of Computer Science, Colorado State University
Simon J. Puglisi
  • Department of Computer Science, University of Helsinki, Finland
Christina Boucher
  • Department of Computer and Information Science and Engineering, University of Florida

Cite As

Martin D. Muggli, Simon J. Puglisi, and Christina Boucher. A Succinct Solution to Rmap Alignment. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 12:1-12:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.WABI.2018.12

Abstract

We present Kohdista, an index-based algorithm for finding pairwise alignments between single-molecule maps (Rmaps). The novelty of our approach lies in formulating the alignment problem as automaton path matching and in applying modern index-based data structures. In particular, Kohdista combines the Generalized Compressed Suffix Array (GCSA) index with the wavelet tree. We validate Kohdista on simulated E. coli data, showing that the approach successfully finds alignments between Rmaps simulated from overlapping genomic regions. Lastly, we demonstrate that Kohdista is the only method capable of finding a significant number of high-quality pairwise Rmap alignments for large eukaryotic organisms in reasonable time. Kohdista is available at https://github.com/mmuggli/KOHDISTA/.
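
As a rough illustration of what index-based matching means here, the sketch below implements plain FM-index backward search over an ordinary string. It is not Kohdista's algorithm: the GCSA generalizes this kind of search from a single string to paths in an automaton, and nothing below models Rmap fragment sizes or sizing error. The function names `build_fm_index` and `backward_search` are hypothetical, and the index is built naively for readability rather than succinctly.

```python
def build_fm_index(text):
    """Build a naive (uncompressed) FM-index for `text`.

    Illustrative only: a real index would store the BWT in a
    wavelet tree instead of a full occurrence table.
    """
    text += "\0"  # sentinel, smaller than every other symbol
    # Suffix array by direct sorting of suffixes (fine for toy inputs).
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the symbol preceding each sorted suffix.
    bwt = [text[i - 1] for i in sa]
    alphabet = sorted(set(text))
    # C[c] = number of symbols in text strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted start positions of `pattern`, found via backward search."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


if __name__ == "__main__":
    sa, C, occ = build_fm_index("abracadabra")
    print(backward_search("abra", sa, C, occ))  # [0, 7]
```

Each step of the search narrows a suffix-array interval using only rank queries on the BWT, which is the operation a wavelet tree answers efficiently.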

Subject Classification

ACM Subject Classification
  • Applied computing → Bioinformatics
Keywords
  • Optical mapping
  • index-based data structures
  • FM-index
  • graph algorithms

