Natural Family-Free Genomic Distance

Authors: Diego P. Rubert, Fábio V. Martinez, Marília D. V. Braga




File

LIPIcs.WABI.2020.3.pdf
  • Filesize: 0.91 MB
  • 23 pages

Author Details

Diego P. Rubert
  • Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Campo Grande, Brazil
Fábio V. Martinez
  • Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Campo Grande, Brazil
Marília D. V. Braga
  • Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Germany

Acknowledgements

We thank the anonymous reviewers for their valuable comments.

Cite As

Diego P. Rubert, Fábio V. Martinez, and Marília D. V. Braga. Natural Family-Free Genomic Distance. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 3:1-3:23, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.WABI.2020.3

Abstract

A classical problem in comparative genomics is to compute the rearrangement distance, that is, the minimum number of large-scale rearrangements required to transform a given genome into another given genome. While the most traditional approaches in this area are family-based, i.e., require the classification of DNA fragments of both genomes into families, more recently an alternative model was proposed, which, instead of family classification, simply uses the pairwise similarities between DNA fragments of both genomes to compute their rearrangement distance. This model represents structural rearrangements by the generic double cut and join (DCJ) operation and is therefore called the family-free DCJ distance. It computes the DCJ distance between the two genomes by searching for a matching of their genes based on the given pairwise similarities, therefore helping to find gene homologies. The drawback is that its computation is NP-hard. Another point is that the family-free DCJ distance must correspond to a maximal matching of the genes, because unmatched genes are simply ignored: maximizing the matching prevents the free-lunch artifact in which empty or almost empty matchings give the smallest distances. In this paper, besides DCJ operations, we allow content-modifying operations of insertions and deletions of DNA segments and propose a new and more general family-free genomic distance. In our model we use the pairwise similarities to assign weights to both matched and unmatched genes, so that an optimal solution does not necessarily maximize the matching. Our model then results in a natural family-free genomic distance that takes into consideration all given genes and has a search space composed of matchings of any size. We provide an efficient ILP formulation to solve it, by extending the previous formulations for computing family-based genomic distances from Shao et al. (J. Comput. Biol., 2015) and Bohnenkämper et al. (Proc. of RECOMB, 2020).
Our experiments show that the ILP can handle not only bacterial genomes, but also fungi and insects, or sets of chromosomes of mammals and plants. In a comparison study of six fruit fly genomes, we obtained accurate results.
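
For readers unfamiliar with the DCJ distance that the abstract builds on, the following is a minimal sketch of the classical family-based case only, not the family-free model or the ILP of this paper. It assumes two genomes with identical gene content, no duplicate genes, and only circular chromosomes, where the distance reduces to n - c (n genes, c cycles of the adjacency graph, following the formulation of Bergeron et al., reference 3). The genome encoding (lists of signed integers) and the function names are illustrative assumptions.

```python
def extremities(gene):
    """Return the (left, right) extremities of a signed gene in reading order."""
    g = abs(gene)
    # A gene read forward runs tail -> head; a reversed gene runs head -> tail.
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def adjacencies(genome):
    """Map every gene extremity to its neighbour; all chromosomes are circular."""
    adj = {}
    for chrom in genome:
        for i, gene in enumerate(chrom):
            nxt = chrom[(i + 1) % len(chrom)]  # wrap around: circular chromosome
            right = extremities(gene)[1]
            left = extremities(nxt)[0]
            adj[right] = left
            adj[left] = right
    return adj


def dcj_distance(genome_a, genome_b):
    """DCJ distance n - c for circular genomes with the same, unduplicated genes."""
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    if adj_a.keys() != adj_b.keys():
        raise ValueError("genomes must have the same gene content")
    n = len(adj_a) // 2  # two extremities per gene
    seen, cycles = set(), 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Follow one adjacency of A, then one of B, until the cycle closes.
        while x not in seen:
            seen.add(x)
            y = adj_a[x]
            seen.add(y)
            x = adj_b[y]
    return n - cycles


if __name__ == "__main__":
    a = [[1, 2, 3]]
    b = [[1, -2, 3]]  # gene 2 inverted
    print(dcj_distance(a, b))
```

In the family-free setting described above, the gene matching is not given in advance, so a computation of this kind would have to be embedded in a search over matchings, which is what makes the problem hard and motivates the ILP.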

Subject Classification

ACM Subject Classification
  • Theory of computation → Linear programming
Keywords
  • Comparative genomics
  • Genome rearrangement
  • DCJ-indel distance


References

  1. Mark D. Adams, Susan E. Celniker, Robert A. Holt, et al. The genome sequence of Drosophila melanogaster. Science, 287:2185-2195, 2000. URL: https://doi.org/10.1126/science.287.5461.2185.
  2. Sébastien Angibaud, Guillaume Fertin, Irena Rusu, Annelyse Thévenin, and Stéphane Vialette. On the approximability of comparing genomes with duplicates. Journal of Graph Algorithms and Applications, 13(1):19-53, 2009. URL: https://doi.org/10.7155/jgaa.00175.
  3. Anne Bergeron, Julia Mixtacki, and Jens Stoye. A unifying view of genome rearrangements. In Proc. of WABI, volume 4175 of Lecture Notes in Bioinformatics, pages 163-173, 2006. URL: https://doi.org/10.1007/11851561_16.
  4. Leonard Bohnenkämper, Marília D. V. Braga, Daniel Doerr, and Jens Stoye. Computing the rearrangement distance of natural genomes. In Proc. of RECOMB, volume 12074 of Lecture Notes in Bioinformatics, pages 3-18, 2020. URL: https://doi.org/10.1007/978-3-030-45257-5_1.
  5. Marília D. V. Braga, Cedric Chauve, Daniel Doerr, Katharina Jahn, Jens Stoye, Annelyse Thévenin, and Roland Wittler. The potential of family-free genome comparison. In C. Chauve, N. El-Mabrouk, and E. Tannier, editors, Models and Algorithms for Genome Evolution, chapter 13, pages 287-307. Springer, 2013. URL: https://doi.org/10.1007/978-1-4471-5298-9_13.
  6. Marília D. V. Braga, Eyla Willing, and Jens Stoye. Double cut and join with insertions and deletions. Journal of Computational Biology, 18(9):1167-1184, 2011. URL: https://doi.org/10.1089/cmb.2011.0118.
  7. David Bryant. The complexity of calculating exemplar distances. In David Sankoff and Joseph H. Nadeau, editors, Comparative Genomics, pages 207-211. Springer, 2000. URL: https://doi.org/10.1007/978-94-011-4309-7_19.
  8. Laurent Bulteau and Minghui Jiang. Inapproximability of (1,2)-exemplar distance. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 10(6):1384-1390, 2013. URL: https://doi.org/10.1109/TCBB.2012.144.
  9. Andrew G. Clark, Michael B. Eisen, Douglas R. Smith, et al. Evolution of genes and genomes on the Drosophila phylogeny. Nature, 450:203-218, 2007. URL: https://doi.org/10.1038/nature06341.
  10. Daniel A. Dalquen, Maria Anisimova, Gaston H. Gonnet, and Christophe Dessimoz. ALF - a simulation framework for genome evolution. Molecular Biology and Evolution, 29(4):1115, 2012. URL: https://doi.org/10.1093/molbev/msr268.
  11. Daniel Doerr, Pedro Feijão, and Jens Stoye. Family-free genome comparison. In João C. Setubal, Jens Stoye, and Peter F. Stadler, editors, Comparative Genomics: Methods and Protocols, pages 331-342. Springer, 2018. URL: https://doi.org/10.1007/978-1-4939-7463-4_12.
  12. Daniel Doerr, Annelyse Thévenin, and Jens Stoye. Gene family assignment-free comparative genomics. BMC Bioinformatics, 13(Suppl 19):S3, 2012. URL: https://doi.org/10.1186/1471-2105-13-S19-S3.
  13. Sridhar Hannenhalli and Pavel A. Pevzner. Transforming men into mice (polynomial algorithm for genomic distance problem). In Proc. of FOCS, pages 581-592, 1995. URL: https://doi.org/10.1109/SFCS.1995.492588.
  14. Sudhir Kumar, Glen Stecher, Michael Li, Christina Knyaz, and Koichiro Tamura. MEGA X: molecular evolutionary genetics analysis across computing platforms. Molecular Biology and Evolution, 35(6):1547-1549, 2018. URL: https://doi.org/10.1093/molbev/msy096.
  15. Sudhir Kumar, Glen Stecher, Michael Suleski, and S. Blair Hedges. Timetree: a resource for timelines, timetrees, and divergence times. Molecular Biology and Evolution, 34(7):1812-1819, 2017. URL: https://doi.org/10.1093/molbev/msx116.
  16. Fábio V. Martinez, Pedro Feijão, Marília D. V. Braga, and Jens Stoye. On the family-free DCJ distance and similarity. Algorithms for Molecular Biology, 10(13), 2015. URL: https://doi.org/10.1186/s13015-015-0041-9.
  17. Stephen Richards, Yue Liu, Brian R. Bettencourt, et al. Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution. Genome Research, 15:1-18, 2005. URL: https://doi.org/10.1101/gr.3059305.
  18. Diego P. Rubert, Pedro Feijão, Marília D. V. Braga, Jens Stoye, and Fábio V. Martinez. Approximating the DCJ distance of balanced genomes in linear time. Algorithms for Molecular Biology, 12(3), 2017. URL: https://doi.org/10.1186/s13015-017-0095-y.
  19. Diego P. Rubert, Fábio V. Martinez, and Marília D. V. Braga. Natural family-free genomic distance. arXiv:2007.03556, 2020.
  20. Naruya Saitou and Masatoshi Nei. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4(4):406-425, 1987. URL: https://doi.org/10.1093/oxfordjournals.molbev.a040454.
  21. David Sankoff. Edit distance for genome comparison based on non-local operations. In Proc. of CPM, volume 644 of Lecture Notes in Computer Science, pages 121-135, 1992. URL: https://doi.org/10.1007/3-540-56024-6_10.
  22. David Sankoff. Genome rearrangement with gene families. Bioinformatics, 15(11):909-917, 1999. URL: https://doi.org/10.1093/bioinformatics/15.11.909.
  23. Mingfu Shao, Yu Lin, and Bernard Moret. An exact algorithm to compute the double-cut-and-join distance for genomes with duplicate genes. Journal of Computational Biology, 22(5):425-435, 2015. URL: https://doi.org/10.1089/cmb.2014.0096.
  24. Sophia Yancopoulos, Oliver Attie, and Richard Friedberg. Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics, 21(16):3340-3346, 2005. URL: https://doi.org/10.1093/bioinformatics/bti535.
  25. Sophia Yancopoulos and Richard Friedberg. DCJ path formulation for genome transformations which include insertions, deletions, and duplications. Journal of Computational Biology, 16(10):1311-1338, 2009. URL: https://doi.org/10.1089/cmb.2009.0092.
  26. Qi Zhou and Doris Bachtrog. Ancestral chromatin configuration constrains chromatin evolution on differentiating sex chromosomes in Drosophila. PLoS Genetics, 11(6), 2015. URL: https://doi.org/10.1371/journal.pgen.1005331.