New Algorithms for Structure Informed Genome Rearrangement

Authors Eden Ozery, Meirav Zehavi, Michal Ziv-Ukelson



Author Details

Eden Ozery
  • Ben Gurion University of the Negev, Israel
Meirav Zehavi
  • Ben Gurion University of the Negev, Israel
Michal Ziv-Ukelson
  • Ben Gurion University of the Negev, Israel

Acknowledgements

Our sincere thanks go to the anonymous WABI 2022 referees who, with their careful reading and incisive comments, helped improve this paper.

Cite As

Eden Ozery, Meirav Zehavi, and Michal Ziv-Ukelson. New Algorithms for Structure Informed Genome Rearrangement. In 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022). Leibniz International Proceedings in Informatics (LIPIcs), Volume 242, pp. 11:1-11:19, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022) https://doi.org/10.4230/LIPIcs.WABI.2022.11

Abstract

We define two new computational problems in the domain of perfect genome rearrangements, and propose three algorithms to solve them. The rearrangement scenarios modeled by the problems consider Reversal and Block Interchange operations, and a PQ-tree is used to guide the allowed operations and to compute their weights. In the first problem, Constrained TreeToString Divergence (CTTSD), we define the basic structure-informed, rearrangement-based divergence measure. Here, we assume that the gene order members of the gene cluster from which the PQ-tree is constructed are permutations. The PQ-tree representing the gene cluster is ordered such that the series of gene IDs spelled by its leaves is equivalent to the reference gene order. Then, a structure-informed gene rearrangement measure is computed between the ordered PQ-tree and the target gene order. The second problem, TreeToString Divergence (TTSD), generalizes CTTSD: the gene order members are not necessarily permutations, and the structure-informed, rearrangement-based divergence measure is extended to also consider up to d_S and d_T gene insertion and deletion operations, respectively, when modelling the PQ-tree-informed divergence process from the reference order to the target order.
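To make the role of the ordered PQ-tree concrete, the following is a minimal sketch using the standard PQ-tree semantics (children of a P-node may appear in any order; children of a Q-node may only appear in their given order or its reverse). It is not the CTTSD algorithm and computes no divergence score: it only spells the leaf order of a tree and checks whether a target gene order is derivable from it, assuming the gene orders are permutations. The names `Node`, `frontier`, and `derivable` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    kind: str                      # "leaf", "P", or "Q"
    gene: Optional[str] = None     # gene ID, set only for leaves
    children: List["Node"] = field(default_factory=list)


def frontier(node: Node) -> List[str]:
    """Gene IDs spelled by the leaves, read left to right."""
    if node.kind == "leaf":
        return [node.gene]
    return [g for child in node.children for g in frontier(child)]


def derivable(node: Node, target: List[str]) -> bool:
    """True if `target` can be spelled by permuting P-node children
    and optionally reversing Q-node children (genes assumed distinct)."""
    if node.kind == "leaf":
        return list(target) == [node.gene]
    genes = frontier(node)
    if len(target) != len(genes) or set(target) != set(genes):
        return False
    pos = {g: i for i, g in enumerate(target)}
    starts = []
    for child in node.children:
        idx = sorted(pos[g] for g in frontier(child))
        if idx[-1] - idx[0] + 1 != len(idx):
            return False           # the child's genes are not contiguous in target
        if not derivable(child, target[idx[0]:idx[-1] + 1]):
            return False
        starts.append(idx[0])
    if node.kind == "P":
        return True                # any order of the children is allowed
    # Q-node: children must appear in the original order or fully reversed.
    order = sorted(range(len(starts)), key=lambda i: starts[i])
    n = len(order)
    return order == list(range(n)) or order == list(range(n - 1, -1, -1))


# Reference order a b c d e, with a, b, c grouped under a Q-node.
tree = Node("P", children=[
    Node("Q", children=[Node("leaf", "a"), Node("leaf", "b"), Node("leaf", "c")]),
    Node("leaf", "d"),
    Node("leaf", "e"),
])
```

In this toy tree, the order d c b a e is derivable (the Q-node is reversed and the P-node children are permuted), whereas b a c d e is not, because it breaks the Q-node's internal order.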
The first algorithm solves CTTSD in O(n γ² ⋅ (m_p ⋅ 1.381^γ + m_q)) time and O(n²) space, where γ is the maximum number of children of a node, n is both the length of the string and the number of leaves in the tree, and m_p and m_q are the numbers of P-nodes and Q-nodes in the tree, respectively. If one of the penalties of CTTSD is 0, then the algorithm runs in O(n m γ²) time and O(n²) space. The second algorithm solves TTSD in O(n² γ² d_T² d_S² m² (m_p ⋅ 5^γ γ + m_q)) time and O(d_T d_S m (m n + 5^γ)) space, where n is the length of the string, m is the number of leaves in the tree, γ, m_p and m_q are as above, and d_T and d_S are the numbers of deletions allowed from the tree and from the string, respectively. The third algorithm is intended to reduce the space complexity of the second algorithm. It solves a variant of the problem (where one of the penalties of TTSD is 0) in O(n γ² d_T² d_S² m² (m_p ⋅ 4^γ γ² n (d_T+d_S+m+n) + m_q)) time and O(γ² n m² d_T d_S (d_T+d_S+m+n)) space.
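As a rough illustration of why the third algorithm trades time for space, the sketch below evaluates the expressions inside the two TTSD space bounds for concrete parameter values. This is plain arithmetic on the stated asymptotic terms: constant factors hidden by the O-notation are ignored, so the numbers indicate growth only, not actual memory use. The function names and the sample parameter values are assumptions made for illustration.

```python
def ttsd_space_term(n: int, m: int, gamma: int, d_t: int, d_s: int) -> int:
    """Expression inside the space bound of the second algorithm:
    d_T * d_S * m * (m*n + 5^gamma)."""
    return d_t * d_s * m * (m * n + 5 ** gamma)


def ttsd_variant_space_term(n: int, m: int, gamma: int, d_t: int, d_s: int) -> int:
    """Expression inside the space bound of the third algorithm:
    gamma^2 * n * m^2 * d_T * d_S * (d_T + d_S + m + n)."""
    return gamma ** 2 * n * m ** 2 * d_t * d_s * (d_t + d_s + m + n)


# Hypothetical instance: 20 genes, one deletion allowed on each side.
params = dict(n=20, m=20, d_t=1, d_s=1)

for gamma in (3, 15):
    second = ttsd_space_term(gamma=gamma, **params)
    third = ttsd_variant_space_term(gamma=gamma, **params)
    print(f"gamma={gamma}: second={second}, third={third}")
```

For a small maximum degree (γ = 3) the exponential term 5^γ is negligible and the second bound is the smaller of the two; for a larger degree (γ = 15) the 5^γ term dominates and the polynomial bound of the third algorithm is smaller by several orders of magnitude.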
The algorithm is implemented as a software tool, denoted MEM-Rearrange, and applied to the comparative and evolutionary analysis of 59 chromosomal gene clusters extracted from a dataset of 1,487 prokaryotic genomes.

Subject Classification

ACM Subject Classification
  • Applied computing → Computational biology
Keywords
  • PQ-tree
  • Gene Cluster
  • Breakpoint Distance
