Approximate Search for Known Gene Clusters in New Genomes Using PQ-Trees

Zimerman, Galia R.; Svetlitsky, Dina; Zehavi, Meirav; Ziv-Ukelson, Michal

doi:10.4230/LIPIcs.WABI.2020.1


Document Identifiers

DOI: 10.4230/LIPIcs.WABI.2020.1
URN: urn:nbn:de:0030-drops-127906

Author Details

Galia R. Zimerman

Ben Gurion University of the Negev, Beer Sheva, Israel

Dina Svetlitsky

Ben Gurion University of the Negev, Beer Sheva, Israel

Meirav Zehavi

Ben Gurion University of the Negev, Beer Sheva, Israel

Michal Ziv-Ukelson

Ben Gurion University of the Negev, Beer Sheva, Israel

Acknowledgements

Many thanks to Lev Gourevitch for his excellent implementation of a PQ-tree builder. We also thank the anonymous WABI reviewers for their very helpful comments.

Cite As

Galia R. Zimerman, Dina Svetlitsky, Meirav Zehavi, and Michal Ziv-Ukelson. Approximate Search for Known Gene Clusters in New Genomes Using PQ-Trees. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 1:1-1:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)
https://doi.org/10.4230/LIPIcs.WABI.2020.1

Abstract

We define a new problem in comparative genomics, denoted PQ-Tree Search, that takes as input a PQ-tree T representing the known gene orders of a gene cluster of interest, a gene-to-gene substitution scoring function h, integer parameters d_T and d_S, and a new genome S. The objective is to identify in S approximate new instances of the gene cluster that could vary from the known gene orders by genome rearrangements that are constrained by T, by gene substitutions that are governed by h, and by gene deletions and insertions that are bounded from above by d_T and d_S, respectively. We prove that the PQ-Tree Search problem is NP-hard and propose a parameterized algorithm that solves the optimization variant of PQ-Tree Search in O^*(2^{γ}) time, where γ is the maximum degree of a node in T and O^* is used to hide factors polynomial in the input size. The algorithm is implemented as a search tool, denoted PQFinder, and applied to search for instances of chromosomal gene clusters in plasmids, within a dataset of 1,487 prokaryotic genomes. We report on 29 chromosomal gene clusters that are rearranged in plasmids, where the rearrangements are guided by the corresponding PQ-tree. One of these results, coding for a heavy metal efflux pump, is further analysed to exemplify how PQFinder can be harnessed to reveal interesting new structural variants of known gene clusters.
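To make the phrase "rearrangements that are constrained by T" concrete, here is a minimal sketch of the standard PQ-tree semantics: the children of a P-node may appear in any order, while the children of a Q-node may only appear in their given order or its reverse. The tuple encoding, the helper names (`leaf`, `P`, `Q`, `frontiers`) and the gene labels are hypothetical, chosen only for illustration. This brute-force enumeration is not the PQFinder algorithm and handles neither substitutions, deletions, nor insertions.

```python
from itertools import permutations, product

# Hypothetical encoding: a node is a (kind, payload) tuple.
def leaf(gene):
    return ("leaf", gene)

def P(*children):
    return ("P", children)

def Q(*children):
    return ("Q", children)

def frontiers(node):
    """Return the set of all gene orders (as tuples) that the tree allows."""
    kind, payload = node
    if kind == "leaf":
        return {(payload,)}
    children = payload
    n = len(children)
    if kind == "P":
        # P-node: every permutation of the children is allowed.
        orders = permutations(range(n))
    else:
        # Q-node: only the given order or its reversal is allowed.
        forward = tuple(range(n))
        orders = {forward, forward[::-1]}
    child_sets = [frontiers(c) for c in children]
    result = set()
    for order in orders:
        # Combine one frontier per child, in the chosen child order.
        for combo in product(*(child_sets[i] for i in order)):
            result.add(tuple(g for part in combo for g in part))
    return result

# Toy cluster: genes a and d flank an unordered pair {b, c}.
tree = Q(leaf("a"), P(leaf("b"), leaf("c")), leaf("d"))
allowed = frontiers(tree)
```

For this toy tree, `allowed` contains the four orders a-b-c-d, a-c-b-d, d-b-c-a and d-c-b-a. A P-node with γ children contributes γ! child orderings under this naive enumeration, which suggests why the maximum node degree γ is a natural parameter for the running time.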

Subject Classification

ACM Subject Classification

Applied computing → Bioinformatics

Keywords

PQ-Tree
Gene Cluster
Efflux Pump
