Reconstructing Biological and Digital Phylogenetic Trees in Parallel

Authors Ramtin Afshar , Michael T. Goodrich , Pedro Matias , Martha C. Osegueda



PDF
Thumbnail PDF

File

LIPIcs.ESA.2020.3.pdf
  • Filesize: 2.35 MB
  • 24 pages

Document Identifiers

Author Details

Ramtin Afshar
  • University of California-Irvine, CA, USA
Michael T. Goodrich
  • University of California-Irvine, CA, USA
Pedro Matias
  • University of California-Irvine, CA, USA
Martha C. Osegueda
  • University of California-Irvine, CA, USA

Cite As Get BibTex

Ramtin Afshar, Michael T. Goodrich, Pedro Matias, and Martha C. Osegueda. Reconstructing Biological and Digital Phylogenetic Trees in Parallel. In 28th Annual European Symposium on Algorithms (ESA 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 173, pp. 3:1-3:24, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020) https://doi.org/10.4230/LIPIcs.ESA.2020.3

Abstract

In this paper, we study the parallel query complexity of reconstructing biological and digital phylogenetic trees from simple queries involving their nodes. This is motivated from computational biology, data protection, and computer security settings, which can be abstracted in terms of two parties, a responder, Alice, who must correctly answer queries of a given type regarding a degree-d tree, T, and a querier, Bob, who issues batches of queries, with each query in a batch being independent of the others, so as to eventually infer the structure of T. We show that a querier can efficiently reconstruct an n-node degree-d tree, T, with a logarithmic number of rounds and quasilinear number of queries, with high probability, for various types of queries, including relative-distance queries and path queries. Our results are all asymptotically optimal and improve the asymptotic (sequential) query complexity for one of the problems we study. Moreover, through an experimental analysis using both real-world and synthetic data, we provide empirical evidence that our algorithms provide significant parallel speedups while also improving the total query complexities for the problems we study.

Subject Classification

ACM Subject Classification
  • Theory of computation → Parallel computing models
Keywords
  • Tree Reconstruction
  • Parallel Algorithms
  • Privacy
  • Phylogenetic Trees
  • Data Structures
  • Hierarchical Clustering

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Peyman Afshani, Manindra Agrawal, Benjamin Doerr, Carola Doerr, Kasper Green Larsen, and Kurt Mehlhorn. The query complexity of finding a hidden permutation. In Andrej Brodnik, Alejandro López-Ortiz, Venkatesh Raman, and Alfredo Viola, editors, Space-Efficient Data Structures, Streams, and Algorithms - Papers in Honor of J. Ian Munro on the Occasion of His 66th Birthday, volume 8066 of Lecture Notes in Computer Science, pages 1-11. Springer, 2013. URL: https://doi.org/10.1007/978-3-642-40273-9_1.
  2. Ramtin Afshar, Michael T. Goodrich, Pedro Matias, and Martha C. Osagueda. Reconstructing biological and digital phylogenetic trees in parallel. ArXiv, 2020. URL: http://arxiv.org/abs/2006.15259.
  3. Ramtin Afshar, Michael T. Goodrich, Pedro Matias, and Martha C. Osegueda. Reconstructing binary trees in parallel dimension (brief announcement). In The 32nd ACM on Symposium on Parallelism in Algorithms and Architectures, SPAA 2020, PA, USA, July 14-17, 2020, to appear. Google Scholar
  4. Gautam Altekar, Sandhya Dwarkadas, John P. Huelsenbeck, and Fredrik Ronquist. Parallel metropolis coupled markov chain monte carlo for bayesian phylogenetic inference. Bioinform., 20(3):407-415, 2004. URL: https://doi.org/10.1093/bioinformatics/btg427.
  5. Anna Bernasconi, Carsten Damm, and Igor E. Shparlinski. Circuit and decision tree complexity of some number theoretic problems. Inf. Comput., 168(2):113-124, 2001. URL: https://doi.org/10.1006/inco.2000.3017.
  6. Paolo Bestagini, Marco Tagliasacchi, and Stefano Tubaro. Image phylogeny tree reconstruction based on region selection. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016, pages 2059-2063. IEEE, 2016. URL: https://doi.org/10.1109/ICASSP.2016.7472039.
  7. Anupam Bhattacharjee, Kazi Zakia Sultana, and Zalia Shams. Dynamic and parallel approaches to optimal evolutionary tree construction. In Proceedings of the Canadian Conference on Electrical and Computer Engineering, CCECE 2006, May 7-10, 2006, Ottawa Congress Centre, Ottawa, Canada, pages 119-122. IEEE, 2006. URL: https://doi.org/10.1109/CCECE.2006.277582.
  8. Gerth Stølting Brodal, Rolf Fagerberg, Christian N. S. Pedersen, and Anna Östlin. The complexity of constructing evolutionary trees using experiments. In Fernando Orejas, Paul G. Spirakis, and Jan van Leeuwen, editors, Automata, Languages and Programming, 28th International Colloquium, ICALP 2001, Crete, Greece, July 8-12, 2001, Proceedings, volume 2076 of Lecture Notes in Computer Science, pages 140-151. Springer, 2001. URL: https://doi.org/10.1007/3-540-48224-5_12.
  9. Sung-Soon Choi and Jeong Han Kim. Optimal query complexity bounds for finding graphs. Artif. Intell., 174(9-10):551-569, 2010. URL: https://doi.org/10.1016/j.artint.2010.02.003.
  10. Benny Chor and Tamir Tuller. Maximum likelihood of evolutionary trees: hardness and approximation. In Proceedings Thirteenth International Conference on Intelligent Systems for Molecular Biology 2005, Detroit, MI, USA, 25-29 June 2005, pages 97-106, 2005. URL: https://doi.org/10.1093/bioinformatics/bti1027.
  11. Richard Cole and Uzi Vishkin. Deterministic coin tossing and accelerating cascades: micro and macro techniques for designing parallel algorithms. In Juris Hartmanis, editor, Proceedings of the 18th Annual ACM Symposium on Theory of Computing, May 28-30, 1986, Berkeley, California, USA, pages 206-219. ACM, 1986. URL: https://doi.org/10.1145/12130.12151.
  12. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, 3rd Edition. MIT Press, 2009. URL: http://mitpress.mit.edu/books/introduction-algorithms.
  13. Joseph C. Culberson and Piotr Rudnicki. A fast algorithm for constructing trees from distance matrices. Inf. Process. Lett., 30(4):215-220, 1989. URL: https://doi.org/10.1016/0020-0190(89)90216-0.
  14. Zanoni Dias, Siome Goldenstein, and Anderson Rocha. Exploring heuristic and optimum branching algorithms for image phylogeny. J. Vis. Commun. Image Represent., 24(7):1124-1134, 2013. URL: https://doi.org/10.1016/j.jvcir.2013.07.011.
  15. Zanoni Dias, Siome Goldenstein, and Anderson Rocha. Large-scale image phylogeny: Tracing image ancestral relationships. IEEE Multim., 20(3):58-70, 2013. URL: https://doi.org/10.1109/MMUL.2013.17.
  16. Zanoni Dias, Anderson Rocha, and Siome Goldenstein. Image phylogeny by minimal spanning trees. IEEE Trans. Information Forensics and Security, 7(2):774-788, 2012. URL: https://doi.org/10.1109/TIFS.2011.2169959.
  17. Shahar Dobzinski and Jan Vondrák. From query complexity to computational complexity. In Howard J. Karloff and Toniann Pitassi, editors, Proceedings of the 44th Symposium on Theory of Computing Conference, STOC 2012, New York, NY, USA, May 19 - 22, 2012, pages 1107-1116. ACM, 2012. URL: https://doi.org/10.1145/2213977.2214076.
  18. Ehsan Emamjomeh-Zadeh and David Kempe. Adaptive hierarchical clustering using ordinal queries. In Artur Czumaj, editor, Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pages 415-429. SIAM, 2018. URL: https://doi.org/10.1137/1.9781611975031.28.
  19. James S. Farris. Methods for Computing Wagner Trees. Systematic Biology, 19(1):83-92, March 1970. URL: https://doi.org/10.1093/sysbio/19.1.83.
  20. Joseph Felsenstein. Evolutionary trees from dna sequences: a maximum likelihood approach. Journal of molecular evolution, 17(6):368-376, 1981. Google Scholar
  21. Walter M. Fitch. Toward defining the course of evolution: Minimum change for a specific tree topology. Systematic Zoology, 20(4):406-416, 1971. URL: http://www.jstor.org/stable/2412116.
  22. Leslie Ann Goldberg, Paul W. Goldberg, Cynthia A. Phillips, and Gregory B. Sorkin. Constructing computer virus phylogenies. J. Algorithms, 26(1):188-208, 1998. URL: https://doi.org/10.1006/jagm.1997.0897.
  23. M. T. Goodrich and R. Tamassia. Algorithm Design and Applications. Wiley, New York, NY, 2011. Google Scholar
  24. Jotun J Hein. An optimal algorithm to reconstruct trees from additive distance data. Bulletin of mathematical biology, 51(5):597-603, 1989. Google Scholar
  25. John P Huelsenbeck. Performance of phylogenetic methods in simulation. Systematic biology, 44(1):17-48, 1995. Google Scholar
  26. M. Jagadish and Anindya Sen. Learning a bounded-degree tree using separator queries. In Sanjay Jain, Rémi Munos, Frank Stephan, and Thomas Zeugmann, editors, Algorithmic Learning Theory - 24th International Conference, ALT 2013, Singapore, October 6-9, 2013. Proceedings, volume 8139 of Lecture Notes in Computer Science, pages 188-202. Springer, 2013. URL: https://doi.org/10.1007/978-3-642-40935-6_14.
  27. Jeong-Hoon Ji, Su-Hyun Park, Gyun Woo, and Hwan-Gue Cho. Generating pylogenetic tree of homogeneous source code in a plagiarism detection system. International Journal of Control, Automation, and Systems, 6(6):809-817, 2008. Google Scholar
  28. Neil C Jones, Pavel A Pevzner, and Pavel Pevzner. An introduction to bioinformatics algorithms. MIT press, 2004. Google Scholar
  29. Sampath Kannan, Eugene L. Lawler, and Tandy J. Warnow. Determining the evolutionary tree using experiments. J. Algorithms, 21(1):26-50, 1996. URL: https://doi.org/10.1006/jagm.1996.0035.
  30. Sampath Kannan, Claire Mathieu, and Hang Zhou. Graph reconstruction and verification. ACM Trans. Algorithms, 14(4):40:1-40:30, 2018. URL: https://doi.org/10.1145/3199606.
  31. Valerie King, Li Zhang, and Yunhong Zhou. On the complexity of distance-based evolutionary tree reconstruction. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 12-14, 2003, Baltimore, Maryland, USA, pages 444-453. ACM/SIAM, 2003. URL: http://dl.acm.org/citation.cfm?id=644108.644179.
  32. Guilherme D Marmerola, Marina A Oikawa, Zanoni Dias, Siome Goldenstein, and Anderson Rocha. On the reconstruction of text phylogeny trees: evaluation and analysis of textual relationships. PloS one, 11(12):e0167822, 2016. Google Scholar
  33. Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005. URL: https://doi.org/10.1017/CBO9780511813603.
  34. Mark Pagel. Inferring the historical patterns of biological evolution. Nature, 401(6756):877-884, 1999. Google Scholar
  35. Avi Pfeffer, Catherine Call, John Chamberlain, Lee Kellogg, Jacob Ouellette, Terry Patten, Greg Zacharias, Arun Lakhotia, Suresh Golconda, John Bay, Robert Hall, and Daniel Scofield. Malware analysis and attribution using genetic information. In 7th International Conference on Malicious and Unwanted Software, MALWARE 2012, Fajardo, PR, USA, October 16-18, 2012, pages 39-45. IEEE Computer Society, 2012. URL: https://doi.org/10.1109/MALWARE.2012.6461006.
  36. WH Piel, L Chan, MJ Dominus, J Ruan, RA Vos, and V Tannen. Treebase v. 2: A database of phylogenetic knowledge. e-biosphere, 2009. Google Scholar
  37. Heinz Prüfer. Neuer beweis eines satzes über permutationen. Arch. Math. Phys, 27(1918):742-744, 1918. Google Scholar
  38. Lev Reyzin and Nikhil Srivastava. On the longest path algorithm for reconstructing trees from distance matrices. Inf. Process. Lett., 101(3):98-100, 2007. URL: https://doi.org/10.1016/j.ipl.2006.08.013.
  39. F. James Rohlf. J. felsenstein, inferring phylogenies, sinauer assoc., 2004, pp. xx + 664. J. Classif., 22(1):139-142, 2005. URL: https://doi.org/10.1007/s00357-005-0009-4.
  40. Bingyu Shen, Christopher W. Forstall, Anderson de Rezende Rocha, and Walter J. Scheirer. Practical text phylogeny for real-world settings. IEEE Access, 6:41002-41012, 2018. URL: https://doi.org/10.1109/ACCESS.2018.2856865.
  41. Yossi Shiloach and Uzi Vishkin. Finding the maximum, merging and sorting in a parallel computation model. In Wolfgang Händler, editor, CONPAR 81: Conference on Analysing Problem Classes and Programming for Parallel Computing, Nürnberg, Germany, June 10-12, 1981, Proceedings, volume 111 of Lecture Notes in Computer Science, pages 314-327. Springer, 1981. URL: https://doi.org/10.1007/BFb0105127.
  42. Gábor Tardos. Query complexity, or why is it difficult to seperate NP ^a cap co np^a from p^a by random oracles a? Combinatorica, 9(4):385-392, 1989. URL: https://doi.org/10.1007/BF02125350.
  43. W.T. Tutte. Graph Theory. Cambridge Mathematical Library. Cambridge University Press, 2001. URL: https://books.google.com/books?id=uTGhooU37h4C.
  44. Leslie G. Valiant. Parallelism in comparison problems. SIAM J. Comput., 4(3):348-355, 1975. URL: https://doi.org/10.1137/0204030.
  45. Leslie G. Valiant. Universality considerations in VLSI circuits. IEEE Trans. Computers, 30(2):135-140, 1981. URL: https://doi.org/10.1109/TC.1981.6312176.
  46. Zhaosen Wang and Jean Honorio. Reconstructing a bounded-degree directed tree using path queries. In 57th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2019, Monticello, IL, USA, September 24-27, 2019, pages 506-513. IEEE, 2019. URL: https://doi.org/10.1109/ALLERTON.2019.8919924.
  47. Michael S Waterman, Temple F Smith, Mona Singh, and William A Beyer. Additive evolutionary trees. Journal of theoretical Biology, 64(2):199-213, 1977. Google Scholar
  48. Andrew Chi-Chih Yao. Decision tree complexity and betti numbers. In Frank Thomson Leighton and Michael T. Goodrich, editors, Proceedings of the Twenty-Sixth Annual ACM Symposium on Theory of Computing, 23-25 May 1994, Montréal, Québec, Canada, pages 615-624. ACM, 1994. URL: https://doi.org/10.1145/195058.195414.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail