
Towards Distance-Based Phylogenetic Inference in Average-Case Linear-Time

Authors Maxime Crochemore, Alexandre P. Francisco, Solon P. Pissis, Cátia Vaz



Cite As

Maxime Crochemore, Alexandre P. Francisco, Solon P. Pissis, and Cátia Vaz. Towards Distance-Based Phylogenetic Inference in Average-Case Linear-Time. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 9:1-9:14, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2017)


Computing genetic evolution distances among a set of taxa dominates the running time of many phylogenetic inference methods. Most genetic evolution distance definitions rely, even if indirectly, on computing the pairwise Hamming distance among sequences or profiles. We propose an average-case linear-time algorithm to compute pairwise Hamming distances among a set of taxa under a given Hamming distance threshold. This article includes both a theoretical analysis and extensive experimental results concerning the proposed algorithm. We further show how this algorithm can be successfully integrated into a well-known phylogenetic inference method.
  • computational biology
  • phylogenetic inference
  • Hamming distance
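
To make the problem statement in the abstract concrete, the following is a minimal sketch of a naive baseline: it compares every pair of equal-length sequences and reports the Hamming distance only when it does not exceed a threshold k. This is not the average-case linear-time algorithm proposed in the paper; it takes quadratic time in the number of sequences and serves only to illustrate the input and output of the task. The function names `bounded_hamming` and `pairs_within_threshold` are hypothetical and chosen for illustration.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def bounded_hamming(a: str, b: str, k: int) -> Optional[int]:
    """Return the Hamming distance of a and b if it is at most k, else None."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            # Stop early once the threshold is exceeded.
            if mismatches > k:
                return None
    return mismatches


def pairs_within_threshold(seqs: List[str], k: int) -> Dict[Tuple[int, int], int]:
    """Map each index pair (i, j), i < j, to its distance when that distance is <= k.

    Naive baseline: every pair is compared, so the cost is quadratic
    in the number of sequences.
    """
    result: Dict[Tuple[int, int], int] = {}
    for (i, a), (j, b) in combinations(enumerate(seqs), 2):
        d = bounded_hamming(a, b, k)
        if d is not None:
            result[(i, j)] = d
    return result
```

Calling `pairs_within_threshold(seqs, k)` on a list of aligned sequences or profiles returns only the pairs that fall under the threshold, which is the output a distance-based inference method would consume.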



