Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants

Authors Yutong Qiu , Yihang Shen , Carl Kingsford



PDF
Thumbnail PDF

File

LIPIcs.WABI.2023.11.pdf
  • Filesize: 1.53 MB
  • 22 pages

Document Identifiers

Author Details

Yutong Qiu
  • Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
Yihang Shen
  • Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
Carl Kingsford
  • Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA

Acknowledgements

Conflict of Interest: C.K. is a co-founder of Ocean Genomics, Inc.

Cite As Get BibTex

Yutong Qiu, Yihang Shen, and Carl Kingsford. Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants. In 23rd International Workshop on Algorithms in Bioinformatics (WABI 2023). Leibniz International Proceedings in Informatics (LIPIcs), Volume 273, pp. 11:1-11:22, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023) https://doi.org/10.4230/LIPIcs.WABI.2023.11

Abstract

The graph traversal edit distance (GTED), introduced by Ebrahimpour Boroojeny et al. (2018), is an elegant distance measure defined as the minimum edit distance between strings reconstructed from Eulerian trails in two edge-labeled graphs. GTED can be used to infer evolutionary relationships between species by comparing de Bruijn graphs directly without the computationally costly and error-prone process of genome assembly. Ebrahimpour Boroojeny et al. (2018) propose two ILP formulations for GTED and claim that GTED is polynomially solvable because the linear programming relaxation of one of the ILPs will always yield optimal integer solutions. The claim that GTED is polynomially solvable is contradictory to the complexity of existing string-to-graph matching problems. 
We resolve this conflict in complexity results by proving that GTED is NP-complete and showing that the ILPs proposed by Ebrahimpour Boroojeny et al. do not solve GTED but instead solve for a lower bound of GTED and are not solvable in polynomial time. In addition, we provide the first two, correct ILP formulations of GTED and evaluate their empirical efficiency. These results provide solid algorithmic foundations for comparing genome graphs and point to the direction of heuristics that estimate GTED efficiently.

Subject Classification

ACM Subject Classification
  • Theory of computation → Design and analysis of algorithms
Keywords
  • Integer Linear Programming
  • Genome Graphs
  • Flow Network
  • Graph Comparison

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Guillaume Bourque, Pavel A Pevzner, and Glenn Tesler. Reconstructing the genomic architecture of ancestral mammals: lessons from human, mouse, and rat genomes. Genome Research, 14(4):507-516, 2004. Google Scholar
  2. Stephen P Bradley, Arnoldo C Hax, and Thomas L Magnanti. Applied mathematical programming. Addison-Wesley, 1977. Google Scholar
  3. George Dantzig, Ray Fulkerson, and Selmer Johnson. Solution of a large-scale traveling-salesman problem. Journal of the Operations Research Society of America, 2(4):393-410, 1954. Google Scholar
  4. Eva Darai-Ramqvist, Agneta Sandlund, Stefan Müller, George Klein, Stefan Imreh, and Maria Kost-Alimova. Segmental duplications and evolutionary plasticity at tumor chromosome break-prone regions. Genome Research, 18(3):370-379, 2008. Google Scholar
  5. Fernando HC Dias, Lucia Williams, Brendan Mumey, and Alexandru I Tomescu. Minimum flow decomposition in graphs with cycles using integer linear programming. arXiv preprint, 2022. URL: https://arxiv.org/abs/2209.00042.
  6. Ali Ebrahimpour Boroojeny, Akash Shrestha, Ali Sharifi-Zarchi, Suzanne Renick Gallagher, S. Cenk Sahinalp, and Hamidreza Chitsaz. Graph traversal edit distance and extensions. Journal of Computational Biology, 27(3):317-329, 2020. Google Scholar
  7. Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2023. URL: https://www.gurobi.com.
  8. Steve Huntsman and Arman Rezaee. De Bruijn entropy and string similarity. arXiv preprint, 2015. URL: https://arxiv.org/abs/1509.02975.
  9. Chirag Jain, Haowen Zhang, Yu Gao, and Srinivas Aluru. On the complexity of sequence-to-graph alignment. Journal of Computational Biology, 27(4):640-654, 2020. Google Scholar
  10. Orna Kupferman and Gal Vardi. Eulerian paths with regular constraints. In Piotr Faliszewski, Anca Muscholl, and Rolf Niedermeier, editors, 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016), volume 58 of Leibniz International Proceedings in Informatics (LIPIcs), pages 62:1-62:15, Dagstuhl, Germany, 2016. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. Google Scholar
  11. Marie-Paule Lefranc. IMGT, the international ImMunoGeneTics information system. Cold Spring Harbor Protocols, 2011(6):595-603, 2011. Google Scholar
  12. Yilong Li, Nicola D Roberts, Jeremiah A Wala, Ofer Shapira, Steven E Schumacher, Kiran Kumar, Ekta Khurana, Sebastian Waszak, Jan O Korbel, James E Haber, et al. Patterns of somatic structural variation in human cancer genomes. Nature, 578(7793):112-121, 2020. Google Scholar
  13. Serghei Mangul and David Koslicki. Reference-free comparison of microbial communities via de Bruijn graphs. In Proceedings of the 7th ACM international conference on bioinformatics, computational biology, and health informatics, pages 68-77, 2016. Google Scholar
  14. Clair E Miller, Albert W Tucker, and Richard A Zemlin. Integer programming formulation of traveling salesman problems. Journal of the ACM (JACM), 7(4):326-329, 1960. Google Scholar
  15. Ilia Minkin and Paul Medvedev. Scalable pairwise whole-genome homology mapping of long genomes with Bubbz. IScience, 23(6):101224, 2020. Google Scholar
  16. James R Munkres. Elements of algebraic topology. CRC Press, 2018. Google Scholar
  17. Saul B Needleman and Christian D Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443-453, 1970. Google Scholar
  18. Ofir Pele and Michael Werman. A linear time histogram metric for improved sift matching. In Computer Vision-ECCV 2008: 10th European Conference on Computer Vision, Marseille, France, October 12-18, 2008, Proceedings, Part III 10, pages 495-508. Springer, 2008. Google Scholar
  19. Pavel A Pevzner, Haixu Tang, and Michael S Waterman. An Eulerian path approach to DNA fragment assembly. Proceedings of the National Academy of Sciences of USA, 98(17):9748-9753, 2001. Google Scholar
  20. Evgeny Polevikov and Mikhail Kolmogorov. Synteny Paths for Assembly Graphs Comparison. In Katharina T. Huber and Dan Gusfield, editors, 19th International Workshop on Algorithms in Bioinformatics (WABI 2019), volume 143 of Leibniz International Proceedings in Informatics (LIPIcs), pages 24:1-24:14, Dagstuhl, Germany, 2019. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. URL: https://doi.org/10.4230/LIPIcs.WABI.2019.24.
  21. Yutong Qiu and Carl Kingsford. The effect of genome graph expressiveness on the discrepancy between genome graph distance and string set distance. Bioinformatics, 38:i404-i412, 2022. Google Scholar
  22. Yutong Qiu, Yihang Shen, and Carl Kingsford. Revisiting the complexity of and algorithms for the Graph Traversal Edit Distance and Its variants. arXiv preprint, 2023. URL: https://arxiv.org/abs/2305.10577.
  23. Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. The Earth Mover’s distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99-121, 2000. Google Scholar
  24. Mitchell R Vollger, Xavi Guitart, Philip C Dishuck, Ludovica Mercuri, William T Harvey, Ariel Gershman, Mark Diekhans, Arvis Sulovari, Katherine M Munson, Alexandra P Lewis, et al. Segmental duplications and their variation in a complete human genome. Science, 376(6588):eabj6965, 2022. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail