Heuristic Algorithms for the Maximum Colorful Subtree Problem

Authors Kai Dührkop, Marie A. Lataretu, W. Timothy J. White , Sebastian Böcker



PDF
Thumbnail PDF

File

LIPIcs.WABI.2018.23.pdf
  • Filesize: 0.75 MB
  • 14 pages

Document Identifiers

Author Details

Kai Dührkop
  • Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany
Marie A. Lataretu
  • Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany
W. Timothy J. White
  • Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany, and, Berlin Institute of Health, Berlin, Germany
Sebastian Böcker
  • Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany

Cite As Get BibTex

Kai Dührkop, Marie A. Lataretu, W. Timothy J. White, and Sebastian Böcker. Heuristic Algorithms for the Maximum Colorful Subtree Problem. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 23:1-23:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/LIPIcs.WABI.2018.23

Abstract

In metabolomics, small molecules are structurally elucidated using tandem mass spectrometry (MS/MS); this computational task can be formulated as the Maximum Colorful Subtree problem, which is NP-hard. Unfortunately, data from a single metabolite requires us to solve hundreds or thousands of instances of this problem - and in a single Liquid Chromatography MS/MS run, hundreds or thousands of metabolites are measured.
Here, we comprehensively evaluate the performance of several heuristic algorithms for the problem. Unfortunately, as is often the case in bioinformatics, the structure of the (chemically) true solution is not known to us; therefore we can only evaluate against the optimal solution of an instance. Evaluating the quality of a heuristic based on scores can be misleading: Even a slightly suboptimal solution can be structurally very different from the optimal solution, but it is the structure of a solution and not its score that is relevant for the downstream analysis. To this end, we propose a different evaluation setup: Given a set of candidate instances of which exactly one is known to be correct, the heuristic in question solves each instance to the best of its ability, producing a score for each instance, which is then used to rank the instances. We then evaluate whether the correct instance is ranked highly by the heuristic.
We find that one particular heuristic consistently ranks the correct instance in a top position. We also find that the scores of the best heuristic solutions are very close to the optimal score; in contrast, the structure of the solutions can deviate significantly from the optimal structures. Integrating the heuristic allowed us to speed up computations in practice by a factor of 100-fold.

Subject Classification

ACM Subject Classification
  • Applied computing → Computational biology
  • Applied computing → Metabolomics / metabonomics
Keywords
  • Fragmentation trees
  • Computational mass spectrometry

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Alexander A Aksenov, Ricardo da Silva, Rob Knight, Norberto P Lopes, and Pieter C Dorrestein. Global chemical analysis of biology by mass spectrometry. Nature Reviews Chemistry, 1(7):0054, 2017. Google Scholar
  2. Sebastian Böcker and Kai Dührkop. Fragmentation trees reloaded. J Cheminform, 8:5, 2016. URL: http://dx.doi.org/10.1186/s13321-016-0116-8.
  3. Sebastian Böcker and Florian Rasche. Towards de novo identification of metabolites by analyzing tandem mass spectra. Bioinformatics, 24:I49-I55, 2008. Proc. of European Conference on Computational Biology (ECCB 2008). URL: http://dx.doi.org/10.1093/bioinformatics/btn270.
  4. Ricardo R. da Silva, Pieter C. Dorrestein, and Robert A. Quinn. Illuminating the dark matter in metabolomics. Proc Natl Acad Sci U S A, 112(41):12549-12550, 2015. URL: http://dx.doi.org/10.1073/pnas.1516878112.
  5. Riccardo Dondi, Guillaume Fertin, and Stéphane Vialette. Complexity issues in vertex-colored graph pattern matching. J Discrete Algorithms, 9(1):82-99, 2011. URL: http://dx.doi.org/doi:10.1016/j.jda.2010.09.002.
  6. Kai Dührkop, Huibin Shen, Marvin Meusel, Juho Rousu, and Sebastian Böcker. Searching molecular structure databases with tandem mass spectra using CSI:FingerID. Proc Natl Acad Sci U S A, 112(41):12580-12585, 2015. URL: http://dx.doi.org/10.1073/pnas.1509788112.
  7. Guillaume Fertin, Julien Fradin, and Géraldine Jean. Algorithmic aspects of the maximum colorful arborescence problem. In Proc. of Theory and Applications of Models of Computation (TAMC 2017), volume 10185 of Lect Notes Comput Sci, pages 216-230, 2017. URL: http://dx.doi.org/10.1007/978-3-319-55911-7_16.
  8. Alaa Khedr, Soad S Abd El-Hay, and Ahmed K Kammoun. Liquid chromatography-tandem mass spectrometric determination of propofol in rat serum and hair at attogram level after derivatization with 3-bromomethyl-propyphenazone. Journal of pharmaceutical and biomedical analysis, 134:195-202, 2017. URL: http://dx.doi.org/10.1016/j.jpba.2016.11.051.
  9. Vincent Lacroix, Cristina G. Fernandes, and Marie-France Sagot. Motif search in graphs: Application to metabolic networks. IEEE/ACM Trans Comput Biology Bioinform, 3(4):360-368, 2006. Google Scholar
  10. Gary J. Patti, Oscar Yanes, and Gary Siuzdak. Metabolomics: The apogee of the omics trilogy. Nat Rev Mol Cell Biol, 13(4):263-269, 2012. URL: http://dx.doi.org/10.1038/nrm3314.
  11. Florian Rasche, Aleš Svatoš, Ravi Kumar Maddula, Christoph Böttcher, and Sebastian Böcker. Computing fragmentation trees from tandem mass spectrometry data. Anal Chem, 83(4):1243-1251, 2011. URL: http://dx.doi.org/10.1021/ac101825k.
  12. Imran Rauf, Florian Rasche, François Nicolas, and Sebastian Böcker. Finding maximum colorful subtrees in practice. J Comput Biol, 20(4):1-11, 2013. URL: http://dx.doi.org/10.1089/cmb.2012.0083.
  13. Emma L Schymanski, Heinz P Singer, Jaroslav Slobodnik, Ildiko M Ipolyi, Peter Oswald, Martin Krauss, Tobias Schulze, Peter Haglund, Thomas Letzel, Sylvia Grosse, Nikolaos S Thomaidis, Anna Bletsou, Christian Zwiener, María Ibáñez, Tania Portolés, Ronald de Boer, Malcolm J Reid, Matthias Onghena, Uwe Kunkel, Wolfgang Schulz, Amélie Guillon, Naïke Noyon, Gaëla Leroy, Philippe Bados, Sara Bogialli, Draženka Stipaničev, Pawel Rostkowski, and Juliane Hollender. Non-target screening with high-resolution mass spectrometry: critical review using a collaborative trial on water analysis. Analytical and bioanalytical chemistry, 407:6237-6255, 2015. URL: http://dx.doi.org/10.1007/s00216-015-8681-7.
  14. Emma Louise Schymanski, Christoph Ruttkies, Martin Krauss, Céline Brouard, Tobias Kind, Kai Dührkop, Felicity Ruth Allen, Arpana Vaniya, Dries Verdegem, Sebastian Böcker, Juho Rousu, Huibin Shen, Hiroshi Tsugawa, Tanvir Sajed, Oliver Fiehn, Bart Ghesquière, and Steffen Neumann. Critical Assessment of Small Molecule Identification 2016: Automated methods. J Cheminf, 9:22, 2017. URL: http://dx.doi.org/10.1186/s13321-017-0207-1.
  15. Huibin Shen, Kai Dührkop, Sebastian Böcker, and Juho Rousu. Metabolite identification through multiple kernel learning on fragmentation trees. Bioinformatics, 30(12):i157-i164, 2014. Proc. of Intelligent Systems for Molecular Biology (ISMB 2014). URL: http://dx.doi.org/10.1093/bioinformatics/btu275.
  16. Aleksandra Skirycz, Sylwia Kierszniowska, Michaël Méret, Lothar Willmitzer, and George Tzotzos. Medicinal bioprospecting of the amazon rainforest: A modern eldorado? Trends in biotechnology, 34:781-790, 2016. URL: http://dx.doi.org/10.1016/j.tibtech.2016.03.006.
  17. Stephen E. Stein. Mass spectral reference libraries: An ever-expanding resource for chemical identification. Anal Chem, 84(17):7274-7282, 2012. URL: http://dx.doi.org/10.1021/ac301205z.
  18. Robert Endre Tarjan. A class of algorithms which require nonlinear time to maintain disjoint sets. J Comput System Sci, 18(2):110-127, 1979. Google Scholar
  19. Mingxun Wang, Jeremy J. Carver, Vanessa V. Phelan, Laura M. Sanchez, Neha Garg, Yao Peng, Don Duy Nguyen, Jeramie Watrous, Clifford A. Kapono, Tal Luzzatto-Knaan, Carla Porto, Amina Bouslimani, Alexey V. Melnik, Michael J. Meehan, Wei-Ting Liu, Max Crüsemann, Paul D. Boudreau, Eduardo Esquenazi, Mario Sandoval-Calderón, Roland D. Kersten, Laura A. Pace, Robert A. Quinn, Katherine R. Duncan, Cheng-Chih Hsu, Dimitrios J. Floros, Ronnie G. Gavilan, Karin Kleigrewe, Trent Northen, Rachel J. Dutton, Delphine Parrot, Erin E. Carlson, Bertrand Aigle, Charlotte F. Michelsen, Lars Jelsbak, Christian Sohlenkamp, Pavel Pevzner, Anna Edlund, Jeffrey McLean, Jörn Piel, Brian T. Murphy, Lena Gerwick, Chih-Chuang Liaw, Yu-Liang Yang, Hans-Ulrich Humpf, Maria Maansson, Robert A. Keyzers, Amy C. Sims, Andrew R. Johnson, Ashley M. Sidebottom, Brian E. Sedio, Andreas Klitgaard, Charles B. Larson, Cristopher A. Boya P, Daniel Torres-Mendoza, David J. Gonzalez, Denise B. Silva, Lucas M. Marques, Daniel P. Demarque, Egle Pociute, Ellis C. O'Neill, Enora Briand, Eric J N. Helfrich, Eve A. Granatosky, Evgenia Glukhov, Florian Ryffel, Hailey Houson, Hosein Mohimani, Jenan J. Kharbush, Yi Zeng, Julia A. Vorholt, Kenji L. Kurita, Pep Charusanti, Kerry L. McPhail, Kristian Fog Nielsen, Lisa Vuong, Maryam Elfeki, Matthew F. Traxler, Niclas Engene, Nobuhiro Koyama, Oliver B. Vining, Ralph Baric, Ricardo R. Silva, Samantha J. Mascuch, Sophie Tomasi, Stefan Jenkins, Venkat Macherla, Thomas Hoffman, Vinayak Agarwal, Philip G. Williams, Jingqui Dai, Ram Neupane, Joshua Gurr, Andrés M C. Rodríguez, Anne Lamsa, Chen Zhang, Kathleen Dorrestein, Brendan M. Duggan, Jehad Almaliti, Pierre-Marie Allard, Prasad Phapale, Louis-Felix Nothias, Theodore Alexandrov, Marc Litaudon, Jean-Luc Wolfender, Jennifer E. Kyle, Thomas O. Metz, Tyler Peryea, Dac-Trung Nguyen, Danielle VanLeer, Paul Shinn, Ajit Jadhav, Rolf Müller, Katrina M. Waters, Wenyuan Shi, Xueting Liu, Lixin Zhang, Rob Knight, Paul R. Jensen, Bernhard Ø. Palsson, Kit Pogliano, Roger G. Linington, Marcelino Gutiérrez, Norberto P. Lopes, William H. Gerwick, Bradley S. Moore, Pieter C. Dorrestein, and Nuno Bandeira. Sharing and community curation of mass spectrometry data with Global Natural Products Social molecular networking. Nat Biotechnol, 34(8):828-837, 2016. URL: http://dx.doi.org/10.1038/nbt.3597.
  20. W. Timothy J. White, Stephan Beyer, Kai Dührkop, Markus Chimani, and Sebastian Böcker. Speedy colorful subtrees. In Proc. of Computing and Combinatorics Conference (COCOON 2015), volume 9198 of Lect Notes Comput Sci, pages 310-322. Springer, Berlin, 2015. URL: http://dx.doi.org/10.1007/978-3-319-21398-9_25.
  21. David S Wishart. Emerging applications of metabolomics in drug discovery and precision medicine. Nature reviews. Drug discovery, 15:473-484, 2016. URL: http://dx.doi.org/10.1038/nrd.2016.32.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail