eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2018-08-02
23:1
23:14
10.4230/LIPIcs.WABI.2018.23
article
Heuristic Algorithms for the Maximum Colorful Subtree Problem
Dührkop, Kai
1
Lataretu, Marie A.
1
White, W. Timothy J.
2
https://orcid.org/0000-0002-1997-0176
Böcker, Sebastian
1
https://orcid.org/0000-0002-9304-8091
Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany
Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany, and Berlin Institute of Health, Berlin, Germany
In metabolomics, small molecules are structurally elucidated using tandem mass spectrometry (MS/MS); this computational task can be formulated as the Maximum Colorful Subtree problem, which is NP-hard. Unfortunately, the data from a single metabolite requires us to solve hundreds or thousands of instances of this problem, and a single liquid chromatography MS/MS (LC-MS/MS) run measures hundreds or thousands of metabolites.
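To make the problem concrete: in the Maximum Colorful Subtree problem we are given a vertex-colored, edge-weighted directed acyclic graph and look for a subtree in which no color occurs more than once and whose total edge weight is maximal. The following minimal Python sketch shows a simple greedy heuristic for it, assuming a fixed root vertex. It is an illustration only, not one of the heuristics evaluated in the paper; the adjacency-list representation and the function name `greedy_colorful_subtree` are hypothetical. On the toy graph, greedy scores 6 while the optimal tree (r→x→z) scores 7 with a different structure.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical representation: edges[u] = [(v, weight), ...]; color[v] = color label.
Graph = Dict[str, List[Tuple[str, float]]]
Edge = Tuple[str, str, float]


def greedy_colorful_subtree(
    edges: Graph, color: Dict[str, str], root: str
) -> Tuple[List[Edge], float]:
    """Grow a colorful subtree from `root` greedily.

    In each round, consider every edge leaving the current tree whose head
    is not yet in the tree and whose color is unused, and add the one with
    the largest positive weight. Negative edges are never added (a
    simplification of this sketch). Stops when no candidate edge remains.
    """
    in_tree: Set[str] = {root}
    used_colors: Set[str] = {color[root]}
    chosen: List[Edge] = []
    total = 0.0
    while True:
        best: Optional[Edge] = None
        for u in in_tree:
            for v, w in edges.get(u, []):
                if v in in_tree or color[v] in used_colors:
                    continue  # adding v would create a cycle or repeat a color
                if w > 0 and (best is None or w > best[2]):
                    best = (u, v, w)
        if best is None:
            break
        chosen.append(best)
        in_tree.add(best[1])
        used_colors.add(color[best[1]])
        total += best[2]
    return chosen, total


# Toy instance: x and y share color "b", so at most one of them can be used.
edges = {"r": [("x", 3.0), ("y", 5.0)], "x": [("z", 4.0)], "y": [("z", 1.0)]}
color = {"r": "a", "x": "b", "y": "b", "z": "c"}
tree, score = greedy_colorful_subtree(edges, color, "r")
print(tree, score)  # [('r', 'y', 5.0), ('y', 'z', 1.0)] 6.0
```

The greedy choice of the heavy edge r→y blocks the color of x and thereby the heavier continuation x→z, which is exactly the kind of structural deviation from the optimum that a score comparison alone does not reveal.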
Here, we comprehensively evaluate the performance of several heuristic algorithms for the problem. As is often the case in bioinformatics, the structure of the (chemically) true solution is not known to us; therefore, we can only evaluate against the optimal solution of an instance. Evaluating the quality of a heuristic based on scores can be misleading: even a slightly suboptimal solution can be structurally very different from the optimal solution, yet it is the structure of a solution, not its score, that matters for downstream analysis. We therefore propose a different evaluation setup: given a set of candidate instances of which exactly one is known to be correct, the heuristic in question solves each instance to the best of its ability and produces a score for each, and these scores are used to rank the instances. We then evaluate whether the heuristic ranks the correct instance highly.
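The ranking-based evaluation can be sketched as follows: every candidate instance receives the score of the heuristic's solution, the candidates are ordered by descending score, and the rank of the correct candidate is recorded; aggregating over many queries yields a top-k accuracy. This is a minimal Python sketch under stated assumptions: the function names, the candidate labels, and the pessimistic tie-breaking rule are hypothetical and not necessarily the paper's exact protocol.

```python
from typing import Dict, List


def rank_of_correct(scores: Dict[str, float], correct: str) -> int:
    """1-based rank of `correct` when candidates are sorted by descending score.

    Ties are broken pessimistically (an assumption of this sketch): every
    other candidate with a score >= that of the correct one ranks ahead of it.
    """
    s = scores[correct]
    return 1 + sum(1 for c, v in scores.items() if c != correct and v >= s)


def top_k_accuracy(ranks: List[int], k: int) -> float:
    """Fraction of queries whose correct candidate is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical heuristic scores for one query; "A" is the correct instance.
scores = {"A": 10.2, "B": 11.0, "C": 9.1}
print(rank_of_correct(scores, "A"))  # 2
print(top_k_accuracy([2, 1, 4], 1))  # 0.3333333333333333
```

Running the same candidates through an exact solver and through a heuristic and comparing the resulting top-k accuracies measures what matters downstream, namely whether the correct instance still comes out on top, rather than how close each individual score is to the optimum.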
We find that one particular heuristic consistently ranks the correct instance in a top position. We also find that the scores of the best heuristic solutions are very close to the optimal score; in contrast, the structure of the solutions can deviate significantly from the optimal structures. Integrating the heuristic allowed us to speed up computations in practice 100-fold.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol113-wabi2018/LIPIcs.WABI.2018.23/LIPIcs.WABI.2018.23.pdf
Fragmentation trees
Computational mass spectrometry