The Wasserstein Distance as a Dissimilarity Measure for Mass Spectra with Application to Spectral Deconvolution

Authors Szymon Majewski, Michal Aleksander Ciach, Michal Startek, Wanda Niemyska, Blazej Miasojedow, Anna Gambin



PDF
Thumbnail PDF

File

LIPIcs.WABI.2018.25.pdf
  • Filesize: 1.81 MB
  • 21 pages

Document Identifiers

Author Details

Szymon Majewski
  • Institute of Mathematics, Polish Academy of Sciences, Warsaw, Poland
Michal Aleksander Ciach
  • Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
Michal Startek
  • Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
Wanda Niemyska
  • Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
Blazej Miasojedow
  • Institute of Mathematics, Polish Academy of Sciences, Warsaw, Poland
Anna Gambin
  • Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland

Cite AsGet BibTex

Szymon Majewski, Michal Aleksander Ciach, Michal Startek, Wanda Niemyska, Blazej Miasojedow, and Anna Gambin. The Wasserstein Distance as a Dissimilarity Measure for Mass Spectra with Application to Spectral Deconvolution. In 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, pp. 25:1-25:21, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)
https://doi.org/10.4230/LIPIcs.WABI.2018.25

Abstract

We propose a new approach for the comparison of mass spectra using a metric known in the computer science under the name of Earth Mover's Distance and in mathematics as the Wasserstein distance. We argue that this approach allows for natural and robust solutions to various problems in the analysis of mass spectra. In particular, we show an application to the problem of deconvolution, in which we infer proportions of several overlapping isotopic envelopes of similar compounds. Combined with the previously proposed generator of isotopic envelopes, IsoSpec, our approach works for a wide range of masses and charges in the presence of several types of measurement inaccuracies. To reduce the computational complexity of the solution, we derive an effective implementation of the Interior Point Method as the optimization procedure. The software for mass spectral comparison and deconvolution based on Wasserstein distance is available at https://github.com/mciach/wassersteinms.

Subject Classification

ACM Subject Classification
  • Applied computing → Chemistry
Keywords
  • Mass spectrometry
  • Tandem mass spectrometry
  • Wasserstein distance
  • Earth mover's distance
  • Similarity measure
  • Jaccard score
  • Tanimoto similarity

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. S. Böcker and K. Dührkop. Fragmentation trees reloaded. Journal of Cheminformatics, 8(1):5, 2016. Google Scholar
  2. S. Cappadona, P. R. Baker, P. R. Cutillas, A. JR. Heck, and B. van Breukelen. Current challenges in software solutions for mass spectrometry-based quantitative proteomics. Amino acids, 43(3):1087-1108, 2012. Google Scholar
  3. I. Eidhammer, K. Flikka, L. Martens, and S-O. Mikalsen. Computational methods for mass spectrometry proteomics. John Wiley &Sons, 2008. Google Scholar
  4. F. Lermyte et al. Conformational Space and Stability of ETD Charge Reduction Products of Ubiquitin. J. Am. Soc. Mass Spectrom., Aug 2016. Google Scholar
  5. H. Hisayuki et al. Massbank: a public repository for sharing mass spectral data for life sciences. Journal of Mass Spectrometry, 45(7):703-714, 2010. Google Scholar
  6. K. Xiao et al. Accurate and Efficient Resolution of Overlapping Isotopic Envelopes in Protein Tandem Mass Spectra. Sci Rep, 5:14755, Oct 2015. Google Scholar
  7. M.M Koek et al. Quantitative metabolomics based on gas chromatography mass spectrometry: status and perspectives. Metabolomics, 7(3):307-328, Sep 2011. Google Scholar
  8. N. Jaitly et al. Decon2ls: An open-source software package for automated processing and visualization of high resolution mass spectrometry data. BMC Bioinformatics, 10(1):87, 2009. Google Scholar
  9. J. Gondzio. Interior point methods 25 years later. European Journal of Operational Research, 218(3):587-601, 2012. Google Scholar
  10. M. E. Hansen and J. Smedsgaard. A new matching algorithm for high resolution mass spectra. J. of Am. Soc. Mass Spectrom., 15(8):1173-1180, 2004. Google Scholar
  11. C. E. Housecroft and E. C. Constable. Chemistry: An introduction to organic, inorganic and physical chemistry. Pearson education, 2010. Google Scholar
  12. Jedrzej Jablonski and Anna Marciniak-Czochra. Efficient algorithms computing distances between radon measures on r. arXiv preprint arXiv:1304.3501, 2013. Google Scholar
  13. Q. Kou, L. Xun, and X. Liu. Toppic: a software tool for top-down mass spectrometry-based proteoform identification and characterization. Bioinformatics, 32(22):3495-3497, 2016. Google Scholar
  14. M. Mann, C. K. Meng, and J. B. Fenn. Interpreting mass spectra of multiply charged ions. Analytical Chemistry, 61(15):1702-1708, 1989. Google Scholar
  15. J. Meija and J.A. Caruso. Deconvolution of isobaric interferences in mass spectra. J. of Am. Soc. Mass Spectrom., 15(5):654-658, 2004. Google Scholar
  16. S. Neumann and S. Böcker. Computational mass spectrometry for metabolomics: identification of metabolites and small molecules. Analytical and Bioanalytical Chemistry, 398(7-8):2779-2788, 2010. Google Scholar
  17. Nina Nikolova and Joanna Jaworska. Approaches to measure chemical similarity-a review. Molecular Informatics, 22(9-10):1006-1026, 2003. Google Scholar
  18. B. B. Reinhold and V. N. Reinhold. Electrospray ionization mass spectrometry: Deconvolution by an entropy-based algorithm. J. of Am. Soc. Mass Spectrom., 3(3):207-215, 1992. Google Scholar
  19. David Rogers and Mathew Hahn. Extended-connectivity fingerprints. Journal of Chemical Information and Modeling, 50(5):742-754, 2010. Google Scholar
  20. Y. Rubner, C. Tomasi, and L.J. Guibas. The earth mover’s distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99-121, Nov 2000. Google Scholar
  21. F. Santambrogio. Optimal transport for applied mathematicians. Springer, 2015. Google Scholar
  22. Michael W. Senko, Steven C. Beu, and Fred W. McLafferty. Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions. Journal of the American Society for Mass Spectrometry, 6(4):229-233, 1995. Google Scholar
  23. R. J. Vanderbei et al. Linear programming. Springer, 2015. Google Scholar
  24. K. X. Wan, I. Vidavsky, and M. L. Gross. Comparing similar spectra: from similarity index to spectral contrast angle. J. of Am. Soc. Mass Spectrom., 13(1):85-88, 2002. Google Scholar
  25. Ş. Yilmaz, E. Vandermarliere, and L. Martens. Methods to Calculate Spectrum Similarity, pages 75-100. Springer New York, New York, NY, 2017. Google Scholar
  26. M. K. Łącki et al. IsoSpec: Hyperfast Fine Structure Calculator. Anal. Chem., 89(6):3272-3277, Mar 2017. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail