Assessing the Significance of Peptide Spectrum Match Scores

Authors Anastasiia Abramova, Anton Korobeynikov




Cite As

Anastasiia Abramova and Anton Korobeynikov. Assessing the Significance of Peptide Spectrum Match Scores. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 14:1-14:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)


Peptidic Natural Products (PNPs) are highly sought-after bioactive compounds that include many antibiotic, antiviral and antitumor agents, immunosuppressors and toxins. Even though recent advancements in mass spectrometry have led to the development of accurate sequencing methods for nonlinear (cyclic and branch-cyclic) peptides, requiring only picograms of input material, the identification of PNPs via a database search of mass spectra remains problematic. This holds particularly true when evaluating the statistical significance of Peptide Spectrum Matches (PSMs), especially for nonlinear peptides, which often contain non-standard amino acids and modifications and have a complex overall structure. In this paper we describe a new way of estimating the statistical significance of a PSM defined by any peptide (linear or nonlinear) using state-of-the-art Markov Chain Monte Carlo methods. In addition to the estimate itself, our method provides an uncertainty estimate in the form of confidence bounds, as well as an automatic simulation stopping rule that ensures the sample size is sufficient to achieve the desired level of accuracy.
  • mass spectrometry
  • natural products
  • peptide spectrum matches
  • statistical significance
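
The general idea in the abstract (an MCMC estimate of a score tail probability, a confidence bound, and an automatic stopping rule) can be illustrated with a minimal sketch. This is not the authors' method: it assumes a toy model in which a "peptide" is a fixed-length sequence over a small integer alphabet, the score is the number of positions matching a target, and random peptides are uniform. The standard error comes from a plain batch-means estimator, and the chain stops once the confidence half-width falls below a chosen fraction of the estimate (a relative fixed-width rule). All function names and parameters here are hypothetical.

```python
import math
import random


def exact_tail(n, alphabet, threshold):
    """Exact P(score >= threshold) for the toy model: score ~ Binomial(n, 1/alphabet)."""
    q = 1.0 / alphabet
    return sum(math.comb(n, s) * q**s * (1 - q) ** (n - s)
               for s in range(threshold, n + 1))


def batch_means_se(xs):
    """Batch-means standard error of the mean of a correlated sequence."""
    b = int(math.sqrt(len(xs)))      # batch size ~ sqrt(n)
    a = len(xs) // b                 # number of full batches
    means = [sum(xs[i * b:(i + 1) * b]) / b for i in range(a)]
    grand = sum(means) / a
    var = b * sum((m - grand) ** 2 for m in means) / (a - 1)
    return math.sqrt(var / (a * b))


def estimate_tail(target, alphabet, threshold, eps=0.1, z=1.96,
                  check_every=5000, max_steps=500_000, seed=0):
    """Estimate P(score >= threshold) with a single-site Metropolis chain.

    The proposal is symmetric and the target distribution is uniform,
    so every proposal is accepted. Sampling stops when the confidence
    half-width is at most eps times the current estimate.
    """
    rng = random.Random(seed)
    state = [rng.randrange(alphabet) for _ in target]
    current = sum(1 for s, t in zip(state, target) if s == t)
    hits = []
    p, half = 0.0, float("inf")
    while len(hits) < max_steps:
        for _ in range(check_every):
            i = rng.randrange(len(state))
            new = rng.randrange(alphabet)
            # Update the score incrementally for the changed position.
            current += (new == target[i]) - (state[i] == target[i])
            state[i] = new
            hits.append(1.0 if current >= threshold else 0.0)
        p = sum(hits) / len(hits)
        half = z * batch_means_se(hits)
        if p > 0 and half <= eps * p:   # relative fixed-width stopping rule
            break
    return p, half, len(hits)


if __name__ == "__main__":
    target = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
    p, half, n = estimate_tail(target, alphabet=4, threshold=5)
    print(f"estimate {p:.4f} +/- {half:.4f} after {n} samples")
    print(f"exact    {exact_tail(len(target), 4, 5):.4f}")
```

Plain sampling like this becomes impractical for very small tail probabilities, because the chain rarely visits high-scoring states; rare-event techniques address that case and are outside the scope of this sketch.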



