Assessing the Significance of Peptide Spectrum Match Scores

Abramova, Anastasiia; Korobeynikov, Anton

doi:10.4230/LIPIcs.WABI.2017.14

Cite As

Anastasiia Abramova and Anton Korobeynikov. Assessing the Significance of Peptide Spectrum Match Scores. In 17th International Workshop on Algorithms in Bioinformatics (WABI 2017). Leibniz International Proceedings in Informatics (LIPIcs), Volume 88, pp. 14:1-14:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LIPIcs.WABI.2017.14

Abstract

Peptidic Natural Products (PNPs) are highly sought-after bioactive compounds that include many antibiotic, antiviral and antitumor agents, immunosuppressors and toxins. Although recent advances in mass spectrometry have led to accurate sequencing methods for nonlinear (cyclic and branch-cyclic) peptides that require only picograms of input material, identifying PNPs via a database search of mass spectra remains problematic. This is particularly true when evaluating the statistical significance of Peptide Spectrum Matches (PSMs) for nonlinear peptides, which often contain non-standard amino acids and modifications and have an overall complex structure. In this paper we describe a new way of estimating the statistical significance of a PSM defined by any peptide (linear or nonlinear) using state-of-the-art Markov Chain Monte Carlo methods. In addition to the estimate itself, our method provides an uncertainty estimate in the form of confidence bounds, as well as an automatic simulation stopping rule that ensures the sample size is sufficient to achieve the desired accuracy.
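The abstract does not spell out the sampler, so the block below is only a minimal sketch of the three ingredients it names: an MCMC estimate of a score tail probability, batch-means confidence bounds, and a relative fixed-width stopping rule. It is not the authors' method. Everything concrete in it is an assumption: the uniform null model over fixed-length peptides, the hypothetical `toy_score` (number of positions matching a reference sequence, standing in for a real spectrum score), and the parameters `eps`, `z`, `min_iter` and `check_every`. The toy score was chosen because its tail probability is binomial and can be checked exactly.

```python
import math
import random

# Hypothetical toy setup, NOT from the paper: peptides are fixed-length strings
# over the 20 standard residues, drawn uniformly under the null model.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(peptide, reference):
    """Hypothetical stand-in for a real PSM score: the number of positions
    at which the candidate agrees with a fixed reference sequence."""
    return sum(a == b for a, b in zip(peptide, reference))


def batch_means_se(samples):
    """Batch-means estimate of the Monte Carlo standard error of the mean of
    correlated Markov chain output."""
    n = len(samples)
    b = int(math.sqrt(n))          # batch size grows like sqrt(n)
    a = n // b                     # number of complete batches
    means = [sum(samples[k * b:(k + 1) * b]) / b for k in range(a)]
    grand = sum(means) / a
    sigma2 = b * sum((m - grand) ** 2 for m in means) / (a - 1)
    return math.sqrt(sigma2 / n)


def estimate_tail_probability(reference, threshold, eps=0.1, z=1.96,
                              min_iter=10_000, check_every=10_000,
                              max_iter=2_000_000, seed=0):
    """Estimate P(score >= threshold) under the uniform null with a
    Metropolis-Hastings chain of single-residue mutations. Stops once the
    half-width z * SE falls below eps * estimate (relative fixed width).
    Returns (estimate, (lower, upper), iterations used)."""
    rng = random.Random(seed)
    length = len(reference)
    # Start from an exact draw of the stationary distribution (no burn-in).
    peptide = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    score = toy_score(peptide, reference)
    indicators = []
    for it in range(1, max_iter + 1):
        # Symmetric proposal: resample one randomly chosen position.
        pos = rng.randrange(length)
        new_res = rng.choice(AMINO_ACIDS)
        delta = (new_res == reference[pos]) - (peptide[pos] == reference[pos])
        # The null target is uniform, so the MH acceptance ratio is 1 and the
        # move is always accepted; the score is updated incrementally.
        peptide[pos] = new_res
        score += delta
        indicators.append(1 if score >= threshold else 0)
        if it >= min_iter and it % check_every == 0:
            p_hat = sum(indicators) / it
            half = z * batch_means_se(indicators)
            if p_hat > 0 and half <= eps * p_hat:
                return p_hat, (p_hat - half, p_hat + half), it
    # Budget exhausted without meeting the precision target.
    p_hat = sum(indicators) / len(indicators)
    half = z * batch_means_se(indicators)
    return p_hat, (p_hat - half, p_hat + half), max_iter


if __name__ == "__main__":
    p, (lo, hi), n = estimate_tail_probability("ACDEFGHIKL", threshold=2)
    print(f"P(score >= 2) ~ {p:.4f}, 95% CI [{lo:.4f}, {hi:.4f}], {n} steps")
```

For the 10-residue reference above, the exact answer is the binomial tail P(X >= 2) with X ~ Binomial(10, 1/20), roughly 0.086, which the sketch should reproduce to within its reported bounds. For the very small p-values that matter in practice, sampling the null directly would need impractically long runs; a tilted or rare-event target with reweighting would typically be needed in place of the always-accepted move used here.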
