Quantified Uncertainty of Flexible Protein-Protein Docking Algorithms

Author Nathan L. Clement

  • 12 pages


Author Details

Nathan L. Clement
  • Department of Computer Science, University of Texas at Austin, USA


I would like to thank all those who have supported and advised on this work for their valuable feedback and suggestions for improvement.

Cite As

Nathan L. Clement. Quantified Uncertainty of Flexible Protein-Protein Docking Algorithms. In 19th International Workshop on Algorithms in Bioinformatics (WABI 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 143, pp. 3:1-3:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


The strength or weakness of an algorithm is ultimately governed by the confidence one can place in its result. When the problem domain is large (e.g., traversal of a high-dimensional space), an exact solution often cannot be obtained, so approximations must be made. These approximations often cause the reported quantity of interest (QOI) to vary between runs, decreasing the confidence in any single run. When the algorithm additionally computes this QOI from uncertain or noisy data, the variability (or lack of confidence) of the QOI increases further. Left unbounded, these two sources of uncertainty (algorithmic approximation and uncertainty in the input data) can produce a reported statistic that correlates poorly with ground truth. This is especially relevant in molecular biology applications, where the search space is generally large and observations are often noisy. This research applies uncertainty quantification techniques to the difficult protein-protein docking problem, where uncertainty arises both from the explicit conversion of the protein representation from continuous to discrete space (which introduces uncertainty into the input data) and from the discrete sampling of conformations. It first characterizes the variability in existing docking software, and then provides a method for computing probabilistic certificates in the form of Chernoff-like bounds. Finally, it uses these probabilistic certificates to accurately bound the uncertainty in the results of two docking algorithms, yielding a QOI that is both robust and statistically meaningful.
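To make the idea of a Chernoff-like probabilistic certificate concrete, here is a minimal sketch, not the paper's own certificate. It assumes that each independent run of a randomized docking algorithm reports a QOI bounded in a known interval [lo, hi] (for example, a normalized score in [0, 1]) and uses Hoeffding's inequality, one member of the Chernoff family of tail bounds, to turn n repeated runs into an interval that contains the expected QOI with probability at least 1 - delta. The function names, the [0, 1] range, and the simulated `noisy_docking_run` are hypothetical illustrations.

```python
import math
import random
import statistics


def hoeffding_half_width(n: int, lo: float, hi: float, delta: float) -> float:
    """Half-width t with P(|mean - E[X]| >= t) <= delta for n i.i.d. samples in [lo, hi].

    Hoeffding: P(|mean - mu| >= t) <= 2 exp(-2 n t^2 / (hi - lo)^2).
    Setting the right-hand side equal to delta and solving for t gives
    t = (hi - lo) * sqrt(ln(2 / delta) / (2 n)).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return (hi - lo) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def runs_needed(t: float, lo: float, hi: float, delta: float) -> int:
    """Smallest number of runs n whose Hoeffding half-width is at most t."""
    return math.ceil((hi - lo) ** 2 * math.log(2.0 / delta) / (2.0 * t ** 2))


def certify(samples, lo: float, hi: float, delta: float):
    """Return (mean, lower, upper); [lower, upper] contains E[X] w.p. >= 1 - delta."""
    mean = statistics.fmean(samples)
    t = hoeffding_half_width(len(samples), lo, hi, delta)
    # E[X] always lies in [lo, hi], so clipping the interval keeps the guarantee.
    return mean, max(lo, mean - t), min(hi, mean + t)


def noisy_docking_run(rng: random.Random) -> float:
    """Hypothetical stand-in for one randomized docking run, clipped to [0, 1]."""
    return min(1.0, max(0.0, rng.gauss(0.6, 0.1)))


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [noisy_docking_run(rng) for _ in range(500)]
    mean, lower, upper = certify(samples, 0.0, 1.0, delta=0.05)
    print(f"mean QOI {mean:.3f}, 95% certificate [{lower:.3f}, {upper:.3f}]")
    print("runs for +/-0.05 at 95%:", runs_needed(0.05, 0.0, 1.0, 0.05))
```

Inverting the bound with `runs_needed` exposes the cost of tighter certificates: because the half-width scales as 1/sqrt(n), halving it requires roughly four times as many runs. The bound makes no assumption about the shape of the QOI distribution beyond boundedness and independence, which is what makes it attractive when the sampling and input noise are hard to model.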

Subject Classification

ACM Subject Classification
  • Applied computing → Molecular structural biology
  • Mathematics of computing → Hypothesis testing and confidence interval computation
  • Computing methodologies → Uncertainty quantification

Keywords
  • protein-protein docking
  • uncertainty quantification
  • protein flexibility
  • low-discrepancy sampling
  • high-dimensional sampling



