Discriminative and Generative Models for Clinical Risk Estimation: An Empirical Comparison

Authors John Stamford, Chandra Kambhampati

Cite As

John Stamford and Chandra Kambhampati. Discriminative and Generative Models for Clinical Risk Estimation: An Empirical Comparison. In 2017 Imperial College Computing Student Workshop (ICCSW 2017). Open Access Series in Informatics (OASIcs), Volume 60, pp. 5:1-5:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018)


Abstract

Linear discriminative models, in the form of logistic regression, are a popular choice in the clinical domain for developing risk models. Logistic regression is commonly used because it offers explanatory information in addition to its predictive capabilities. In some cases the coefficients from these models have been used to derive overly simplified clinical risk scores. Such models are constrained to modeling linear relationships between the variables and the class, even though this relationship is known not always to be linear. This paper compares the conditions under which linear discriminative and linear generative models perform best, by comparing logistic regression and naïve Bayes on real clinical data. The work shows that generative models perform best when the internal representation of the data is closer to the true distribution of the data and when there is a very small difference between the means of the classes. For variables such as sodium, it is shown that logistic regression cannot model the observed risk because that risk is non-linear in nature, whereas naïve Bayes gives a better estimate of risk. The work concludes that risk estimates derived from discriminative models such as logistic regression need to be considered in the wider context of the true risk observed within the dataset.

Keywords

  • Discriminative
  • Generative
  • Naïve Bayes
  • Logistic Regression
  • Clinical Risk
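
The contrast described in the abstract can be illustrated with a small synthetic sketch. It is not the authors' code or data: the "sodium-like" values, the class means and spreads, and the helper names `fit_logistic` and `fit_gaussian_nb` are all assumptions made for illustration, and the naïve Bayes model is assumed to use one Gaussian per class. Both classes share the same mean but differ in spread, so the observed risk is high at both extremes of the variable. A logistic regression that is linear in the variable can only produce a monotone risk curve, while the Gaussian class-conditional model can follow the U shape.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "sodium-like" variable (values are illustrative, not from the paper).
# Both classes have the same mean; the positive class has a wider spread,
# so risk is elevated at both low and high values.
n = 5000
x = np.concatenate([rng.normal(140.0, 2.0, n),   # class 0 (no event)
                    rng.normal(140.0, 6.0, n)])  # class 1 (event)
y = np.concatenate([np.zeros(n), np.ones(n)])


def fit_logistic(x, y, iters=25):
    """Univariate logistic regression (linear in x), fitted by Newton's method."""
    mu, sd = x.mean(), x.std()
    X = np.column_stack([np.ones_like(x), (x - mu) / sd])
    w = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        w -= np.linalg.solve(hess, grad)
    return lambda q: 1.0 / (1.0 + np.exp(-(w[0] + w[1] * (np.asarray(q) - mu) / sd)))


def fit_gaussian_nb(x, y):
    """Single-feature Gaussian naive Bayes: one Gaussian per class plus a class prior."""
    stats = [(x[y == c].mean(), x[y == c].std(), np.mean(y == c)) for c in (0, 1)]

    def predict(q):
        q = np.asarray(q, dtype=float)
        # Prior times Gaussian density; the shared 1/sqrt(2*pi) factor cancels.
        joint = [prior * np.exp(-0.5 * ((q - m) / s) ** 2) / s for m, s, prior in stats]
        return joint[1] / (joint[0] + joint[1])

    return predict


lr = fit_logistic(x, y)
nb = fit_gaussian_nb(x, y)

grid = np.array([125.0, 140.0, 155.0])
# Empirical event rate among samples within 2 units of each grid point.
observed = np.array([y[np.abs(x - q) < 2.0].mean() for q in grid])

print("value     ", grid)
print("observed  ", observed.round(2))
print("logistic  ", lr(grid).round(2))
print("naive Bayes", nb(grid).round(2))
```

In this setup the logistic regression estimates stay close to the overall event rate across the whole range, because a near-zero difference in class means leaves it with almost no slope to fit, while the naïve Bayes estimates track the observed risk at the low, middle, and high values.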


  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    PDF Downloads

