Discriminative and Generative Models for Clinical Risk Estimation: An Empirical Comparison

Stamford, John; Kambhampati, Chandra

doi:10.4230/OASIcs.ICCSW.2017.5

Keywords

Discriminative
Generative
Naïve Bayes
Logistic Regression
Clinical Risk

Abstract

Linear discriminative models, in the form of logistic regression, are a popular choice for developing risk models in the clinical domain. Logistic regression is commonly used because it offers explanatory information in addition to its predictive capabilities. In some cases the coefficients from these models have been used to derive overly simplified clinical risk scores. Such models are constrained to modeling linear relationships between the variables and the class, despite it being known that this relationship is not always linear. This paper compares the conditions under which linear discriminative and linear generative models perform best, by comparing logistic regression and naïve Bayes on real clinical data. The work shows that generative models perform best when the internal representation of the data is closer to the true distribution of the data and when there is a very small difference between the means of the classes. For variables such as sodium, it is shown that logistic regression cannot model the observed risk because that risk is non-linear in nature, whereas naïve Bayes gives a better estimate of risk. The work concludes that risk estimates derived from discriminative models such as logistic regression need to be considered in the wider context of the true risk observed within the dataset.
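The contrast described in the abstract can be illustrated with a minimal sketch on synthetic data. This is not the authors' code or dataset. It assumes a single sodium-like variable whose two classes share almost the same mean but differ in spread, and it assumes a Gaussian naïve Bayes with per-class variances, which may differ from the formulation used in the paper. All function names and parameter values are hypothetical. Under these assumptions the naïve Bayes log-odds is quadratic in the variable, so it can represent a U-shaped risk, while a single-term logistic regression is monotonic by construction.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_gaussian_nb(xs, ys):
    """Estimate a prior, mean and standard deviation for each class (0 and 1)."""
    params = {}
    for label in (0, 1):
        values = [x for x, y in zip(xs, ys) if y == label]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        params[label] = (len(values) / len(xs), mean, math.sqrt(var))
    return params


def nb_risk(params, x):
    """Posterior probability of class 1 under the fitted Gaussian naive Bayes."""
    joint = {c: prior * gaussian_pdf(x, mean, std)
             for c, (prior, mean, std) in params.items()}
    return joint[1] / (joint[0] + joint[1])


def fit_logistic(xs, ys, steps=500, lr=0.5):
    """Fit a one-variable logistic regression by batch gradient descent."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    zs = [(x - mean) / std for x in xs]  # standardise for stable steps
    w = b = 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for z, y in zip(zs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * z + b)))
            grad_w += (p - y) * z
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b, mean, std


def lr_risk(model, x):
    """Predicted probability of class 1 under the fitted logistic regression."""
    w, b, mean, std = model
    return 1.0 / (1.0 + math.exp(-(w * (x - mean) / std + b)))


# Synthetic, illustrative data only: both classes are centred on 140,
# but the adverse-outcome class (1) is far more spread out (assumed values).
random.seed(0)
xs = [random.gauss(140, 3) for _ in range(800)] + \
     [random.gauss(140, 8) for _ in range(200)]
ys = [0] * 800 + [1] * 200

nb_model = fit_gaussian_nb(xs, ys)
lr_model = fit_logistic(xs, ys)

for value in (125, 140, 155):
    print(value,
          "naive Bayes:", round(nb_risk(nb_model, value), 3),
          "logistic regression:", round(lr_risk(lr_model, value), 3))
```

In this sketch the naïve Bayes estimate rises at both low and high values, while the logistic regression estimate can only increase or decrease across the range. The real clinical variables in the paper may of course behave differently.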

Cite As

John Stamford and Chandra Kambhampati. Discriminative and Generative Models for Clinical Risk Estimation: An Empirical Comparison. In 2017 Imperial College Computing Student Workshop (ICCSW 2017). Open Access Series in Informatics (OASIcs), Volume 60, pp. 5:1-5:9, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2018) https://doi.org/10.4230/OASIcs.ICCSW.2017.5

Author Details

John Stamford

Chandra Kambhampati

