l1-Penalised Ordinal Polytomous Regression Estimators with Application to Gene Expression Studies

Language: eng
Publisher: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Series: Leibniz International Proceedings in Informatics (ISSN 1868-8969)
Published: 2018-08-02
Pages: 17:1-17:13
DOI: 10.4230/LIPIcs.WABI.2018.17
Type: article
Authors: Stéphane Chrétien (1), Christophe Guyeux (2), Serge Moulin (2)
Affiliations:
(1) National Physical Laboratory, Hampton Road, Teddington, United Kingdom
(2) Computer Science Department, FEMTO-ST Institute, UMR 6174 CNRS, Université de Bourgogne Franche-Comté, 16 route de Gray, 25030 Besançon, France

Abstract: Qualitative but ordered random variables, such as the severity of a pathology, are of paramount importance in biostatistics and medicine. The conditional distribution of such variables given other explanatory variables can be modelled with a specific regression model known as ordinal polytomous regression. Variable selection in the ordinal polytomous regression model is a computationally difficult combinatorial optimisation problem, yet it is crucial when practitioners need to understand which covariates are physically related to the output and which are not. One easy way to circumvent the computational hardness of variable selection is to introduce a penalised maximum likelihood estimator based on a well-chosen non-smooth penalisation function such as the l_1-norm. In the case of the Gaussian linear model, the l_1-penalised least-squares estimator, also known as the LASSO estimator, has attracted a lot of attention in the last decade, from both the theoretical and algorithmic viewpoints. However, even in the Gaussian linear model, accurate calibration of the relaxation parameter, i.e., the relative weight of the penalisation term in the estimation cost function, is still considered a difficult problem that has to be addressed with caution.
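To make the role of the relaxation parameter concrete, the following minimal sketch computes an l_1-penalised least-squares (LASSO) estimate in the Gaussian linear model by plain proximal gradient descent (ISTA). It is an illustration under stated assumptions, not the estimation procedure of the paper: it uses neither Nesterov acceleration nor the ordinal polytomous likelihood, and the names `soft_threshold`, `lasso_ista`, `lam` and `n_iter` are hypothetical, as is the 1/(2n) scaling of the least-squares term.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z towards zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_ista(X, y, lam, n_iter=500):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by proximal gradient.

    X is a list of n rows of length p, y a list of n responses,
    lam the relaxation parameter (assumed scaling, see lead-in).
    """
    n, p = len(X), len(X[0])
    # Step size 1/L, where L upper-bounds the Lipschitz constant of the
    # gradient: the squared Frobenius norm of X divided by n bounds the
    # largest eigenvalue of X^T X / n.
    L = sum(x * x for row in X for x in row) / n
    step = 1.0 / L
    b = [0.0] * p
    for _ in range(n_iter):
        # Residuals X b - y and gradient X^T (X b - y) / n of the smooth part.
        resid = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        # Gradient step on the smooth part, then soft-thresholding for the l_1 term.
        b = [soft_threshold(b[j] - step * grad[j], step * lam) for j in range(p)]
    return b
```

Larger values of `lam` set more coefficients exactly to zero, which is why its calibration determines which covariates are selected.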
In the present paper, we apply l_1-penalisation to the ordinal polytomous regression model and compare several hyper-parameter calibration strategies. Our main contributions are: (a) a simple and useful l_1-penalised estimator for ordinal polytomous regression, together with a thorough description of how to apply Nesterov's accelerated gradient method and the online Frank-Wolfe method to compute this estimator, (b) a new hyper-parameter calibration method for the proposed model, based on the Quantile Universal Threshold (QUT) idea of Giacobino et al., and (c) freely usable code that implements the proposed estimation procedure.

PDF: https://drops.dagstuhl.de/storage/00lipics/lipics-vol113-wabi2018/LIPIcs.WABI.2018.17/LIPIcs.WABI.2018.17.pdf
Keywords: LASSO, ordinal polytomous regression, Quantile Universal Threshold, Frank-Wolfe algorithm, Nesterov algorithm