l1-Penalised Ordinal Polytomous Regression Estimators with Application to Gene Expression Studies
Qualitative but ordered random variables, such as the severity of a pathology, are of paramount importance in biostatistics and medicine. The conditional distribution of such a qualitative variable as a function of other explanatory variables can be studied using a specific regression model known as ordinal polytomous regression. Variable selection in the ordinal polytomous regression model is a computationally hard combinatorial optimisation problem; it is nevertheless crucial when practitioners need to understand which covariates are physically related to the output and which are not. One simple way to circumvent the computational hardness of variable selection is to introduce a penalised maximum likelihood estimator based on a well-chosen non-smooth penalty function such as, e.g., the l_1-norm. In the case of the Gaussian linear model, the l_1-penalised least-squares estimator, also known as the LASSO estimator, has attracted a lot of attention over the last decade, from both the theoretical and the algorithmic viewpoints. However, even in the Gaussian linear model, accurate calibration of the relaxation parameter, i.e., the relative weight of the penalisation term in the estimation cost function, is still considered a difficult problem that has to be addressed with caution. In the present paper, we apply l_1-penalisation to the ordinal polytomous regression model and compare several hyper-parameter calibration strategies. Our main contributions are: (a) a useful and simple l_1-penalised estimator for ordinal polytomous regression, together with a thorough description of how to apply Nesterov's accelerated gradient and the online Frank-Wolfe methods to the computation of this estimator, (b) a new hyper-parameter calibration method for the proposed model, based on the Quantile Universal Threshold (QUT) idea of Giacobino et al., and (c) freely available code implementing the proposed estimation procedure.
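The estimator described in the abstract, an l_1-penalised maximum likelihood for the ordinal (proportional-odds) model, can be illustrated with a short toy implementation. This is a hedged sketch, not the authors' method or their published module: all function names are ours, the gradients are cheap finite differences, and a plain proximal-gradient step stands in for the Nesterov and Frank-Wolfe schemes the paper actually develops.

```python
import numpy as np

def sigmoid(z):
    """Numerically stable logistic function (handles +/- inf cut-points)."""
    out = np.empty_like(z, dtype=float)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

def mean_nll(theta, beta, X, y):
    """Mean negative log-likelihood of the proportional-odds model:
    P(Y <= k | x) = sigmoid(theta_k - x . beta), with y coded 0..K-1."""
    eta = X @ beta
    upper = np.append(theta, np.inf)[y]    # cut-point above class y (inf for top class)
    lower = np.append(-np.inf, theta)[y]   # cut-point below class y (-inf for bottom class)
    p = sigmoid(upper - eta) - sigmoid(lower - eta)
    return -np.mean(np.log(np.clip(p, 1e-12, None)))

def num_grad(f, v, eps=1e-6):
    """Central finite-difference gradient (keeps the sketch short)."""
    g = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = eps
        g[i] = (f(v + e) - f(v - e)) / (2.0 * eps)
    return g

def fit_l1_ordinal(X, y, K, lam=0.1, step=0.1, iters=300):
    """Proximal gradient descent on  mean NLL + lam * ||beta||_1.
    The cut-points theta are re-sorted after each step to keep them ordered."""
    theta = np.linspace(-1.0, 1.0, K - 1)
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        gt = num_grad(lambda t: mean_nll(t, beta, X, y), theta)
        gb = num_grad(lambda b: mean_nll(theta, b, X, y), beta)
        theta = np.sort(theta - step * gt)
        b = beta - step * gb
        beta = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)  # soft-threshold
    return theta, beta
```

On simulated proportional-odds data with one informative covariate and one pure-noise covariate, the soft-thresholding step drives the noise coefficient to (or very near) zero while the informative one stays active, which is exactly the variable-selection behaviour the penalty is introduced for.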
LASSO
ordinal polytomous regression
Quantile Universal Threshold
Frank-Wolfe algorithm
Nesterov algorithm
Mathematics of computing~Regression analysis
17:1-17:13
Regular Paper
Computations have been performed on the supercomputer facilities of the Mésocentre de calcul de Franche-Comté.
Stéphane
Chrétien
Stéphane Chrétien
National Physical Laboratory, Hampton Road, Teddington, United Kingdom
Christophe
Guyeux
Christophe Guyeux
Computer Science Department, FEMTO-ST Institute, UMR 6174 CNRS, Université de Bourgogne Franche-Comté, 16 route de Gray, 25030 Besançon, France
Serge
Moulin
Serge Moulin
Computer Science Department, FEMTO-ST Institute, UMR 6174 CNRS, Université de Bourgogne Franche-Comté, 16 route de Gray, 25030 Besançon, France
10.4230/LIPIcs.WABI.2018.17
Our Python module. Accessed: 2018-05-11. URL: https://github.com/SergeMOULIN/l1-penalised-ordinal-polytomous-regression-estimators.
Hirotugu Akaike. Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike, pages 199-213. Springer, 1998.
Sylvain Arlot and Alain Celisse. A survey of cross-validation procedures for model selection. Statistics Surveys, 4:40-79, 2010.
Stephen Becker, Jérôme Bobin, and Emmanuel J Candès. NESTA: a fast and accurate first-order method for sparse recovery. SIAM Journal on Imaging Sciences, 4(1):1-39, 2011.
Alexandre Belloni, Victor Chernozhukov, and Lie Wang. Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika, 98(4):791-806, 2011.
Peter J Bickel, Ya'acov Ritov, and Alexandre B Tsybakov. Simultaneous analysis of lasso and Dantzig selector. The Annals of Statistics, 37(4):1705-1732, 2009.
Emmanuel Candès and Terence Tao. The Dantzig selector: statistical estimation when p is much larger than n. The Annals of Statistics, 35(6):2313-2351, 2007.
Emmanuel J Candès and Yaniv Plan. Near-ideal model selection by ℓ1 minimization. The Annals of Statistics, 37(5A):2145-2177, 2009.
Stéphane Chrétien, Christophe Guyeux, and Serge Moulin. l1-penalised ordinal polytomous regression estimators. arXiv preprint, to be submitted, 2018.
Stéphane Chrétien and Sébastien Darses. Sparse recovery with unknown variance: a lasso-type approach. IEEE Transactions on Information Theory, 60(7):3970-3988, 2014.
Stéphane Chrétien, Alex Gibberd, and Sandipan Roy. Hedging hyperparameter selection for basis pursuit. arXiv preprint arXiv:1805.01870, 2018.
Stéphane Chrétien, Christophe Guyeux, Michael Boyer-Guittaut, Régis Delage-Mouroux, and Françoise Descôtes. Investigating gene expression array with outliers and missing data in bladder cancer. In 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 994-998. IEEE, 2015.
Stéphane Chrétien, Christophe Guyeux, Michael Boyer-Guittaut, Régis Delage-Mouroux, and Françoise Descôtes. Using the lasso for gene selection in bladder cancer data. arXiv preprint arXiv:1504.05004, 2015.
Stéphane Chrétien, Christophe Guyeux, Bastien Conesa, Régis Delage-Mouroux, Michèle Jouvenot, Philippe Huetz, and Françoise Descôtes. A Bregman-proximal point algorithm for robust non-negative matrix factorization with possible missing values and outliers: application to gene expression analysis. BMC Bioinformatics, 17(8):284, 2016.
Marguerite Frank and Philip Wolfe. An algorithm for quadratic programming. Naval Research Logistics (NRL), 3(1-2):95-110, 1956.
Yoav Freund and Robert E Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997.
Caroline Giacobino, Sylvain Sardy, Jairo Diaz-Rodriguez, and Nick Hengartner. Quantile universal threshold for model selection. arXiv preprint arXiv:1511.05433, 2015.
Christopher Kennedy and Rachel Ward. Greedy variance estimation for the lasso. arXiv preprint arXiv:1803.10878, 2018.
Alan Miller. Subset selection in regression. CRC Press, 2002.
Yu Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127-152, 2005.
Yurii Nesterov. A method of solving a convex programming problem with convergence rate O(1/k^2). Soviet Mathematics Doklady, pages 372-376, 1983.
Gideon Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2):461-464, 1978.
Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267-288, 1996.
Robert Tibshirani. The lasso method for variable selection in the Cox model. Statistics in Medicine, 16(4):385-395, 1997.
Sara A Van de Geer. High-dimensional generalized linear models and the lasso. The Annals of Statistics, 36(2):614-645, 2008.
Stéphane Chrétien, Christophe Guyeux, and Serge Moulin
Creative Commons Attribution 3.0 Unported license
https://creativecommons.org/licenses/by/3.0/legalcode