Title: Cluster feature selection in high-dimensional linear models
Abstract: This paper is concerned with variable screening when highly correlated variables exist in high-dimensional linear models. We propose a novel cluster feature selection (CFS) procedure that combines the elastic net with linear correlation variable screening to enjoy the benefits of both methods. When calculating the correlation between the predictors and the response, we consider highly correlated groups of predictors instead of individual predictors, in contrast to the usual linear correlation variable screening. Within each correlated group, we apply the elastic net to select variables and estimate their parameters. This avoids the drawback of mistakenly eliminating truly relevant variables when they are highly correlated, as LASSO [R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B 58 (1996) 268–288] does. After applying the CFS procedure, the maximum absolute correlation coefficient between clusters becomes smaller, and any common model selection method, such as sure independence screening (SIS) [J. Fan and J. Lv, Sure independence screening for ultrahigh dimensional feature space, J. R. Stat. Soc. Ser. B 70 (2008) 849–911] or LASSO, can be applied to improve the results. Extensive numerical examples, including pure simulation examples and semi-real examples, are conducted to show the good performance of our procedure.
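The abstract outlines three steps: group highly correlated predictors, fit an elastic net within each group, and screen groups by their correlation with the response. The following is a minimal sketch of one plausible reading of those steps, not the authors' algorithm. Everything specific in it is an assumption made for illustration: the function names (`correlation_clusters`, `elastic_net_cd`, `cluster_feature_screen`), the rule that clusters are connected components of the graph with |correlation| above a threshold, the use of each cluster's elastic-net fitted values as its summary for screening, and the default values of `threshold`, `alpha`, `l1_ratio`, and `n_keep`.

```python
import numpy as np


def correlation_clusters(X, threshold=0.8):
    """Assumed clustering rule: connected components of the graph in which
    two columns are linked when |sample correlation| > threshold."""
    p = X.shape[1]
    adj = np.abs(np.corrcoef(X, rowvar=False)) > threshold
    labels = -np.ones(p, dtype=int)
    k = 0
    for start in range(p):
        if labels[start] >= 0:
            continue
        labels[start] = k
        stack = [start]
        while stack:  # depth-first search over linked columns
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                if labels[j] < 0:
                    labels[j] = k
                    stack.append(j)
        k += 1
    return [np.flatnonzero(labels == c) for c in range(k)]


def elastic_net_cd(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + alpha*l1_ratio*||b||_1 + 0.5*alpha*(1-l1_ratio)*||b||^2."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()          # current residual y - Xb
    z = (X ** 2).sum(axis=0) / n
    l1, l2 = alpha * l1_ratio, alpha * (1.0 - l1_ratio)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]         # partial residual excluding column j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - l1, 0.0) / (z[j] + l2)
            r -= X[:, j] * b[j]         # restore the full residual
    return b


def cluster_feature_screen(X, y, threshold=0.8, n_keep=5,
                           alpha=0.1, l1_ratio=0.5):
    """Rank clusters by |corr(cluster elastic-net fit, y)| (an assumed
    cluster-level screening statistic) and keep the top n_keep.
    Returns a list of (score, selected column indices) pairs."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    results = []
    for cols in correlation_clusters(Xs, threshold):
        b = elastic_net_cd(Xs[:, cols], yc, alpha, l1_ratio)
        fit = Xs[:, cols] @ b
        # A cluster whose coefficients are all zero gets score 0.
        score = abs(np.corrcoef(fit, yc)[0, 1]) if fit.std() > 0 else 0.0
        results.append((score, cols[b != 0]))
    results.sort(key=lambda t: t[0], reverse=True)
    return results[:n_keep]
```

Calling `cluster_feature_screen(X, y)` on an `n × p` design matrix and a response vector returns the highest-scoring clusters together with the variables the within-cluster elastic net retained. Because screening operates on one summary per cluster, a group of strongly correlated relevant predictors is kept or dropped as a unit rather than competing with one another.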
Publication Year: 2018
Publication Date: 2018-01-01
Language: en
Type: article
Indexed In: ['crossref']
Access and Citation
Cited By Count: 1