Slide 1

Short Course: Review of Variable Selection Methods
Jeff Goldsmith, PhD
Department of Biostatistics

Slide 2

Outline of short course

• Review variable selection methods
• Scalar-on-function regression
• Function-on-scalar regression
• Other models and approaches

Throughout, we will emphasize methods and code.

Slide 3

VS in linear models

$y = X\beta + \epsilon$

• X is an n × p matrix
• p is large (often p ≫ n)
• OLS estimates are highly variable or unidentifiable (see the sketch below)
• Variable selection methods remove unnecessary predictors from the model
• Focus on automated approaches, rather than model building
• Emphasis is often on prediction accuracy, though post-selection methods for inference exist
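
A quick numerical illustration of the identifiability point (a minimal sketch, not from the slides): when p > n, the Gram matrix X'X is rank deficient, so the normal equations have no unique solution.

```python
# When p > n, X'X has rank at most n and cannot be inverted,
# so the OLS estimate is not unique.
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 50
X = rng.standard_normal((n, p))
print(np.linalg.matrix_rank(X.T @ X))  # at most n = 20, but X'X is 50 x 50
```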

Slide 4

Methods for VS

• Shrinkage penalties
  • Lasso
  • SCAD
  • MCP
• Bayesian variable selection
  • Spike and slab priors
  • Shrinkage priors
• Other methods

Slide 5

Penalty-based approaches

• Minimize, with respect to $\beta$,

  $$(y - X\beta)^T (y - X\beta) + \sum_{k=1}^{p} p_\lambda(|\beta_k|)$$

• Find the solution for fixed $\lambda$
• Find the best solution across values of $\lambda$, often via cross validation (see the sketch below)
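
A minimal sketch of this workflow, using scikit-learn's LassoCV to choose $\lambda$ by cross validation (an illustration; the course's own code and packages may differ, and scikit-learn calls the penalty level "alpha"):

```python
# Fit the lasso over a grid of penalty values and pick the best by 5-fold CV.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 100, 200                 # p >> n, as in the motivating setting
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                  # only 5 truly nonzero coefficients
y = X @ beta + rng.standard_normal(n)

fit = LassoCV(cv=5).fit(X, y)   # cross-validates over a penalty grid
print("chosen penalty:", fit.alpha_)
print("nonzero coefficients:", np.sum(fit.coef_ != 0))
```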

Slide 6

Available penalties

• Most common penalty is the Lasso: $p_\lambda(\beta) = \lambda |\beta|$
• Compared to ridge, it penalizes the magnitude of coefficients and encourages sparsity (see the sketch below)

Figure from ISLR
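
A small sketch of the sparsity contrast (illustration only, with arbitrary penalty levels): the lasso sets many coefficients exactly to zero, while ridge only shrinks them toward zero.

```python
# Lasso produces exact zeros (sparsity); ridge shrinks but rarely to zero.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 50))
y = X[:, 0] - X[:, 1] + rng.standard_normal(100)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print("lasso exact zeros:", np.sum(lasso.coef_ == 0))  # many
print("ridge exact zeros:", np.sum(ridge.coef_ == 0))  # typically none
```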

Slide 7

Available penalties

• The Lasso penalty has a constant, non-vanishing derivative, so even large coefficients are shrunk
• Recent alternatives have sought to address this
  • Smoothly clipped absolute deviation (SCAD) penalty
  • Minimax concave penalty (MCP)
• For example, solutions under the SCAD penalty take the thresholding form shown below
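
For an orthonormal design, with $z$ the least squares estimate of a coordinate, $a > 2$ the SCAD tuning parameter, and $(\cdot)_+$ the positive part, the SCAD solution is (Fan and Li, 2001):

$$\hat{\beta}(z) = \begin{cases} \operatorname{sign}(z)\,(|z| - \lambda)_+ & \text{if } |z| \le 2\lambda, \\ \dfrac{(a-1)\,z - \operatorname{sign}(z)\,a\lambda}{a-2} & \text{if } 2\lambda < |z| \le a\lambda, \\ z & \text{if } |z| > a\lambda. \end{cases}$$

Coefficients with $|z| > a\lambda$ are left unshrunk, avoiding the bias the Lasso imposes on large coefficients.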

Slide 8

Available penalties

Figure from Zhang (2010), "Nearly Unbiased Variable Selection Under Minimax Concave Penalty".

Slide 9

Bayesian VS

• Most common approach is the "spike and slab" approach
• One narrow prior and one wide prior, with a latent binary indicator separating predictors between the two (see below)
• Often the spike is a point mass at zero, but recent work has used a narrow but continuous prior
• Recently, shrinkage priors (e.g., the horseshoe prior and the Dirichlet-Laplace prior) have become popular
• Good computational and (sometimes) theoretical properties
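
In symbols, a standard spike-and-slab formulation (a textbook form, not taken verbatim from the slide) is

$$\beta_k \mid \gamma_k \sim \gamma_k \, N(0, \tau_1^2) + (1 - \gamma_k) \, \delta_0, \qquad \gamma_k \sim \mathrm{Bernoulli}(\pi), \quad k = 1, \dots, p,$$

where $\delta_0$ is a point mass at zero (the spike) and $N(0, \tau_1^2)$ is the wide slab; replacing $\delta_0$ with $N(0, \tau_0^2)$ for small $\tau_0^2$ gives the continuous-spike variant mentioned above.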

Slide 10

Emphasizing penalties

• For this short course, we'll emphasize the penalty-based approach
• Very popular outside FDA
• Very accessible code and packages

Slide 11

Group VS

• The preceding has focused on shrinking individual covariates to zero
• We'll shortly see that shrinking groups of variables to zero is useful for our purposes today
• To that end, group variable selection methods are needed:

  $$(y - X\beta)^T (y - X\beta) + \sum_{g=1}^{G} p_\lambda(\|\beta_g\|)$$

• Group versions of each major penalty exist (see the sketch below)
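
For the group Lasso in particular, the penalty acts on whole coefficient blocks through its proximal operator, which is block soft-thresholding. A minimal numpy sketch (an illustration; the function name is ours):

```python
# Proximal operator of the group lasso penalty lam * sum_g ||beta_g||_2:
# each block is shrunk toward zero as a unit, and blocks whose norm is
# below lam are set exactly to zero, giving groupwise sparsity.
import numpy as np

def group_soft_threshold(z, groups, lam):
    """z: coefficient vector (e.g., after a gradient step);
    groups: list of index arrays, one per group; lam: penalty level."""
    beta = np.zeros_like(z)
    for g in groups:
        norm_g = np.linalg.norm(z[g])
        if norm_g > lam:
            beta[g] = (1 - lam / norm_g) * z[g]
    return beta

z = np.array([0.1, -0.2, 0.1, 3.0, -2.0, 1.0])
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]
print(group_soft_threshold(z, groups, lam=1.0))  # first block zeroed out
```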

Slide 12

Composite penalties

• Composite penalties will come up in passing, but will not be a major emphasis of today's material
• Briefly, multiple penalties can be used in the same minimization problem:

  $$(y - X\beta)^T (y - X\beta) + \sum_{k=1}^{p} p_{1,\lambda_1}(|\beta_k|) + \sum_{k=1}^{p} p_{2,\lambda_2}(|\beta_k|)$$
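
One familiar instance is the elastic net, which combines the Lasso (L1) and ridge (L2) penalties. A minimal sketch using scikit-learn's ElasticNetCV, which cross-validates over both the overall penalty level and the L1/L2 mix (an illustration, not the course's own code):

```python
# Elastic net: L1 (lasso) and L2 (ridge) penalties in a single objective.
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 200))
y = X[:, :5] @ np.full(5, 2.0) + rng.standard_normal(100)

# l1_ratio controls the mix between the two penalties; CV selects both
# the mix and the overall penalty level.
fit = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5).fit(X, y)
print("chosen l1_ratio:", fit.l1_ratio_, "chosen penalty:", fit.alpha_)
```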

Slide 13

Key references

• Fan and Li (2001). Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. JASA.
• Zhang (2010). Nearly Unbiased Variable Selection Under Minimax Concave Penalty. Annals of Statistics.
• Rockova and George (2014). EMVS: The EM Approach to Bayesian Variable Selection. JASA.

Slide 14

Switch to code