Jeff Goldsmith
March 22, 2017

# VS in FDA Short Course IV

## Transcript

1. ### 1 SHORT COURSE: OTHER TOPICS IN FDA + VARIABLE SELECTION

JEFF GOLDSMITH, PHD DEPARTMENT OF BIOSTATISTICS
2. ### 2 Wide variety of models

• Seen specific approaches to two model classes (SoFR and FoSR)
• There are a lot of other ways to encounter the need for variable selection in functional regression
  • Scalar-on-image regression
  • Function-on-function regression

4. ### 4 Scalar-on-image regression

• Predictor images are functional data
• Issue isn’t large number of functional predictors – it’s the size of the single predictor
• Scientific assumptions encourage sparse solutions
• Together, both support the use of variable selection approaches
5. ### 5 Scalar-on-image regression

• Several possibilities for estimating coefficient image
  • Splines + penalties
  • Spatial Bayesian variable selection
  • Soft-thresholded Gaussian process
  • Total variation

$$y = X \cdot \beta + \epsilon$$
6. ### 6 Spatial Bayesian approach

• Introduce binary coefficient image such that regression coefficient is zero if and only if binary variable is zero
• Promote smoothness and clustering through MRFs
• Clustering via an Ising prior: $p(\gamma) = \phi(a, b) \exp\left[ a \cdot \gamma + \sum_{l} \left( \sum_{l' \in \delta_l} b_l I(\gamma_l = \gamma_{l'}) \right) \right]$
• Smoothness via a conditional autoregressive prior for coefficients: $[\beta_l \mid \gamma_l = 1, \beta_{-l}, \gamma_{-l}] \sim N[\bar{\beta}_l, \sigma_b^2 / d_l]$
• Full disclosure – I’m not really a fan of this method anymore.
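
As a rough illustration of how an Ising prior rewards clustering, the sketch below evaluates the unnormalized log prior of a binary image on a small 2D grid. It assumes a 4-nearest-neighbor structure and one interaction parameter `b` shared by every location (the slide lets it vary by location); the function name and example images are hypothetical, and the normalizing constant is ignored.

```python
def ising_log_prior_unnormalized(gamma, a, b):
    """Unnormalized log Ising prior for a binary image `gamma` (list of rows).

    Assumptions (not from the slides): 4-nearest-neighbor structure and a
    single interaction parameter `b` shared by all locations.
    """
    n_rows, n_cols = len(gamma), len(gamma[0])
    # First term: a times the number of "on" locations.
    sparsity_term = a * sum(sum(row) for row in gamma)
    # Second term: for each location, count neighbors with the same value.
    agreement_term = 0.0
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    agreement_term += b * (gamma[r][c] == gamma[rr][cc])
    return sparsity_term + agreement_term


# Two images with the same number of nonzero locations.
clustered = [[1, 1, 0],
             [1, 1, 0],
             [0, 0, 0]]
scattered = [[1, 0, 1],
             [0, 0, 0],
             [1, 0, 1]]

log_p_clustered = ising_log_prior_unnormalized(clustered, a=-2.0, b=0.5)
log_p_scattered = ising_log_prior_unnormalized(scattered, a=-2.0, b=0.5)
```

With a positive `b`, the clustered image receives the higher unnormalized log prior even though both images have the same number of nonzero locations.
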
7. ### 7 Scalar-on-image regression

• Few methods have robust, public implementations
• Access to data can also be a challenge
  • Consortia (e.g. ADNI) are starting to change that
• Computational intensity is critical
8. ### 8 Concurrent model

• Special case of function-on-function regression
• Comes up more frequently than scalar-on-image
• Many longitudinal studies can be thought of in this context
• Ease of data collection can make this a variable selection problem
  • E.g. ambulatory blood pressure monitoring

$$Y_i(t) = \beta_0(t) + \sum_{k=1}^{p} X_{ik}(t) \beta_k(t) + \delta_i(t)$$
9. ### 9 Ambulatory blood pressure

• Ambulatory blood pressure possibly depends on a lot of stuff:
  • Time of day
  • Activity intensity
  • Mood
  • Location
  • Posture
  • Clinical BP
• These can be measured using an EMA (ecological momentary assessment)
10. ### 10 Ambulatory blood pressure

[Figure: four panels (Systolic BP, Frustration, Phys. Activity, Working) plotted against Time, from 10:00 to 22:00.]
11. ### 11 Out-of-the-box solution?

• OLS solutions to spline-based implementations of the functional linear concurrent model have the following form: $\left( \sum_i X_i^{*T} X_i^{*} \right)^{-1} \left( \sum_i X_i^{*T} y_i \right)$
• Hard to directly apply group variable selection tools
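
To make the OLS form above concrete, here is a minimal sketch on simulated data. It assumes each coefficient function is expanded in a small polynomial basis (a stand-in for a spline basis, chosen only to keep the example dependency-light), so that each row of a subject's design matrix is the row-wise Kronecker product of the covariates at a time point with the basis at that time point. All names, dimensions, and true coefficient functions are hypothetical.

```python
import numpy as np


def basis(t, n_basis=3):
    """Polynomial basis 1, t, t^2, ... evaluated at t (stand-in for splines)."""
    return np.vander(t, n_basis, increasing=True)


def design(x_i, theta):
    """Rows are [1, X_i1(t_j), ..., X_ip(t_j)] kron Theta(t_j)."""
    n_times = x_i.shape[0]
    covs = np.column_stack([np.ones(n_times), x_i])  # n_times x (p + 1)
    # Row-wise Kronecker product of covariates and basis functions.
    return np.einsum("jk,jb->jkb", covs, theta).reshape(n_times, -1)


rng = np.random.default_rng(0)
n_subj, n_times, p, n_basis = 50, 20, 2, 3
t = np.linspace(0, 1, n_times)
theta = basis(t, n_basis)

# Hypothetical truth: beta_0(t) = 1, beta_1(t) = 2t, beta_2(t) = 0.
true_coefs = np.array([[1.0, 0.0, 0.0],
                       [0.0, 2.0, 0.0],
                       [0.0, 0.0, 0.0]])

n_coef = (p + 1) * n_basis
xtx = np.zeros((n_coef, n_coef))
xty = np.zeros(n_coef)
for _ in range(n_subj):
    x_i = rng.normal(size=(n_times, p))            # time-varying covariates
    x_star = design(x_i, theta)
    y_i = x_star @ true_coefs.ravel() + 0.1 * rng.normal(size=n_times)
    xtx += x_star.T @ x_star                        # accumulate sum_i X*^T X*
    xty += x_star.T @ y_i                           # accumulate sum_i X*^T y_i

coef_hat = np.linalg.solve(xtx, xty).reshape(p + 1, n_basis)
beta_hat = coef_hat @ theta.T                       # (p + 1) x n_times
beta_true = true_coefs @ theta.T
max_abs_error = np.max(np.abs(beta_hat - beta_true))
```

Each predictor corresponds to a block of `n_basis` coefficients, and nothing in this estimator shrinks a whole block to zero; OLS by itself does no selection.
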
12. ### 12 Bayesian variable selection

$$B_k \sim N[0, (\gamma_k v_0 + (1 - \gamma_k) v_1) I], \qquad \gamma_k \sim \text{Bern}(\pi)$$

• Estimate using Variational Bayes
• Reasonably computationally efficient
• Can jointly model an FPCA expansion of residual curves as well
• Main tuning parameter is $v_0$
  • Controls width of spike prior
  • Too narrow and all groups are omitted; too wide and all groups are included
• Not really variable selection, but that has some helpful properties
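
The prior on this slide can be sketched as a two-step draw: a Bernoulli indicator for each group, then a zero-mean normal for that group's coefficients, with variance set by the indicator. The code below follows the variance expression as written on the slide, with `gamma_k` denoting the binary indicator for group k. The function names and numeric values are hypothetical, and this only simulates from the prior; it is not the variational Bayes estimation procedure.

```python
import random


def prior_variance(gamma_k, v0, v1):
    """Prior variance for group k, following the slide's expression.

    v0 is the variance the slide calls the spike width; v1 is the other
    component's variance.
    """
    return gamma_k * v0 + (1 - gamma_k) * v1


def draw_group(n_coef, pi, v0, v1, rng):
    """Draw the indicator and the coefficient vector for one group."""
    gamma_k = 1 if rng.random() < pi else 0
    sd = prior_variance(gamma_k, v0, v1) ** 0.5
    coefs = [rng.gauss(0.0, sd) for _ in range(n_coef)]
    return gamma_k, coefs


rng = random.Random(1)
# Hypothetical values: 5 basis coefficients per group, 10 groups.
draws = [draw_group(n_coef=5, pi=0.5, v0=0.01, v1=10.0, rng=rng)
         for _ in range(10)]
```

Because both variances are strictly positive, no draw is exactly zero, which is one way to read the slide's remark that this is not strictly variable selection.
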
13. ### 13 Application to ABP

[Figure: inclusion proportion (axis from 0.00 to 1.00) for each candidate predictor: Clinical SysBP, Clinical DiaBP, Age, BMI, Sex, Heart Rate, Feeling Excited, Feeling Frustrated, Feeling Happy, Feeling Tired, Feeling Angry, Feeling Anxious, Alcohol Consumption, Caffeine Consumption, Relaxing, Working, Doing Chores, Commuting, Doing Exercise, Having a Meal, Doing None of These, Current Exertion Level, Experiencing Pain, Posture: Sitting, Posture: Moving, Location: At Work, Location: Other, Recent Activity 1, Recent Activity 2, Recent Activity 3, Recent Activity 4, Recent Activity 5.]
14. ### 14 Key references

• Goldsmith, Huang, and Crainiceanu (2014). Smooth Scalar-on-Image Regression via Spatial Bayesian Variable Selection. JCGS.
• Wang and Zhu (2016). Generalized Scalar-on-Image Regression Models via Total Variation. JASA.
• Kang, Reich, and Staicu (Under review). Scalar-on-Image Regression via Soft-Thresholded Gaussian Processes.
• Goldsmith and Schwartz (2017). Variable Selection in the Functional Linear Concurrent Model. Statistics in Medicine.