
# Computing Active Subspaces

My talk at the Joint Mathematics Meeting, San Antonio, January 2015.

January 11, 2015

## Transcript

1. ### COMPUTING ACTIVE SUBSPACES

Paul Constantine, Colorado School of Mines (paulconstantine.net, @DrPaulynomial, activesubspaces.org), with Qiqi Wang (MIT, AeroAstro) and David Gleich (Purdue, CS). WARNING: These slides are meant to complement the oral presentation. Use out of context at your own risk. This material is based upon work supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Number DE-SC-0011077.
2. ### ACTIVE COLLABORATIONS

There are more applications than I have time for:

- Scramjet safety (Stanford PSAAP Center)
- Photovoltaic solar cell modeling (Mark Campanelli, NREL)
- Aerospace shape optimization (Juan Alonso, Stanford)
- Combustion stability analysis (Franck Nicoud, University Montpellier 2)
- Evapotranspiration in hydrology (Reed Maxwell, School of Mines)
- In-situ combustion reservoir simulation (Margot Gerritsen, Stanford)
- Combustion LES calibration (Guilhem Lacaze, Mohammad Khalil, SNL)
- Atmospheric re-entry (Nagi Mansour, NASA Ames; Thierry Magin, VKI)
- Subsurface sensitivity and inversion (Reed Maxwell, School of Mines)
- Wind turbine modeling (Peter Graf, Katherine Dykes, NREL)

(And I'm open to others that might make me rich, famous, and/or tenured.)

5. ### Doing things in high dimensions is hard, in general.

APPROXIMATION: $\tilde f(\mathbf{x}) \approx f(\mathbf{x})$. OPTIMIZATION: $\min_{\mathbf{x}} f(\mathbf{x})$. INTEGRATION: $\int f(\mathbf{x})\,\rho\,d\mathbf{x}$. INVERSION: $\mathbf{x} = f^{-1}(y)$. You must first discover and then exploit some low-dimensional structure to make these problems tractable.
6. ### DISCOVER, then EXPLOIT

Pray for main effects dominated, $f(\mathbf{x}) \approx f_1(x_1) + \cdots + f_m(x_m)$; exploit with sparse grids. Pray for sparsity, $f(\mathbf{x}) \approx \sum_{i=1}^{P} a_i \phi_i(\mathbf{x})$ with many $a_i = 0$; exploit with compressed sensing. Pray for low rank, $f(\mathbf{x}) \approx \sum_{i=1}^{r} f_1^i(x_1) \cdots f_m^i(x_m)$; exploit with separation of variables (PGD, TT, ACA, ...).
7. ### Sensitivity analysis to eliminate variables

DISCOVER with sensitivity analysis; EXPLOIT by working with only the important variables: $[x_1, \ldots, x_{100}]^T \to [x_1,\, x_4]^T$.

9. ### Active subspaces are defined by the eigendecomposition of a matrix derived from the function.

Consider a function and its gradient vector, $f = f(\mathbf{x})$, $\mathbf{x} \in \mathcal{X} \subseteq \mathbb{R}^m$, $\nabla_{\mathbf{x}} f = [\partial f / \partial x_1, \ldots, \partial f / \partial x_m]^T$, with a probability density $\rho : \mathbb{R}^m \to \mathbb{R}_+$. The uncentered covariance of the gradient and its eigendecomposition: $C = \int (\nabla_{\mathbf{x}} f)(\nabla_{\mathbf{x}} f)^T \rho \, d\mathbf{x} = W \Lambda W^T$. Partition the eigendecomposition: $\Lambda = \mathrm{diag}(\Lambda_1, \Lambda_2)$, $W = [W_1 \; W_2]$, $W_1 \in \mathbb{R}^{m \times n}$. Rotate and separate the coordinates: $\mathbf{x} = W W^T \mathbf{x} = W_1 W_1^T \mathbf{x} + W_2 W_2^T \mathbf{x} = W_1 \mathbf{y} + W_2 \mathbf{z}$, where $\mathbf{y}$ are the active variables and $\mathbf{z}$ the inactive variables.
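The construction above can be sketched in a few lines of NumPy. This is our own minimal sketch, not code from the talk: the density $\rho$ is represented by gradient samples, and the function name `active_subspace` and its signature are ours.

```python
import numpy as np

def active_subspace(grads, n_active):
    """Eigendecomposition of C = E[(grad f)(grad f)^T], estimated from
    gradient samples drawn according to the density rho.

    grads: (N, m) array whose rows are gradient samples grad f(x_j)^T.
    n_active: number of active variables n.
    Returns descending eigenvalues and the partition W1 (m x n), W2.
    """
    C = grads.T @ grads / grads.shape[0]   # uncentered covariance of the gradient
    evals, W = np.linalg.eigh(C)           # eigh returns ascending order
    order = np.argsort(evals)[::-1]        # reorder to descending
    evals, W = evals[order], W[:, order]
    return evals, W[:, :n_active], W[:, n_active:]

# Rotate and separate the coordinates: x = W1 @ y + W2 @ z,
# with y = W1.T @ x (active) and z = W2.T @ x (inactive).
```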
10. ### The eigenvectors indicate the directions along which the function varies the most, on average.

Lemma 1: $\lambda_i = \int \left( (\nabla_{\mathbf{x}} f)^T \mathbf{w}_i \right)^2 \rho \, d\mathbf{x}$, $i = 1, \ldots, m$. Proof sketch: definition of $C$. Lemma 2: $\int (\nabla_{\mathbf{y}} f)^T (\nabla_{\mathbf{y}} f) \, \rho \, d\mathbf{x} = \lambda_1 + \cdots + \lambda_n$ and $\int (\nabla_{\mathbf{z}} f)^T (\nabla_{\mathbf{z}} f) \, \rho \, d\mathbf{x} = \lambda_{n+1} + \cdots + \lambda_m$. Proof sketch: chain rule, linearity of trace, definition of $C$.
11. ### DISCOVER the active subspace with random sampling.

Draw samples $\mathbf{x}_j \sim \rho$ and compute $f_j = f(\mathbf{x}_j)$ and $\nabla_{\mathbf{x}} f_j = \nabla_{\mathbf{x}} f(\mathbf{x}_j)$. Approximate $C$ with Monte Carlo: $C \approx \frac{1}{N} \sum_{j=1}^{N} \nabla_{\mathbf{x}} f_j \, \nabla_{\mathbf{x}} f_j^T = \hat W \hat\Lambda \hat W^T$. This is equivalent to the SVD of the samples of the gradient, $\frac{1}{\sqrt{N}} [\nabla_{\mathbf{x}} f_1 \cdots \nabla_{\mathbf{x}} f_N] = \hat W \sqrt{\hat\Lambda} \hat V^T$. Called an active subspace method in T. Russi's 2010 Ph.D. thesis, Uncertainty Quantification with Experimental Data in Complex System Models.
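A minimal numerical check of the SVD equivalence, with a made-up quadratic model standing in for an expensive simulation (the matrix `A`, the sample counts, and all names are our own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
m, N = 10, 300
A = np.diag(np.linspace(3.0, 0.01, m))   # stand-in model: f(x) = 0.5 x^T A x
grad_f = lambda x: A @ x                 # so grad f(x) = A x

X = rng.uniform(-1.0, 1.0, (N, m))       # draw samples x_j ~ rho
G = np.array([grad_f(x) for x in X])     # rows are grad f(x_j)^T

# Monte Carlo estimate of C and its eigenvalues
C_hat = G.T @ G / N
evals = np.linalg.eigvalsh(C_hat)

# Equivalent route: SVD of (1/sqrt(N)) [grad f_1 ... grad f_N];
# the squared singular values are the eigenvalues of C_hat.
_, s, _ = np.linalg.svd(G.T / np.sqrt(N), full_matrices=False)
```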
12. ### Approximating the eigenpairs is not as tough as you might think.

Using Gittens and Tropp (2011), the probability that the eigenvalue estimate falls below the true eigenvalue satisfies $\mathbb{P}\left\{ \hat\lambda_k \le (1 - \varepsilon)\,\lambda_k \right\} \le k \exp\left( -\frac{N \lambda_k^2 \varepsilon^2}{4 \lambda_1 L^2} \right)$, where $N$ is the number of samples and $L^2$ is a bound on the squared gradient norm.
13. ### Approximating the eigenpairs is not as tough as you might think.

Again using Gittens and Tropp (2011), the probability that the eigenvalue estimate exceeds the true eigenvalue satisfies $\mathbb{P}\left\{ \hat\lambda_k \ge (1 + \varepsilon)\,\lambda_k \right\} \le (m - k + 1) \exp\left( -\frac{N \lambda_k \varepsilon^2}{4 L^2} \right)$, with $N$ the number of samples and $L^2$ a bound on the squared gradient norm.
14. ### Approximating the eigenpairs is not as tough as you might think.

Combining the tail bounds from Gittens and Tropp (2011): with high probability, $N = \Omega\left( \frac{L^2 \lambda_1}{\lambda_k^2 \varepsilon^2} \log(m) \right)$ samples give relative accuracy $|\lambda_k - \hat\lambda_k| \le \varepsilon \lambda_k$, where $m$ is the dimension and $L^2$ bounds the squared gradient norm.
15. ### Approximating the eigenpairs is not as tough as you might think.

Combining Gittens and Tropp (2011) with Golub and Van Loan (1996) and Stewart (1973): with high probability, $N = \Omega\left( \frac{L^2}{\lambda_1 \varepsilon^2} \log(m) \right)$ samples give a subspace error $\mathrm{dist}(W_1, \hat W_1) \le \frac{4 \lambda_1 \varepsilon}{\lambda_n - \lambda_{n+1}}$, where $\lambda_n - \lambda_{n+1}$ is the spectral gap and $L^2$ bounds the squared gradient norm.
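The subspace distance appearing in this bound can be computed as the spectral norm of the difference of the two orthogonal projectors (equivalently, the sine of the largest principal angle). A minimal sketch; the function name is ours:

```python
import numpy as np

def subspace_distance(W1, W1_hat):
    """dist(W1, W1_hat) = || W1 W1^T - W1_hat W1_hat^T ||_2 for matrices
    with orthonormal columns; this equals the sine of the largest
    principal angle between the two subspaces."""
    return np.linalg.norm(W1 @ W1.T - W1_hat @ W1_hat.T, 2)
```

The value is 0 when the subspaces coincide and 1 when they are orthogonal.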
16. ### Q & A on the sampling estimates

Q: How many samples of the gradient do I need?
A: Scales like the log of the dimension times problem-dependent constants.

Q: How do I choose the dimension of the active subspace?
A: Look for spectral gaps.

Q: What happens if I have inaccurate gradients?
A: A bias (that goes to zero) appears in the error bounds.

Q: How do I quantify uncertainty in the estimates?
A: Use a bootstrap!
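The bootstrap answer can be sketched as follows. This is our own illustration: the replicate count, percentile levels, and names are our choices, not from the talk.

```python
import numpy as np

def bootstrap_eigenvalues(grads, n_boot=200, seed=0):
    """Percentile bootstrap intervals for the eigenvalues of the Monte
    Carlo estimate of C, by resampling the gradient samples."""
    rng = np.random.default_rng(seed)
    N, m = grads.shape
    reps = np.empty((n_boot, m))
    for b in range(n_boot):
        idx = rng.integers(0, N, N)        # resample with replacement
        Gb = grads[idx]
        reps[b] = np.sort(np.linalg.eigvalsh(Gb.T @ Gb / N))[::-1]
    lo, hi = np.percentile(reps, [2.5, 97.5], axis=0)
    return lo, hi
```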
17. ### Quadratic example: eigenvalues

$f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T A \mathbf{x}$, $\mathbf{x} \in [-1, 1]^{10}$, $C = \frac{1}{3} A^2$. [Figure: eigenvalues of $C$ versus index, two panels.]
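For this quadratic, $\nabla f = A\mathbf{x}$ and $E[\mathbf{x}\mathbf{x}^T] = \frac{1}{3}I$ on the uniform hypercube, which gives $C = \frac{1}{3}A^2$. A quick Monte Carlo sanity check, with a symmetric $A$ of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 10
B = rng.standard_normal((m, m))
A = 0.5 * (B + B.T)                        # a symmetric test matrix (ours)

X = rng.uniform(-1.0, 1.0, (100000, m))    # x ~ uniform on [-1, 1]^10
G = X @ A                                  # rows are grad f(x_j)^T = (A x_j)^T
C_hat = G.T @ G / X.shape[0]               # Monte Carlo estimate of C
C_true = A @ A / 3.0                       # analytic C = (1/3) A^2
# The relative Frobenius error shrinks like 1/sqrt(N).
```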
18. ### Quadratic example: eigenvalue estimates

$f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T A \mathbf{x}$, $\mathbf{x} \in [-1, 1]^{10}$, $C = \frac{1}{3} A^2$; first $k = 6$ eigenvalues, $N = 28$ samples. [Figure: estimated eigenvalues of $C$ versus index, with bootstrap confidence intervals (CI, Est, True).]
19. ### Quadratic example: subspace distance

$f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T A \mathbf{x}$, $\mathbf{x} \in [-1, 1]^{10}$, $C = \frac{1}{3} A^2$; $k = 6$ eigenvalues, $N = 28$ samples. [Figure: $\mathrm{dist}(W_1, \hat W_1)$ versus subspace dimension, with bootstrap confidence intervals (CI, Est, True).]
20. ### Quadratic example: eigenvalues with finite-difference gradients, h = 0.1

$f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T A \mathbf{x}$, $\mathbf{x} \in [-1, 1]^{10}$, $C = \frac{1}{3} A^2$; first-order finite differences with $h = 0.1$; $k = 6$ eigenvalues, $N = 28$ samples. [Figure: estimated eigenvalues of $C$ versus index, with bootstrap confidence intervals (CI, Est, True).]
21. ### Quadratic example: eigenvalues with finite-difference gradients, h = 0.001

$f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T A \mathbf{x}$, $\mathbf{x} \in [-1, 1]^{10}$, $C = \frac{1}{3} A^2$; first-order finite differences with $h = 0.001$; $k = 6$ eigenvalues, $N = 28$ samples. [Figure: estimated eigenvalues of $C$ versus index, with bootstrap confidence intervals (CI, Est, True).]
22. ### Quadratic example: eigenvalues with finite-difference gradients, h = 0.00001

$f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T A \mathbf{x}$, $\mathbf{x} \in [-1, 1]^{10}$, $C = \frac{1}{3} A^2$; first-order finite differences with $h = 0.00001$; $k = 6$ eigenvalues, $N = 28$ samples. [Figure: estimated eigenvalues of $C$ versus index, with bootstrap confidence intervals (CI, Est, True).]
23. ### Quadratic example: subspace distance with finite-difference gradients, h = 0.1

$f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T A \mathbf{x}$, $\mathbf{x} \in [-1, 1]^{10}$, $C = \frac{1}{3} A^2$; first-order finite differences with $h = 0.1$; $k = 6$ eigenvalues, $N = 28$ samples. [Figure: $\mathrm{dist}(W_1, \hat W_1)$ versus subspace dimension, with bootstrap confidence intervals (CI, Est, True).]
24. ### Quadratic example: subspace distance with finite-difference gradients, h = 0.001

$f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T A \mathbf{x}$, $\mathbf{x} \in [-1, 1]^{10}$, $C = \frac{1}{3} A^2$; first-order finite differences with $h = 0.001$; $k = 6$ eigenvalues, $N = 28$ samples. [Figure: $\mathrm{dist}(W_1, \hat W_1)$ versus subspace dimension, with bootstrap confidence intervals (CI, Est, True).]
25. ### Quadratic example: subspace distance with finite-difference gradients, h = 0.00001

$f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T A \mathbf{x}$, $\mathbf{x} \in [-1, 1]^{10}$, $C = \frac{1}{3} A^2$; first-order finite differences with $h = 0.00001$; $k = 6$ eigenvalues, $N = 28$ samples. [Figure: $\mathrm{dist}(W_1, \hat W_1)$ versus subspace dimension, with bootstrap confidence intervals (CI, Est, True).]
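The first-order finite differences used in these experiments can be sketched as follows; the forward-difference formula and the step $h$ come from the slides, while the function name and test problem are ours:

```python
import numpy as np

def fd_gradient(f, x, h):
    """First-order forward differences:
    df/dx_i ~ (f(x + h e_i) - f(x)) / h, with O(h) truncation error."""
    g = np.empty(x.size)
    f0 = f(x)
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        g[i] = (f(x + e) - f0) / h
    return g
```

The $O(h)$ bias in each gradient sample propagates into the eigenvalue and subspace estimates, consistent with the slides, where shrinking $h$ recovers more of the true spectrum.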
26. ### Let's be abundantly clear about the problem we are trying to solve.

Low-rank approximation of the collection of gradients, $\frac{1}{\sqrt{N}} [\nabla_{\mathbf{x}} f_1 \cdots \nabla_{\mathbf{x}} f_N] \approx \hat W_1 \sqrt{\hat\Lambda_1} \hat V_1^T$: ✖. Low-dimensional linear approximation of the gradient, $\nabla f(\mathbf{x}) \approx \hat W_1 \mathbf{a}(\mathbf{x})$: ✖. Approximate a function of many variables by a function of a few linear combinations of the variables, $f(\mathbf{x}) \approx g\left( \hat W_1^T \mathbf{x} \right)$: ✔.
27. ### EXPLOIT active subspaces for response surfaces with conditional averaging.

Define the conditional expectation $g(\mathbf{y}) = \int f(W_1 \mathbf{y} + W_2 \mathbf{z}) \, \rho(\mathbf{z} \mid \mathbf{y}) \, d\mathbf{z}$, so that $f(\mathbf{x}) \approx g(W_1^T \mathbf{x})$. Theorem: $\left( \int \left( f(\mathbf{x}) - g(\hat W_1^T \mathbf{x}) \right)^2 \rho \, d\mathbf{x} \right)^{1/2} \le C_P \left( \varepsilon \, (\lambda_1 + \cdots + \lambda_n)^{1/2} + (\lambda_{n+1} + \cdots + \lambda_m)^{1/2} \right)$, where $C_P$ is the Poincare constant, $\varepsilon$ is the subspace error for the estimated eigenvectors $\hat W_1$, $\lambda_1, \ldots, \lambda_n$ are the eigenvalues for the active variables, and $\lambda_{n+1}, \ldots, \lambda_m$ are the eigenvalues for the inactive variables. Proof sketch: tower property, Poincare inequality, triangle inequality.
28. ### There’s an active subspace in the canonical stochastic PDE. •

2d Poisson with 100-term KL coeﬃcients •  Output: Spatial average over Neumann bc. Log of coeﬃcients Correlation Scalar of interest 0 20 40 60 80 100 10−20 10−15 10−10 10−5 100 105 Eigenvalues Variable index KL,β=1 KL,β=0.01 AS,β=1 AS,β=0.01 0 20 40 60 80 100 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1 First Eigenvector Variable index AS,β=1 AS,β=0.01 1 2 D r · ( a r u ) = 1 , x 2 D u = 0 , x 2 1 n · a r u = 0 , x 2 2 C (x, y) = exp 1 k x y k1 log(a(x, ✓)) = m X i=1 ✓i i i(x) f ( ✓ ) = 1 | 2 | Z 2 u ( x, ✓ ) dx
29. ### There’s an active subspace in the canonical stochastic PDE. •

2d Poisson with 100-term KL coeﬃcients •  Output: Spatial average over Neumann bc. Log of coeﬃcients Correlation Scalar of interest −3 −2 −1 0 1 2 3 −0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 x 10−3 y f test GP mean train −3 −2 −1 0 1 2 3 2.4 2.6 2.8 3 3.2 3.4 3.6 3.8 x 10−4 y f test GP mean train Long corr. length Short corr. length = 0.01 = 1.00 1 2 D r · ( a r u ) = 1 , x 2 D u = 0 , x 2 1 n · a r u = 0 , x 2 2 C (x, y) = exp 1 k x y k1 log(a(x, ✓)) = m X i=1 ✓i i i(x) f ( ✓ ) = 1 | 2 | Z 2 u ( x, ✓ ) dx
30. ### DETAILS

- "Full" is a Gaussian process with a linear mean on all 100 parameters, trained with maximum likelihood on 500 runs.
- "ASM" is a Gaussian process on 1 active parameter, using 300 gradients to get the active subspace and trained with maximum likelihood on 5 runs (one sample for the conditional averages).
- Tested on 100 random design sites.

[Figure: true values versus kriging predictions for the Full and ASM surrogates, for the long correlation length ($\beta = 1$) and the short correlation length ($\beta = 0.01$).]
31. ### THE TAKE-HOMES

You must first discover and then exploit some low-dimensional structure to make high-dimensional problems tractable. Active subspaces offer a new set of tools for working with functions that depend on several (tens to thousands of) variables. The right way to exploit active subspaces depends on what you're trying to do (e.g., approximation, optimization, inversion, integration). COMING IN MARCH: Active Subspaces: Emerging Ideas for Dimension Reduction in Parameter Studies, SIAM, 2015.
32. ### Questions?

- What if I don't have gradients?
- Can we talk about the trade-off between discovering and exploiting?
- Can you tell me more about [insert application]?
- How does this relate to other dimension reduction techniques?
- How new is all this?
- Can you show us the cool videos of the active subspace in the aerodynamic shape optimization?

Paul Constantine, Colorado School of Mines. paulconstantine.net, @DrPaulynomial. Website: activesubspaces.org