ACTIVE SUBSPACES: Emerging Ideas for Dimension Reduction in Parameter Studies

Paul Constantine
December 04, 2015

Presented at the Applied Math Colloquium at CU Boulder, December 4, 2015, and the Princeton Program in Applied and Computational Mathematics Colloquium, December 14, 2015.

Transcript

1. ACTIVE SUBSPACES: Emerging Ideas for Dimension Reduction in Parameter Studies

PAUL CONSTANTINE, Ben L. Fryrear Assistant Professor, Applied Mathematics & Statistics, Colorado School of Mines. activesubspaces.org, @DrPaulynomial

Thanks to: Qiqi Wang (MIT, AeroAstro) and David Gleich (Purdue, CS). This material is based upon work supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Number DE-SC-0011077.

DISCLAIMER: These slides are meant to complement the oral presentation. Use out of context at your own risk.

2. Tell me about your models.
•  Do you work with a computer simulation?
•  What are the inputs? What are the outputs?
•  What is the question you want to answer?

3. Applications of $f(x)$:
•  Uncertainty quantification for hypersonic scramjets (with G. Iaccarino, J. Larsson, M. Emory)
•  Sensitivity analysis in hydrological models (with R. Maxwell, J. Jefferson, J. Gilbert)
•  Shape optimization in aerospace vehicles (with J. Alonso, T. Lukaczyk)

4. Applications of $f(x)$:
•  Sensitivity analysis in solar cell models (with B. Zaharatos, M. Campanelli)
•  Sensitivity analysis and reduced order models in HIV modeling (with T. Loudon, S. Pankavich)
•  Annular combustor models (with M. Bauerheim, S. Moreau, F. Nicoud)
•  Calibration of an atmospheric reentry vehicle model (with P. Congedo, T. Magin)

5. Three goals for parameter studies:

APPROXIMATION: $\tilde f(x) \approx f(x)$
OPTIMIZATION: $\min_x f(x)$
INTEGRATION: $\int f(x)\,\rho\,dx$

6. The curse of dimensionality in tensor-product parameter studies:

Dimension   Points (10 / dimension)   Time (1 second / evaluation)
1           10                        10 sec
2           100                       ~1.6 min
3           1,000                     ~16 min
4           10,000                    ~2.7 hours
5           100,000                   ~1.1 days
6           1,000,000                 ~1.6 weeks
…           …                         …
20          1e20                      3 trillion years (240x age of the universe)

REDUCED-ORDER MODELS

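The table is just the arithmetic of $10^d$ evaluations at one second each; a few lines of Python (my illustration, not part of the deck) reproduce it:

```python
# 10 points per dimension and 1 second per evaluation gives 10**d seconds.
SECONDS_PER_YEAR = 3600 * 24 * 365
for d in (1, 2, 3, 4, 5, 6, 20):
    seconds = 10.0**d
    print(f"dimension {d:2d}: 1e{d} evaluations = {seconds / SECONDS_PER_YEAR:.2e} years")
```

For $d = 20$ this gives about $3.2 \times 10^{12}$ years, roughly 240 times the age of the universe.
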
7. (The same cost table as slide 6, with a different punchline.) BETTER DESIGNS

8. (The same cost table as slide 6, with a different punchline.) DIMENSION REDUCTION

9. There are many structure-exploiting methods for high-dimensional approximation.

STRUCTURE / METHODS:
•  Sparse expansions, $f(x) \approx \sum_{k=1}^p a_k\,\phi_k(x)$ with $\|a\|_0 \ll p$ (compressed sensing, …)
•  Additive structure, $f(x) \approx f_1(x_1) + \cdots + f_m(x_m)$ (sparse grids, hyperbolic crosses, HDMR / ANOVA, …)
•  Separated representations, $f(x) \approx \sum_{k=1}^r f_k(x_1) \cdots f_k(x_m)$ (low-rank methods, separation of variables, TT, ACA, PGD, …)

10. Define the active subspace. Consider a function and its gradient vector,

$$ f = f(x), \quad x \in \mathbb{R}^m, \quad \nabla f(x) \in \mathbb{R}^m, \quad \rho : \mathbb{R}^m \to \mathbb{R}_+ $$

The average outer product of the gradient and its eigendecomposition,

$$ C = \int \nabla f\, \nabla f^T\, \rho\, dx = W \Lambda W^T $$

Partition the eigendecomposition,

$$ \Lambda = \begin{bmatrix} \Lambda_1 & \\ & \Lambda_2 \end{bmatrix}, \quad W = [\, W_1 \ \ W_2 \,], \quad W_1 \in \mathbb{R}^{m \times n} $$

Rotate and separate the coordinates,

$$ x = W W^T x = W_1 W_1^T x + W_2 W_2^T x = W_1 y + W_2 z $$

where $y$ are the active variables and $z$ are the inactive variables.

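As a worked example (not on the slide): a ridge function makes $C$ exactly rank one, so the active subspace recovers the ridge direction.

```latex
% Worked example: a ridge function f(x) = h(a^T x) with a in R^m.
\nabla f(x) = h'(a^T x)\, a
\quad\Longrightarrow\quad
C = \int \nabla f\, \nabla f^T \rho\, dx
  = \left( \int h'(a^T x)^2\, \rho\, dx \right) a\, a^T .
% C has one nonzero eigenvalue with eigenvector w_1 = a / \|a\|;
% \lambda_2 = \cdots = \lambda_m = 0, so n = 1 and W_1 = w_1.
```
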
11. The eigenpairs identify perturbations that change the function more, on average:

$$ \lambda_i = \int \left( \nabla f^T w_i \right)^2 \rho\, dx, \qquad i = 1, \dots, m $$

LEMMA: $\int (\nabla_y f)^T (\nabla_y f)\, \rho\, dx = \lambda_1 + \cdots + \lambda_n$

LEMMA: $\int (\nabla_z f)^T (\nabla_z f)\, \rho\, dx = \lambda_{n+1} + \cdots + \lambda_m$

12. Discover the active subspace with random sampling.

Draw samples: $x_j \sim \rho$. Compute: $f_j = f(x_j)$ and $\nabla f_j = \nabla f(x_j)$.

Approximate with Monte Carlo:

$$ C \approx \frac{1}{N} \sum_{j=1}^N \nabla f_j\, \nabla f_j^T = \hat W \hat\Lambda \hat W^T $$

Equivalent to the SVD of the samples of the gradient:

$$ \frac{1}{\sqrt N}\, [\, \nabla f_1 \ \cdots \ \nabla f_N \,] = \hat W \sqrt{\hat\Lambda}\, \hat V^T $$

Called an active subspace method in T. Russi's 2010 Ph.D. thesis, Uncertainty Quantification with Experimental Data in Complex System Models.

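A minimal NumPy sketch of this sampling procedure (function and variable names are mine, not from the slides):

```python
import numpy as np

def discover_active_subspace(grad_f, sample_rho, N, n):
    """Estimate the active subspace from N sampled gradients.

    grad_f     : callable, x -> gradient of f at x (length-m array)
    sample_rho : callable, () -> one draw x ~ rho
    N, n       : number of samples; number of active dimensions to keep
    """
    # Scaled matrix of gradient samples, one column per sample.
    G = np.column_stack([grad_f(sample_rho()) for _ in range(N)]) / np.sqrt(N)
    # SVD of the gradient samples: squared singular values estimate the
    # eigenvalues of C, and the left singular vectors estimate W.
    W_hat, s, _ = np.linalg.svd(G, full_matrices=False)
    return s**2, W_hat[:, :n]

# Example: f(x) = (a^T x)^2 / 2 has gradient (a^T x) a, so the estimate
# should align with a and all but one eigenvalue should vanish.
m = 10
a = np.random.randn(m)
eigvals, W1 = discover_active_subspace(
    grad_f=lambda x: (a @ x) * a,
    sample_rho=lambda: np.random.randn(m),
    N=100, n=1)
```
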
13. How many gradient samples? Using Gittens and Tropp (2011):

$$ N = \Omega\!\left( \frac{L^2\, \lambda_1}{\lambda_k^2\, \varepsilon^2}\, \log(m) \right) \;\Longrightarrow\; |\lambda_k - \hat\lambda_k| \le \varepsilon\, \lambda_k \quad \text{(with high probability)} $$

Here $L^2$ is a bound on the squared gradient norm, $\varepsilon$ is the relative accuracy, and $m$ is the dimension.

14. How many gradient samples? Using Gittens and Tropp (2011), Golub and Van Loan (1996), Stewart (1973):

$$ N = \Omega\!\left( \frac{L^2}{\lambda_1\, \varepsilon^2}\, \log(m) \right) \;\Longrightarrow\; \mathrm{dist}(W_1, \hat W_1) \le \frac{4\, \lambda_1\, \varepsilon}{\lambda_n - \lambda_{n+1}} \quad \text{(with high probability)} $$

Here $L^2$ bounds the squared gradient norm, $\varepsilon$ is the relative accuracy, $m$ is the dimension, and $\lambda_n - \lambda_{n+1}$ is the spectral gap.

15. Let's be abundantly clear about the problem we are trying to solve.

Low-rank approximation of the collection of gradients: ✖

$$ \frac{1}{\sqrt N}\, [\, \nabla f_1 \ \cdots \ \nabla f_N \,] \approx \hat W_1 \sqrt{\hat\Lambda_1}\, \hat V_1^T $$

Low-dimensional linear approximation of the gradient: ✖

$$ \nabla f(x) \approx \hat W_1\, a(x) $$

Approximate a function of many variables by a function of a few linear combinations of the variables: ✔

$$ f(x) \approx g\big( \hat W_1^T x \big) $$

16. APPROXIMATION: $f(x) \approx g\big( \hat W_1^T x \big)$

17. $f(x) \approx g\big( \hat W_1^T x \big)$

How do you construct g? What is the approximation error? What is the effect of the approximate eigenvectors?

18. How do you construct g?

We know … $f(x) = f(W_1 y + W_2 z)$
How about … $g(y) = f(W_1 y)$
So that … $f(x) \approx g(W_1^T x) = f(W_1 W_1^T x)$
But remember … $\lambda_i = \int (\nabla f^T w_i)^2\, \rho\, dx$

[Figure: illustration in the $(x_1, x_2)$ plane.]

19. (Repeat of slide 18.)

20. Exploit active subspaces for response surfaces with conditional averaging.

Define the conditional expectation:

$$ g(y) = \int f(W_1 y + W_2 z)\, \rho(z \mid y)\, dz, \qquad f(x) \approx g(W_1^T x) $$

Define the Monte Carlo approximation:

$$ \hat g(y) = \frac{1}{N} \sum_{i=1}^N f(W_1 y + W_2 z_i), \qquad z_i \sim \rho(z \mid y) $$

THEOREM:

$$ \left( \int \big( f(x) - g(W_1^T x) \big)^2 \rho\, dx \right)^{1/2} \le C_P\, (\lambda_{n+1} + \cdots + \lambda_m)^{1/2} $$

THEOREM:

$$ \left( \int \big( f(x) - \hat g(W_1^T x) \big)^2 \rho\, dx \right)^{1/2} \le C_P \left( 1 + N^{-1/2} \right) (\lambda_{n+1} + \cdots + \lambda_m)^{1/2} $$

  21. " = dist(W 1, ˆ W 1) = W 1W

    T 1 ˆ W 1 ˆ W T 1 ✓Z ⇣ f( x ) g( ˆ W T 1 x ) ⌘2 ⇢ d x ◆1 2  CP ⇣ " ( 1 + · · · + n)1 2 + ( n+1 + · · · + m)1 2 ⌘ Subspace error Eigenvalues for active variables Eigenvalues for inactive variables Define the subspace error: THEOREM Exploit active subspaces for response surfaces with conditional averaging.
22. 1.  Choose points in the domain of g.
2.  Estimate the conditional average at each point.
3.  Construct the approximation* in n < m dimensions.
( * Don't interpolate! ) A code sketch of this three-step recipe follows.

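Here is a self-contained sketch of the recipe under a toy setup; everything here, including the ridge function `f` and the basis `W1`, is illustrative, not from the deck:

```python
import numpy as np

# Toy setup: a 10-d function with a 1-d active subspace along a.
m, n = 10, 1
a = np.random.randn(m); a /= np.linalg.norm(a)
f = lambda x: np.sin(a @ x)                      # ridge function
Q, _ = np.linalg.qr(np.column_stack([a, np.random.randn(m, m - 1)]))
W1, W2 = Q[:, :n], Q[:, n:]

def g_hat(y, N=20):
    """Monte Carlo estimate of g(y) = E[f(W1 y + W2 z)], z ~ N(0, I)."""
    Z = np.random.randn(m - n, N)
    return np.mean([f(W1 @ y + W2 @ Z[:, i]) for i in range(N)])

# 1. Choose points in the (low-dimensional) domain of g.
Y = np.linspace(-3, 3, 11)
# 2. Estimate the conditional average at each point.
G = np.array([g_hat(np.array([y])) for y in Y])
# 3. Fit (don't interpolate!) a low-order polynomial in the active
#    variable, since each G value carries Monte Carlo noise.
g = np.poly1d(np.polyfit(Y, G, deg=5))
surrogate = lambda x: g(W1.T @ x)                # f(x) ~ g(W1^T x)
```
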
23. The parameterized PDE:

$$ -\nabla \cdot (a\, \nabla u) = 1, \ s \in D; \qquad u = 0, \ s \in \Gamma_1; \qquad n \cdot a\, \nabla u = 0, \ s \in \Gamma_2 $$

[Figure: domain $D$ with boundaries $\Gamma_1$ and $\Gamma_2$.]

The coefficients (100-term KL expansion, Gaussian r.v.'s, rough fields):

$$ \log(a(s, x)) = \sum_{k=1}^m \sqrt{\theta_k}\, \phi_k(s)\, x_k, \qquad \mathrm{Cov}(s_1, s_2) = \sigma^2 \exp\!\left( -\frac{\| s_1 - s_2 \|_1}{2\, \ell} \right) $$

The quantity of interest (spatial average over the Neumann boundary):

$$ f(x) = \int_{\Gamma_2} u(s, x)\, ds $$

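A sketch of the Karhunen-Loeve construction of $\log(a)$, reduced to a 1-d grid for brevity (the slide's problem is 2-d; the grid size and names are my choices):

```python
import numpy as np

n_grid, n_kl = 500, 100
sigma2, ell = 1.0, 0.01                       # short length scale case
s = np.linspace(0, 1, n_grid)

# Exponential covariance: Cov(s1, s2) = sigma^2 exp(-|s1 - s2| / (2 ell)).
Cov = sigma2 * np.exp(-np.abs(s[:, None] - s[None, :]) / (2 * ell))

# KL terms are the leading eigenpairs of the covariance matrix.
theta, phi = np.linalg.eigh(Cov)
theta, phi = theta[::-1][:n_kl], phi[:, ::-1][:, :n_kl]

# One realization: log a(s, x) = sum_k sqrt(theta_k) phi_k(s) x_k, x_k ~ N(0,1).
x = np.random.randn(n_kl)
log_a = phi @ (np.sqrt(np.maximum(theta, 0)) * x)
a = np.exp(log_a)
```
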
24. The ingredients: $f(x)$ is the PDE solution's spatial average along the Neumann boundary; $\rho(x)$ is the standard Gaussian density; $\nabla f(x)$ is computed with adjoint equations.

25. [Figure: eigenvalues of $\frac{1}{N} \sum_{j=1}^N \nabla f_j\, \nabla f_j^T$ with estimates and bootstrap intervals (Est, BI), for the long length scale $\ell = 1$ and the short length scale $\ell = 0.01$.]

26. [Figure: estimates of the subspace error $\varepsilon = \mathrm{dist}(W_1, \hat W_1)$ versus subspace dimension, with bootstrap intervals (Est, BI), for the long length scale $\ell = 1$ and the short length scale $\ell = 0.01$.]

27. [Figure: components of the first eigenvector of $\frac{1}{N} \sum_{j=1}^N \nabla f_j\, \nabla f_j^T$ versus parameter index, for the long length scale $\ell = 1$ and the short length scale $\ell = 0.01$.]

28. Remember the goal! $f(x) \approx g(\hat w_1^T x)$.

[Figure: sufficient summary plots of $f(x_j)$ versus $\hat w_1^T x_j$, for the long length scale $\ell = 1$ and the short length scale $\ell = 0.01$.]

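The summary plot itself is two lines once samples are in hand; a sketch assuming arrays `xs` (inputs, one column per sample), `fs` (outputs), and the estimated eigenvector `W1` from the earlier sampling step:

```python
import matplotlib.pyplot as plt

# xs: (m, N) input samples, fs: (N,) outputs, W1: (m, 1) first eigenvector.
plt.scatter(W1[:, 0] @ xs, fs)
plt.xlabel(r'$\hat{w}_1^T x$'); plt.ylabel('Quantity of interest')
plt.show()
```
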
29. One slide on the Bayesian setup.

Additive Gaussian noise model: $d = m(x) + e$, $\; e \sim N(0, \sigma^2 I)$

Likelihood function from the noise model: $\rho_{\mathrm{lik}}(x, d) = \exp\!\left( -\| d - m(x) \|^2 / (2\sigma^2) \right)$

Standard Gaussian prior: $\rho_{\mathrm{pr}}(x) = (2\pi)^{-m/2} \exp(-x^T x / 2)$

Bayes' Theorem gives the posterior: $\rho_{\mathrm{pos}}(x) = c_{\mathrm{pos}}^{-1}\, \rho_{\mathrm{lik}}(x, d)\, \rho_{\mathrm{pr}}(x)$

30. Two things define the active subspace.

1.  A differentiable, scalar-valued function $f(x)$. Here, the negative log-likelihood, or data misfit:

$$ f(x) = \| d - m(x) \|^2 / (2\sigma^2), \qquad \nabla f(x) = -\frac{1}{\sigma^2}\, \nabla m(x)^T \big( d - m(x) \big) $$

2.  A weight function $\rho(x)$ for averaging. Here, the standard Gaussian prior density: $\rho(x) = \rho_{\mathrm{pr}}(x)$.

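A small sketch of these two formulas, assuming a forward model `forward` and its Jacobian `jacobian` are available (both names hypothetical):

```python
import numpy as np

def misfit_and_gradient(x, d, forward, jacobian, sigma2):
    """Data misfit f(x) = ||d - m(x)||^2 / (2 sigma^2) and its gradient.

    forward  : callable, x -> m(x), the forward model
    jacobian : callable, x -> Jacobian of m at x, shape (len(d), len(x))
    """
    r = d - forward(x)                      # residual d - m(x)
    f = (r @ r) / (2 * sigma2)
    grad_f = -jacobian(x).T @ r / sigma2    # -(1/sigma^2) grad m(x)^T (d - m(x))
    return f, grad_f
```
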
31. Approximate the posterior by exploiting the active subspace.

Recall the posterior:

$$ \rho_{\mathrm{pos}}(x) = c_{\mathrm{pos}}^{-1} \exp(-f(x))\, \rho_{\mathrm{pr}}(x) $$

The Gaussian prior is separable:

$$ \rho_{\mathrm{pr}}(x) = \rho_{\mathrm{pr}}(W_1 y + W_2 z) = \rho_{\mathrm{pr}}(y, z) = \rho_{\mathrm{pr}}(y)\, \rho_{\mathrm{pr}}(z) $$

And here's how we approximate it:

$$ \rho_{\mathrm{pos}}(x) \approx \pi(x) = c_\pi^{-1} \exp\!\big( -g(W_1^T x) \big)\, \rho_{\mathrm{pr}}(x) = c_\pi^{-1} \exp(-g(y))\, \rho_{\mathrm{pr}}(y)\, \rho_{\mathrm{pr}}(z) $$

32. Approximate the posterior by exploiting the active subspace:

$$ \rho_{\mathrm{pos}}(x) \approx c_\pi^{-1} \exp(-g(y))\, \rho_{\mathrm{pr}}(y)\, \rho_{\mathrm{pr}}(z) $$

MCMC in $n < m$ dimensions; independent sampling in $m - n$ dimensions.

33. Approximate the posterior by exploiting the active subspace:

$$ \rho_{\mathrm{pos}}(x) \approx c_\pi^{-1} \exp(-g(y))\, \rho_{\mathrm{pr}}(y)\, \rho_{\mathrm{pr}}(z) $$

What is the approximation error?

34. The approximation error depends on the eigenvalues.

THEOREM (Hellinger distance between the posterior and its approximation):

$$ \left( \int \left( \rho_{\mathrm{pos}}^{1/2} - \pi^{1/2} \right)^2 dx \right)^{1/2} \le C\, (\lambda_{n+1} + \cdots + \lambda_m)^{1/2} $$

where $\lambda_{n+1}, \dots, \lambda_m$ are the eigenvalues for the inactive variables.

35. An active subspace-exploiting MCMC. Given $y_k$ and $g(y_k)$:

1.  Draw $y'_k$ from a symmetric proposal centered at $y_k$.
2.  Compute the approximate misfit with Monte Carlo:
$$ g(y'_k) \approx \frac{1}{N} \sum_{j=1}^N f(W_1 y'_k + W_2 z_j), \qquad z_j \sim \rho_{\mathrm{pr}}(z) $$
3.  Compute the acceptance ratio:
$$ \gamma = \frac{\exp(-g(y'_k))\, \rho_{\mathrm{pr}}(y'_k)}{\exp(-g(y_k))\, \rho_{\mathrm{pr}}(y_k)} $$
4.  Draw $t$ uniformly from $[0, 1]$, and set $y_{k+1} = y'_k$ if $t \le \gamma$, and $y_{k+1} = y_k$ otherwise.

For each $y_k$ from the Markov chain, reconstruct full-space samples, as sketched in the code below:

$$ x_{k,\ell} = W_1 y_k + W_2 z_{k,\ell}, \qquad z_{k,\ell} \sim \rho_{\mathrm{pr}}(z) $$

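A sketch of this sampler, assuming `g_hat` implements the Monte Carlo misfit estimate from step 2 (all names are mine; the proposal and chain-length choices are placeholders):

```python
import numpy as np

def as_mcmc(g_hat, W1, n_steps, prop_var=0.3, n_fill=10, seed=0):
    """Metropolis-Hastings on the active variables y, then fill in z ~ prior.

    g_hat : callable, y -> Monte Carlo estimate of the conditional misfit g(y)
    W1    : (m, n) active subspace basis; W2 is its orthogonal complement
    """
    rng = np.random.default_rng(seed)
    m, n = W1.shape
    Q, _ = np.linalg.qr(W1, mode='complete')
    W2 = Q[:, n:]                                # orthogonal complement of W1

    log_prior = lambda y: -0.5 * (y @ y)         # standard Gaussian prior on y
    y, g_y, xs = np.zeros(n), g_hat(np.zeros(n)), []
    for _ in range(n_steps):
        y_new = y + np.sqrt(prop_var) * rng.standard_normal(n)
        g_new = g_hat(y_new)
        log_gamma = (-g_new + log_prior(y_new)) - (-g_y + log_prior(y))
        if np.log(rng.uniform()) <= log_gamma:   # accept with prob min(1, gamma)
            y, g_y = y_new, g_new
        # Reconstruct full-space samples: x = W1 y + W2 z, z ~ prior.
        Z = rng.standard_normal((m - n, n_fill))
        xs.append(W1 @ y[:, None] + W2 @ Z)
    return np.concatenate(xs, axis=1)
```
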
36. Invert for the subsurface with measurements on the surface.

[Figure: domain $D$ with "sensors" marking surface measurement locations.]

•  2nd-order finite difference
•  100x100 mesh
•  Automatic differentiation

37. Invert for the subsurface with measurements on the surface.

[Figures: eigenvalues and subspace error for this inverse problem.]

•  2nd-order finite difference
•  100x100 mesh
•  Automatic differentiation

38. MCMC mixes better in lower dimensions.

[Figure: trace plots of the chains and their autocorrelation functions (Vanilla; AS2, 0.1; AS2, 0.3).]

Standard Metropolis-Hastings (100d): 500k steps / 500k forward models; proposal variance 0.1; acceptance ~60%.

Active-subspace accelerated (2d): 50k steps / 500k forward models; proposal variance 0.3; acceptance ~62%.

39. Working in low dimensions introduces some bias.

[Figure: centered posterior mean and centered posterior variance versus parameter index.]

40. The 100d chain shows evidence of a 2d active subspace.

[Figure: pairwise scatter plots of the chain in active subspace coordinates $(y_1, \dots, y_4)$.]

41. Integrate in the active variables:

$$ \int f(x)\, \rho\, dx = \int \underbrace{\int f(W_1 y + W_2 z)\, \rho(z \mid y)\, dz}_{=\, g(y)}\ \rho(y)\, dy = \int g(y)\, \rho(y)\, dy \approx \sum_{i=1}^N g(y_i)\, w_i \approx \sum_{i=1}^N \hat g(y_i)\, w_i $$

Use a quadrature rule in the active variables and Monte Carlo in the inactive variables.

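A sketch combining a Gauss-Hermite rule in one active variable with Monte Carlo over the inactive variables, on the same toy ridge function used earlier (all names and sizes illustrative):

```python
import numpy as np

# Toy setup: f(x) = sin(a^T x) with a 1-d active subspace along a.
m, n = 10, 1
a = np.random.randn(m); a /= np.linalg.norm(a)
f = lambda X: np.sin(a @ X)
Q, _ = np.linalg.qr(a[:, None], mode='complete')
W1, W2 = Q[:, :n], Q[:, n:]

# Gauss-Hermite nodes/weights for a standard Gaussian in y
# (probabilists' form; weights normalized by sqrt(2 pi)).
nodes, weights = np.polynomial.hermite_e.hermegauss(11)
weights = weights / np.sqrt(2 * np.pi)

N_inner = 50
integral = 0.0
for y, w in zip(nodes, weights):
    Z = np.random.randn(m - n, N_inner)        # z ~ rho_pr(z)
    g_hat = np.mean(f(W1 * y + W2 @ Z))        # Monte Carlo estimate of g(y)
    integral += w * g_hat                      # quadrature in the active variable
print(integral)
```
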
42. There are more questions than answers.
•  Exploiting symmetries?
•  Nested node sets?
•  Asymptotics?
•  Error analysis?

43. QUESTIONS?
•  How do active subspaces relate to [insert method]?
•  What if I don't have gradients?
•  What kinds of models does this work on?
•  Can you show some applications?
•  What about multiple quantities of interest?
•  Tell me about hypercubes.
•  How new is all this?

PAUL CONSTANTINE, Ben L. Fryrear Assistant Professor, Colorado School of Mines. activesubspaces.org, @DrPaulynomial