Slide 1

ACTIVE SUBSPACES
Emerging Ideas for Dimension Reduction in Parameter Studies

PAUL CONSTANTINE
Ben L. Fryrear Assistant Professor
Applied Mathematics & Statistics, Colorado School of Mines
activesubspaces.org | @DrPaulynomial

Thanks to: Qiqi Wang (MIT, AeroAstro) and David Gleich (Purdue, CS)

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program, under Award Number DE-SC-0011077.

SLIDES:

DISCLAIMER: These slides are meant to complement the oral presentation. Use out of context at your own risk.

Slide 2

Tell me about your models.
•  Do you work with a computer simulation?
•  What are the inputs? What are the outputs?
•  What is the question you want to answer?

Slide 3

$f(\mathbf{x})$

Slide 4

$f(\mathbf{x})$
•  Uncertainty quantification for hypersonic scramjets (with G. Iaccarino, J. Larsson, M. Emory)
•  Sensitivity analysis in hydrological models (with R. Maxwell, J. Jefferson, J. Gilbert)
•  Shape optimization in aerospace vehicles (with J. Alonso, T. Lukaczyk)

Slide 5

$f(\mathbf{x})$
•  Sensitivity analysis in solar cell models (with B. Zaharatos, M. Campanelli)
•  Sensitivity analysis and reduced order models in HIV modeling (with T. Loudon, S. Pankavich)
•  Annular combustor models (with M. Bauerheim, S. Moreau, F. Nicoud)
•  Calibration of an atmospheric reentry vehicle model (with P. Congedo, T. Magin)

Slide 6

How many dimensions is high dimensions?

Slide 7

APPROXIMATION: $\tilde{f}(\mathbf{x}) \approx f(\mathbf{x})$

INTEGRATION: $\int f(\mathbf{x}) \, \rho \, d\mathbf{x}$

OPTIMIZATION: $\underset{\mathbf{x}}{\text{minimize}} \; f(\mathbf{x})$

Slide 8

Dimension   Points (10 / dimension)   Time (1 second / evaluation)
1           10                        10 sec
2           100                       ~ 1.6 min
3           1,000                     ~ 16 min
4           10,000                    ~ 2.7 hours
5           100,000                   ~ 1.1 days
6           1,000,000                 ~ 1.6 weeks
…           …                         …
20          1e20                      ~ 3 trillion years (240x the age of the universe)

REDUCED-ORDER MODELS
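The table's arithmetic is easy to reproduce; here is a minimal sketch in Python, assuming (as in the table) a full tensor grid of 10 points per dimension and one second per evaluation:

```python
# A minimal sketch of the cost table's arithmetic: a full tensor grid
# with 10 points per dimension costs 10**d evaluations at 1 sec each.
for d in (1, 2, 3, 4, 5, 6, 20):
    seconds = 10.0**d
    years = seconds / (60 * 60 * 24 * 365.25)
    print(f"dim {d:2d}: {seconds:.1e} sec (~{years:.1e} years)")
```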

Slide 9

(The cost table from Slide 8 repeats.)

BETTER DESIGNS

Slide 10

(The cost table from Slide 8 repeats.)

DIMENSION REDUCTION

Slide 11

There are many structure-exploiting methods for high-dimensional approximation.

STRUCTURE / METHODS:
•  Sparsity: $f(\mathbf{x}) \approx \sum_{k=1}^{p} a_k \, \phi_k(\mathbf{x})$, $\|a\|_0 \ll p$ (compressed sensing, …)
•  Additive structure: $f(\mathbf{x}) \approx f_1(x_1) + \cdots + f_m(x_m)$ (sparse grids, hyperbolic crosses, HDMR / ANOVA, …)
•  Separability: $f(\mathbf{x}) \approx \sum_{k=1}^{r} f_k(x_1) \cdots f_k(x_m)$ (low-rank methods, separation of variables, TT, ACA, PGD, …)

Slide 12

VISIT THE SIAM BOOKSTORE!
$27 with coupon code: BKSL15

Slide 13

www.youtube.com/watch?v=mJvKzjT6lmY

Slide 14

Define the active subspace.

Consider a function and its gradient vector,
$f = f(\mathbf{x}), \quad \mathbf{x} \in \mathbb{R}^m, \quad \nabla f(\mathbf{x}) \in \mathbb{R}^m, \quad \rho : \mathbb{R}^m \to \mathbb{R}_+$

The average outer product of the gradient and its eigendecomposition,
$C = \int \nabla f \, \nabla f^T \, \rho \, d\mathbf{x} = W \Lambda W^T$

Partition the eigendecomposition,
$\Lambda = \begin{bmatrix} \Lambda_1 & \\ & \Lambda_2 \end{bmatrix}, \quad W = [\, W_1 \;\; W_2 \,], \quad W_1 \in \mathbb{R}^{m \times n}$

Rotate and separate the coordinates,
$\mathbf{x} = W W^T \mathbf{x} = W_1 W_1^T \mathbf{x} + W_2 W_2^T \mathbf{x} = W_1 \mathbf{y} + W_2 \mathbf{z}$
where $\mathbf{y}$ are the active variables and $\mathbf{z}$ are the inactive variables.

Slide 15

$\int \mathbf{x} \, \mathbf{x}^T \, \rho \, d\mathbf{x}$ VS. $\int \nabla f \, \nabla f^T \, \rho \, d\mathbf{x}$

Slide 16

The eigenpairs identify perturbations that change the function more, on average.

$\lambda_i = \int \left( \nabla f^T w_i \right)^2 \rho \, d\mathbf{x}, \quad i = 1, \dots, m$

LEMMA: $\int (\nabla_{\mathbf{y}} f)^T (\nabla_{\mathbf{y}} f) \, \rho \, d\mathbf{x} = \lambda_1 + \cdots + \lambda_n$

LEMMA: $\int (\nabla_{\mathbf{z}} f)^T (\nabla_{\mathbf{z}} f) \, \rho \, d\mathbf{x} = \lambda_{n+1} + \cdots + \lambda_m$
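As a quick sanity check (not from the slides), here is a hypothetical numerical verification of the identity $\lambda_i = \int (\nabla f^T w_i)^2 \rho \, d\mathbf{x}$ for a quadratic test function whose matrix $C$ is known in closed form:

```python
import numpy as np

# A quick numerical check (not from the slides) of the identity
# lambda_i = int (grad f^T w_i)^2 rho dx. Take f(x) = 0.5 x^T A x with
# symmetric A and rho = N(0, I): then grad f(x) = A x and
# C = E[A x x^T A^T] = A A^T, whose eigenpairs we know exactly.
rng = np.random.default_rng(0)
m, N = 5, 200_000

A = rng.standard_normal((m, m))
A = 0.5 * (A + A.T)                         # symmetrize so A x is a gradient
lam, W = np.linalg.eigh(A @ A.T)            # exact eigenpairs of C

X = rng.standard_normal((N, m))             # x_j ~ rho
grads = X @ A.T                             # rows are grad f(x_j) = A x_j

# Monte Carlo estimate of E[(grad f^T w_i)^2] for each eigenvector w_i
lam_mc = ((grads @ W) ** 2).mean(axis=0)
print(np.max(np.abs(lam_mc - lam)) / lam.max())  # small relative error
```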

Slide 17

Discover the active subspace with random sampling.

Draw samples: $\mathbf{x}_j \sim \rho$
Compute: $f_j = f(\mathbf{x}_j)$ and $\nabla f_j = \nabla f(\mathbf{x}_j)$

Approximate with Monte Carlo:
$C \approx \frac{1}{N} \sum_{j=1}^{N} \nabla f_j \, \nabla f_j^T = \hat{W} \hat{\Lambda} \hat{W}^T$

Equivalent to the SVD of the scaled matrix of gradient samples:
$\frac{1}{\sqrt{N}} \left[\, \nabla f_1 \; \cdots \; \nabla f_N \,\right] = \hat{W} \sqrt{\hat{\Lambda}} \, \hat{V}^T$

Called an active subspace method in T. Russi's 2010 Ph.D. thesis, Uncertainty Quantification with Experimental Data in Complex System Models.
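A minimal sketch of this procedure in Python; `grad_f` and `sample_rho` are placeholder names for the user-supplied gradient routine and input sampler:

```python
import numpy as np

# A minimal sketch of the random-sampling procedure on this slide.
# grad_f(x) returns the gradient of f at x; sample_rho(N) draws N
# inputs from rho. Both are placeholders for your model.
def active_subspace(grad_f, sample_rho, N):
    X = sample_rho(N)                         # x_j ~ rho, shape (N, m)
    G = np.stack([grad_f(x) for x in X])      # rows are grad f(x_j)

    # SVD of (1/sqrt(N)) [grad f_1 ... grad f_N] gives W_hat and
    # sqrt(Lambda_hat) without forming the m-by-m matrix C explicitly.
    W_hat, s, _ = np.linalg.svd(G.T / np.sqrt(N), full_matrices=False)
    lambdas = s**2                            # eigenvalue estimates
    return lambdas, W_hat

# Pick n by looking for a gap in lambdas; then W1 = W_hat[:, :n].
```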

Slide 18

How many gradient samples?

Using Gittens and Tropp (2011):
$N = \Omega\!\left( \frac{L^2 \, \lambda_1}{\lambda_k^2 \, \varepsilon^2} \log(m) \right) \implies |\lambda_k - \hat{\lambda}_k| \le \varepsilon \, \lambda_k$ (with high probability)

Here $L^2$ is a bound on the squared gradient norm, $\varepsilon$ is the relative accuracy, and $m$ is the dimension.

Slide 19

How many gradient samples?

Gittens and Tropp (2011), Golub and Van Loan (1996), Stewart (1973):
$N = \Omega\!\left( \frac{L^2}{\lambda_1 \, \varepsilon^2} \log(m) \right) \implies \mathrm{dist}(W_1, \hat{W}_1) \le \frac{4 \, \lambda_1 \, \varepsilon}{\lambda_n - \lambda_{n+1}}$ (with high probability)

Here $L^2$ is a bound on the squared gradient norm, $\varepsilon$ is the relative accuracy, $m$ is the dimension, and $\lambda_n - \lambda_{n+1}$ is the spectral gap.

Slide 20

Let's be abundantly clear about the problem we are trying to solve.

✖ Low-rank approximation of the collection of gradients:
$\frac{1}{\sqrt{N}} \left[\, \nabla f_1 \; \cdots \; \nabla f_N \,\right] \approx \hat{W}_1 \sqrt{\hat{\Lambda}_1} \, \hat{V}_1^T$

✖ Low-dimensional linear approximation of the gradient:
$\nabla f(\mathbf{x}) \approx \hat{W}_1 \, a(\mathbf{x})$

✔ Approximate a function of many variables by a function of a few linear combinations of the variables:
$f(\mathbf{x}) \approx g\left( \hat{W}_1^T \mathbf{x} \right)$

Slide 21

APPROXIMATION

$f(\mathbf{x}) \approx g\left( \hat{W}_1^T \mathbf{x} \right)$

Slide 22

$f(\mathbf{x}) \approx g\left( \hat{W}_1^T \mathbf{x} \right)$

How do you construct g? What is the approximation error? What is the effect of the approximate eigenvectors?

Slide 23

How do you construct g?

We know …
$f(\mathbf{x}) = f(W_1 \mathbf{y} + W_2 \mathbf{z})$

How about …
$g(\mathbf{y}) = f(W_1 \mathbf{y})$

So that …
$f(\mathbf{x}) \approx g(W_1^T \mathbf{x}) = f(W_1 W_1^T \mathbf{x})$

But remember …
$\lambda_i = \int \left( \nabla f^T w_i \right)^2 \rho \, d\mathbf{x}$

(Figure: the projection onto the subspace in the $[-1, 1]^2$ input domain.)

Slide 25

Exploit active subspaces for response surfaces with conditional averaging.

Define the conditional expectation:
$g(\mathbf{y}) = \int f(W_1 \mathbf{y} + W_2 \mathbf{z}) \, \rho(\mathbf{z} \,|\, \mathbf{y}) \, d\mathbf{z}, \qquad f(\mathbf{x}) \approx g(W_1^T \mathbf{x})$

THEOREM:
$\left( \int \left( f(\mathbf{x}) - g(W_1^T \mathbf{x}) \right)^2 \rho \, d\mathbf{x} \right)^{1/2} \le C_P \left( \lambda_{n+1} + \cdots + \lambda_m \right)^{1/2}$

Define the Monte Carlo approximation:
$\hat{g}(\mathbf{y}) = \frac{1}{N} \sum_{i=1}^{N} f(W_1 \mathbf{y} + W_2 \mathbf{z}_i), \qquad \mathbf{z}_i \sim \rho(\mathbf{z} \,|\, \mathbf{y})$

THEOREM:
$\left( \int \left( f(\mathbf{x}) - \hat{g}(W_1^T \mathbf{x}) \right)^2 \rho \, d\mathbf{x} \right)^{1/2} \le C_P \left( 1 + N^{-1/2} \right) \left( \lambda_{n+1} + \cdots + \lambda_m \right)^{1/2}$
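A minimal sketch of the Monte Carlo approximation $\hat{g}$ in Python, assuming $\rho$ is standard Gaussian (so $\rho(\mathbf{z} \,|\, \mathbf{y})$ is also standard Gaussian); `f`, `W1`, and `W2` are placeholders for the model and the eigenvector blocks:

```python
import numpy as np

# A sketch of the Monte Carlo conditional average g_hat on this slide,
# for a standard Gaussian rho, where rho(z | y) is standard Gaussian too.
def g_hat(y, f, W1, W2, N=10, rng=np.random.default_rng()):
    m_minus_n = W2.shape[1]
    total = 0.0
    for _ in range(N):
        z = rng.standard_normal(m_minus_n)   # z_i ~ rho(z | y)
        total += f(W1 @ y + W2 @ z)
    return total / N
```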

Slide 26

Exploit active subspaces for response surfaces with conditional averaging.

Define the subspace error:
$\varepsilon = \mathrm{dist}(W_1, \hat{W}_1) = \left\| W_1 W_1^T - \hat{W}_1 \hat{W}_1^T \right\|$

THEOREM:
$\left( \int \left( f(\mathbf{x}) - g(\hat{W}_1^T \mathbf{x}) \right)^2 \rho \, d\mathbf{x} \right)^{1/2} \le C_P \left( \varepsilon \left( \lambda_1 + \cdots + \lambda_n \right)^{1/2} + \left( \lambda_{n+1} + \cdots + \lambda_m \right)^{1/2} \right)$

Here $\varepsilon$ is the subspace error, $\lambda_1, \dots, \lambda_n$ are the eigenvalues for the active variables, and $\lambda_{n+1}, \dots, \lambda_m$ are the eigenvalues for the inactive variables.

Slide 27

1.  Choose points in the domain of g.
2.  Estimate the conditional average at each point.
3.  Construct the approximation* in n < m dimensions, as in the sketch below.

( * Don't interpolate! )
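A sketch of the three-step recipe for $n = 1$, assuming `g_hat` is a callable estimator of the conditional average (for example, the earlier sketch with `f`, `W1`, `W2` bound via `functools.partial`). The least-squares polynomial is one regression choice among many:

```python
import numpy as np

# A sketch of the three-step recipe above for n = 1. Per the note
# above, don't interpolate: the Monte Carlo conditional averages are
# noisy, so smooth them with a regression fit instead.
def build_response_surface(g_hat, y_points, degree=3):
    g_vals = np.array([g_hat(y) for y in y_points])   # step 2
    coeffs = np.polyfit(y_points, g_vals, degree)     # step 3: regression
    return np.poly1d(coeffs)                          # callable surrogate g

# Usage: g = build_response_surface(g_hat, np.linspace(-3, 3, 21));
# then f(x) is approximated by g(W1.T @ x).
```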

Slide 28

EXAMPLE WITH POISSON

$-\nabla \cdot (a \, \nabla u) = 1$

Slide 29

The parameterized PDE:
$-\nabla \cdot (a \, \nabla u) = 1, \; s \in D; \qquad u = 0, \; s \in \Gamma_1; \qquad \mathbf{n} \cdot a \, \nabla u = 0, \; s \in \Gamma_2$

The coefficients (100-term KL expansion, Gaussian r.v.'s, rough fields):
$\log(a(s, \mathbf{x})) = \sum_{k=1}^{m} \sqrt{\theta_k} \, \phi_k(s) \, x_k, \qquad \mathrm{Cov}(s_1, s_2) = \sigma^2 \exp\!\left( -\frac{\| s_1 - s_2 \|_1}{2 \ell} \right)$

The quantity of interest (spatial average over the Neumann boundary):
$f(\mathbf{x}) = \int_{\Gamma_2} u(s, \mathbf{x}) \, ds$

Slide 30

(Figure: KL eigenvalues vs. index for $\ell = 1$ and $\ell = 0.01$.)

Slide 31

Long length scale, $\ell = 1$

(Figures: realizations of $\log(a(s, \mathbf{x}))$ and $u(s, \mathbf{x})$.)

Slide 32

Short length scale, $\ell = 0.01$

(Figures: realizations of $\log(a(s, \mathbf{x}))$ and $u(s, \mathbf{x})$.)

Slide 33

$f(\mathbf{x})$: the PDE solution's spatial average along the Neumann boundary
$\rho(\mathbf{x})$: standard Gaussian density
$\nabla f(\mathbf{x})$: gradient computed with adjoint equations

Slide 34

Eigenvalues of $\frac{1}{N} \sum_{j=1}^{N} \nabla f_j \, \nabla f_j^T$

(Figures: eigenvalue estimates with bootstrap intervals for the long length scale, $\ell = 1$, and the short length scale, $\ell = 0.01$.)

Slide 35

Estimates of the subspace error $\varepsilon = \mathrm{dist}(W_1, \hat{W}_1)$

(Figures: subspace distance vs. subspace dimension with bootstrap intervals for the long length scale, $\ell = 1$, and the short length scale, $\ell = 0.01$.)

Slide 36

First eigenvector of $\frac{1}{N} \sum_{j=1}^{N} \nabla f_j \, \nabla f_j^T$

(Figures: eigenvector components vs. parameter index for the long length scale, $\ell = 1$, and the short length scale, $\ell = 0.01$.)

Slide 37

Remember the goal! $f(\mathbf{x}) \approx g(\hat{w}_1^T \mathbf{x})$

Plotting $f(\mathbf{x}_j)$ versus $\hat{w}_1^T \mathbf{x}_j$:

(Figures: the quantity of interest plotted against $\hat{w}_1^T \mathbf{x}_j$ for the long length scale, $\ell = 1$, and the short length scale, $\ell = 0.01$.)

Slide 38

CHOOSE YOUR OWN ADVENTURE!

INVERSION: $P(A|B) = \frac{P(B|A) \, P(A)}{P(B)}$

INTEGRATION: $\int f(\mathbf{x}) \, \rho \, d\mathbf{x}$

Slide 39

INVERSION

$P(A|B) = \frac{P(B|A) \, P(A)}{P(B)}$

Slide 40

One slide on the Bayesian setup.

Additive Gaussian noise model:
$\mathbf{d} = m(\mathbf{x}) + \mathbf{e}, \qquad \mathbf{e} \sim N(0, \sigma^2 I)$

Standard Gaussian prior:
$\rho_{\mathrm{pr}}(\mathbf{x}) = (2\pi)^{-m/2} \exp\left( -\mathbf{x}^T \mathbf{x} / 2 \right)$

Likelihood function from the noise model:
$\rho_{\mathrm{lik}}(\mathbf{x}, \mathbf{d}) = \exp\!\left( -\| \mathbf{d} - m(\mathbf{x}) \|^2 / (2 \sigma^2) \right)$

Bayes' Theorem gives the posterior:
$\rho_{\mathrm{pos}}(\mathbf{x}) = c_{\mathrm{pos}}^{-1} \, \rho_{\mathrm{lik}}(\mathbf{x}, \mathbf{d}) \, \rho_{\mathrm{pr}}(\mathbf{x})$

Slide 41

Two things define the active subspace.

1.  A differentiable, scalar-valued function $f(\mathbf{x})$; here, the negative log-likelihood, or data misfit:
$f(\mathbf{x}) = \| \mathbf{d} - m(\mathbf{x}) \|^2 / (2 \sigma^2), \qquad \nabla f(\mathbf{x}) = -\frac{1}{\sigma^2} \, \nabla m(\mathbf{x})^T \left( \mathbf{d} - m(\mathbf{x}) \right)$

2.  A weight function $\rho(\mathbf{x})$ for averaging; here, the standard Gaussian prior density:
$\rho(\mathbf{x}) = \rho_{\mathrm{pr}}(\mathbf{x})$
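A minimal sketch of the misfit and its gradient in Python; `forward` and `jacobian` are placeholder names for the forward model $m(\mathbf{x})$ and its Jacobian $\nabla m(\mathbf{x})$:

```python
import numpy as np

# A sketch of the data misfit and its gradient for the Bayesian setup
# above. forward(x) returns m(x); jacobian(x) returns the Jacobian of m.
# Both are placeholders for your forward model and its derivative.
def misfit(x, d, forward, sigma2):
    r = d - forward(x)
    return r @ r / (2.0 * sigma2)

def misfit_grad(x, d, forward, jacobian, sigma2):
    r = d - forward(x)
    return -jacobian(x).T @ r / sigma2
```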

Slide 42

Approximate the posterior by exploiting the active subspace.

Recall the posterior:
$\rho_{\mathrm{pos}}(\mathbf{x}) = c_{\mathrm{pos}}^{-1} \exp(-f(\mathbf{x})) \, \rho_{\mathrm{pr}}(\mathbf{x})$

The Gaussian prior is separable:
$\rho_{\mathrm{pr}}(\mathbf{x}) = \rho_{\mathrm{pr}}(W_1 \mathbf{y} + W_2 \mathbf{z}) = \rho_{\mathrm{pr}}(\mathbf{y}, \mathbf{z}) = \rho_{\mathrm{pr}}(\mathbf{y}) \, \rho_{\mathrm{pr}}(\mathbf{z})$

And here's how we approximate it:
$\rho_{\mathrm{pos}}(\mathbf{x}) \approx \pi(\mathbf{x}) = c_{\pi}^{-1} \exp\!\left( -g(W_1^T \mathbf{x}) \right) \rho_{\mathrm{pr}}(\mathbf{x}) = c_{\pi}^{-1} \exp(-g(\mathbf{y})) \, \rho_{\mathrm{pr}}(\mathbf{y}) \, \rho_{\mathrm{pr}}(\mathbf{z})$

Slide 43

MCMC in $n$ dimensions

Slide 44

Approximate the posterior by exploiting the active subspace.

$\rho_{\mathrm{pos}}(\mathbf{x}) \approx c_{\pi}^{-1} \exp(-g(\mathbf{y})) \, \rho_{\mathrm{pr}}(\mathbf{y}) \, \rho_{\mathrm{pr}}(\mathbf{z})$

What is the approximation error?

Slide 45

THEOREM: The approximation error depends on the eigenvalues.

$\left( \int \left( \rho_{\mathrm{pos}}^{1/2} - \pi^{1/2} \right)^2 d\mathbf{x} \right)^{1/2} \le C \left( \lambda_{n+1} + \cdots + \lambda_m \right)^{1/2}$

The left side is the Hellinger distance between the posterior and its approximation; the right side involves the eigenvalues for the inactive variables.

Slide 46

An active subspace-exploiting MCMC. Given $\mathbf{y}_k$ and $g(\mathbf{y}_k)$:

1.  Draw $\mathbf{y}'_k$ from a symmetric proposal centered at $\mathbf{y}_k$.
2.  Compute the approximate misfit with Monte Carlo:
$g(\mathbf{y}'_k) \approx \frac{1}{N} \sum_{j=1}^{N} f(W_1 \mathbf{y}'_k + W_2 \mathbf{z}_j), \qquad \mathbf{z}_j \sim \rho_{\mathrm{pr}}(\mathbf{z})$
3.  Compute the acceptance ratio:
$\gamma = \frac{\exp(-g(\mathbf{y}'_k)) \, \rho_{\mathrm{pr}}(\mathbf{y}'_k)}{\exp(-g(\mathbf{y}_k)) \, \rho_{\mathrm{pr}}(\mathbf{y}_k)}$
4.  Draw $t$ uniformly from $[0, 1]$, and set
$\mathbf{y}_{k+1} = \begin{cases} \mathbf{y}'_k & \text{if } t \le \gamma, \\ \mathbf{y}_k & \text{otherwise.} \end{cases}$

To recover full-space samples, for each $\mathbf{y}_k$ from the Markov chain:
$\mathbf{x}_{k,\ell} = W_1 \mathbf{y}_k + W_2 \mathbf{z}_{k,\ell}, \qquad \mathbf{z}_{k,\ell} \sim \rho_{\mathrm{pr}}(\mathbf{z})$
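A minimal sketch of this sampler in Python. The isotropic Gaussian random walk is one symmetric proposal choice among many; `f`, `W1`, `W2` are placeholders for the misfit and the eigenvector blocks:

```python
import numpy as np

# A sketch of the active subspace-exploiting Metropolis-Hastings above,
# for a standard Gaussian prior, so rho_pr(y) and rho_pr(z) are standard
# Gaussian densities in their respective dimensions.
def as_mcmc(f, W1, W2, y0, n_steps, prop_var=0.3, N=10,
            rng=np.random.default_rng()):
    n, m_minus_n = W1.shape[1], W2.shape[1]

    def g_hat(y):   # step 2: Monte Carlo estimate of the misfit g(y)
        zs = rng.standard_normal((N, m_minus_n))
        return np.mean([f(W1 @ y + W2 @ z) for z in zs])

    def log_dens(y, g):   # log of exp(-g(y)) * rho_pr(y), up to a constant
        return -g - 0.5 * (y @ y)

    chain, y, g = [y0], y0, g_hat(y0)
    for _ in range(n_steps):
        y_prop = y + np.sqrt(prop_var) * rng.standard_normal(n)   # step 1
        g_prop = g_hat(y_prop)
        # steps 3 and 4: accept with probability min(1, gamma)
        if np.log(rng.uniform()) <= log_dens(y_prop, g_prop) - log_dens(y, g):
            y, g = y_prop, g_prop
        chain.append(y)
    return np.array(chain)

# Full-space samples: x = W1 @ y_k + W2 @ z with z ~ rho_pr(z).
```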

Slide 47

[ Show them the MATLAB demo! ]

Slide 48

Invert for the subsurface with measurements on the surface.
•  2nd-order finite difference
•  100x100 mesh
•  Automatic differentiation

(Figure: the domain $D$ with boundary $\Gamma_1$ and "sensor" locations on the surface.)

Slide 49

Invert for the subsurface with measurements on the surface.

(Figures: eigenvalue estimates and subspace error for the inversion problem.)

Slide 50

MCMC mixes better in lower dimensions.

Standard Metropolis-Hastings (100d): 500k steps / 500k forward models, proposal variance 0.1, acceptance ~60%.
Active-subspace accelerated (2d): 50k steps / 500k forward models, proposal variance 0.3, acceptance ~62%.

(Figures: trace plots for the two chains, and autocorrelation functions for the vanilla chain and the active subspace chains, AS2, with proposal variances 0.1 and 0.3.)

Slide 51

Working in low dimensions introduces some bias.

(Figures: centered posterior mean and centered posterior variance vs. parameter index.)

Slide 52

The 100d chain shows evidence of a 2d active subspace.

(Figure: pairwise scatter plots of the chain in the active coordinates $y_1, \dots, y_4$.)

Slide 53

INTEGRATION

$\int f(\mathbf{x}) \, \rho(\mathbf{x}) \, d\mathbf{x}$

Slide 54

Integrate in the active variables: a quadrature rule in the active variables, Monte Carlo in the inactive variables.

$\int f(\mathbf{x}) \, \rho \, d\mathbf{x} = \int \underbrace{\int f(W_1 \mathbf{y} + W_2 \mathbf{z}) \, \rho(\mathbf{z} | \mathbf{y}) \, d\mathbf{z}}_{= \, g(\mathbf{y})} \, \rho(\mathbf{y}) \, d\mathbf{y} = \int g(\mathbf{y}) \, \rho(\mathbf{y}) \, d\mathbf{y} \approx \sum_{i=1}^{N} g(\mathbf{y}_i) \, w_i \approx \sum_{i=1}^{N} \hat{g}(\mathbf{y}_i) \, w_i$
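A minimal sketch of this two-level rule in Python, for standard Gaussian $\rho$ and $n = 1$ active variable. Gauss-Hermite quadrature in $\mathbf{y}$ is one choice of rule; `g_hat` is a callable estimator of the conditional average, such as the earlier sketch:

```python
import numpy as np

# A sketch of the two-level rule above: Gauss-Hermite quadrature in the
# active variable y, Monte Carlo (inside g_hat) in the inactive z.
def as_integrate(g_hat, n_nodes=10):
    # hermgauss targets the weight exp(-t^2); rescale nodes and weights
    # so the rule integrates against the standard normal density rho(y).
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    y_nodes = np.sqrt(2.0) * t
    weights = w / np.sqrt(np.pi)
    return float(sum(wi * g_hat(yi) for wi, yi in zip(weights, y_nodes)))
```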

Slide 55

An active subspace-exploiting quadrature rule

Slide 56

There are more questions than answers.
•  Exploiting symmetries?
•  Nested node sets?
•  Asymptotics?
•  Error analysis?

Slide 57

QUESTIONS?
•  How do active subspaces relate to [insert method]?
•  What if I don't have gradients?
•  What kinds of models does this work on?
•  Can you show some applications?
•  What about multiple quantities of interest?
•  Tell me about hypercubes.
•  How new is all this?

PAUL CONSTANTINE
Ben L. Fryrear Assistant Professor
Colorado School of Mines
activesubspaces.org
@DrPaulynomial