
Less is truly more

My keynote at the NAFEMS European Conference on Optimization.

Pranay Seshadri

October 16, 2019
Transcript

  1. Less is truly more: dimension reduction in computational simulations. NAFEMS European Conference on Optimization, October 2019. Pranay Seshadri, University of Cambridge | The Alan Turing Institute. www.psesh.com
  2. Group members: Nicholas Wong, PhD Research Student, Department of Engineering, University of Cambridge (joint w/ Geoff Parks); James Gross, PhD Research Student, Department of Engineering, University of Cambridge (joint w/ Geoff Parks); Ashley Scillitoe, Research Associate, The Alan Turing Institute; Sławomir Tadeja, PhD Research Student, Department of Engineering, University of Cambridge (joint w/ Per Ola Kristensson).
  3. Group projects: physics-informed machine learning with sensor data.
     • Accurate performance metrics (such as efficiency) require accurate measurements.
     • Tremendous body of work dedicated to characterizing uncertainty in simulations.
     • Challenging to deploy existing paradigms to experiments: measurements are noisy, sparse and possibly erroneous.
  4. Group projects: physics-informed machine learning with sensor data.
     • Can construct spatial approximations at axial planes using Gaussian process regression.
     • Physics-based kernels are incorporated to encode spatial harmonics, breaking the Nyquist condition.
     • Robust framework designed for difficult cases: measurements are noisy, sparse and possibly erroneous.
  5. Group projects: physics-informed machine learning with simulation data.
     • Computational fluid dynamics simulations (via RANS) require a turbulence model: $\rho \bar{u}_j \frac{\partial \bar{u}_i}{\partial x_j} = \rho \bar{f}_i + \frac{\partial}{\partial x_j}\left( -\bar{p}\,\delta_{ij} + 2\mu \bar{S}_{ij} - \rho \overline{u'_i u'_j} \right)$, where $-\rho \overline{u'_i u'_j}$ is the modeled term.
     • Developing supervised machine learning paradigms to learn from high-fidelity training data to augment turbulence models. [Diagram: high-fidelity training data → turbulence model.]
     • Challenge: understanding the physical cause of a machine-learned correction rather than injecting it directly into the RANS.
  6. Group projects: Effective Quadratures.
     • A free, open-source library for uncertainty quantification, machine learning, optimisation, numerical integration and dimension reduction.
     • Algorithms and routines build upon the extensive collective experience of team members and advisors, with backgrounds ranging from applied statistics and numerical analysis to flow physics engineering.
     • See: www.effective-quadratures.org
  7. In 2D:
     Easy to navigate, easy to debate.
     Easy to communicate and create.
     Your scatter, surface or contours suggest
     Your data and inference are easy to digest.
  8. Sufficient summary plots. Generate 3 × 300 uniformly distributed random numbers between [0, 1]. For each set of numbers, evaluate $y = \log(x_1 + x_2 + x_3)$.
  9. Sufficient summary plots. Generate 3 × 300 uniformly distributed random numbers between [0, 1]. For each set of numbers, evaluate $y = \log(x_1 + x_2 + x_3)$. [Table: sampled values of $x_1$, $x_2$, $x_3$ and the corresponding $y$.]
  10. Sufficient summary plots. Generate 3 × 300 uniformly distributed random numbers between [0, 1]. For each set of numbers, evaluate $y = \log(x_1 + x_2 + x_3)$. Plot $\mathbf{k}^T \mathbf{x}_i$ vs. $y_i$ for $i = 1, \ldots, 300$, with $\mathbf{k}^T = [\,1 \;\; 0 \;\; 0\,]$. [Scatter plot.]
  11. Sufficient summary plots. Generate 3 × 300 uniformly distributed random numbers between [0, 1]. For each set of numbers, evaluate $y = \log(x_1 + x_2 + x_3)$. Plot $\mathbf{k}^T \mathbf{x}_i$ vs. $y_i$ for $i = 1, \ldots, 300$, with $\mathbf{k}^T = [\,0.2 \;\; 0.5 \;\; 0.2\,]$. [Scatter plot.]
  12. Sufficient summary plots. Generate 3 × 300 uniformly distributed random numbers between [0, 1]. For each set of numbers, evaluate $y = \log(x_1 + x_2 + x_3)$. Plot $\mathbf{k}^T \mathbf{x}_i$ vs. $y_i$ for $i = 1, \ldots, 300$, with $\mathbf{k}^T = [\,1/\sqrt{3} \;\; 1/\sqrt{3} \;\; 1/\sqrt{3}\,]$. [Scatter plot: the points collapse onto a single curve, a sufficient summary plot!] In this case, we refer to the span of $\mathbf{k}$ as our active subspace.
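As a quick illustration, here is a minimal numpy/matplotlib sketch of this experiment (the variable names are my own):

```python
import numpy as np
import matplotlib.pyplot as plt

# 300 samples of 3 uniformly distributed variables on [0, 1].
X = np.random.uniform(0.0, 1.0, size=(300, 3))
y = np.log(X.sum(axis=1))

# Project onto the active direction and plot y against k^T x.
k = np.ones(3) / np.sqrt(3.0)  # k = [1/sqrt(3), 1/sqrt(3), 1/sqrt(3)]
plt.scatter(X @ k, y, s=8)
plt.xlabel(r'$\mathbf{k}^T \mathbf{x}$')
plt.ylabel(r'$y$')
plt.title('Sufficient summary plot')
plt.show()
```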
  13. The trouble is that high-dimensional problems are inevitable and unavoidable. Yet certain high-dimensional problems are inherently low-dimensional (physics-driven).
  14. Talk overview: a potpourri of theory, methods and applications. 1. Finding active subspaces. 2. Exploiting active subspaces. 3. Ridge approximations. 4. Going beyond ridge approximations.
  15. Motivating example: aerothermal design of a 3D parameterized blade. [Figures: CAD model*, tip mesh, hub mesh, parametric mesh.] *A rig research blade is used in this study.
  16. Motivating example: design space. Samples from the design space, $\mathbf{x} \in \mathbb{R}^{25}$, drawn around the nominal blade. [Figure: tip mesh with (+) and (−) perturbations.] NOTE: Most blades are parameterized with 200+ variables; here we consider only a 25D subset.
  17. Generate a data set. Draw samples $\mathbf{x}_j \sim \rho$ and push each through the pipeline (geometry → mesh → PDE solver) to compute [multiple] quantities of interest (qois), $f_j = f(\mathbf{x}_j)$, such as: efficiency, pressure ratio, flow capacity. We create a data set of N = 500 samples.
  18. Goal: APPROXIMATE, $\tilde{f}(\mathbf{x}) \approx f(\mathbf{x})$; INTEGRATE, $\int f(\mathbf{x})\, d\mathbf{x}$; OPTIMIZE, $\underset{\mathbf{x}}{\text{minimize}}\; f(\mathbf{x})$; INVERT, given $y_j$, find $\mathbf{x}_j$ such that $y_j \approx f(\mathbf{x}_j)$. Slide shamelessly borrowed from Paul Constantine.
  19. Enter the curse of dimensionality, $\mathbf{x} \in \mathbb{R}^{25}$.

      Number of points per dimension | # of evals.  | Time (assuming 1 minute per simulation)
      1                              | 25           | 25 minutes
      2                              | ~33 million  | ~62 years
  20. A simple recipe. Fit the data to a global least-squares quadratic: $f(\mathbf{x}_j) \approx \frac{1}{2}\mathbf{x}_j^T \mathbf{A} \mathbf{x}_j + \mathbf{c}^T \mathbf{x}_j + d$. Compute its gradient, $\nabla_{\mathbf{x}_j} f = \mathbf{A}\mathbf{x}_j + \mathbf{c}$, and assemble the covariance matrix: $\mathbf{C} = \frac{1}{N}\sum_{j=1}^{N} \nabla_{\mathbf{x}_j} f \, \nabla_{\mathbf{x}_j} f^T$. Compute the eigendecomposition of this covariance matrix and partition based on eigenvalues: $\mathbf{C} = \begin{bmatrix} \mathbf{M} & \mathbf{N} \end{bmatrix} \begin{bmatrix} \Lambda_1 & \\ & \Lambda_2 \end{bmatrix} \begin{bmatrix} \mathbf{M} & \mathbf{N} \end{bmatrix}^T$. Use $\mathbf{M}$ for generating sufficient summary plots.
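A minimal numpy sketch of this recipe, assuming `X` is an N × d array of samples and `f` holds the corresponding qoi values (the function and variable names are my own):

```python
import numpy as np

def active_subspace_quadratic(X, f, n_active=2):
    """Estimate an active subspace from a global least-squares quadratic fit."""
    N, d = X.shape
    iu = np.triu_indices(d)
    # Design matrix: constant, linear and (upper-triangular) quadratic terms.
    Phi = np.hstack([np.ones((N, 1)), X, X[:, iu[0]] * X[:, iu[1]]])
    coeffs, *_ = np.linalg.lstsq(Phi, f, rcond=None)

    c = coeffs[1:d + 1]
    A = np.zeros((d, d))
    A[iu] = coeffs[d + 1:]
    A = 0.5 * (A + A.T)        # symmetric form: f ~ x^T A x + c^T x + d0
    grads = 2.0 * (X @ A) + c  # gradient of the fit at every sample

    C = grads.T @ grads / N    # covariance matrix of the gradients
    eigvals, W = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], W[:, order[:n_active]]  # eigenvalues, M
```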
  21. Eigendecomposition, with efficiency as $f$. [Plot: eigenvalues of $\mathbf{C}$ against the 25 design variables, log scale from $10^{-3}$ to $10^{0}$, partitioned as $\Lambda = \mathrm{diag}(\Lambda_1, \Lambda_2)$.] We retain $\mathbf{M} \in \mathbb{R}^{25 \times 2}$.
  22. Sufficient summary plot, with efficiency as $f$. Visualizing the 2D contours, we observe a dominant direction for efficiency. Also observe how efficiency increases quadratically along this vector and the pressure ratio decreases linearly. [Contour plot; 0.5% between successive contours.]
  23. Sufficient summary plot, with flow capacity as $f$. [Plots: flow capacity CFD values against the flow capacity active variable $\mathbf{M}^T \mathbf{x}$; components of $\mathbf{M}$.]
  24. Talk overview: a potpourri of theory, methods and applications. 1. Finding active subspaces. 2. Exploiting active subspaces. 3. Ridge approximations. 4. Going beyond ridge approximations.
  25. Zonotopes: visualizing the entire design space. $d = 3$. The projection of a $d$-dimensional hypercube onto a 2D plane results in a domain called a zonotope.
  26. Zonotopes: visualizing the entire design space. $d = 5$. The projection of a $d$-dimensional hypercube onto a 2D plane results in a domain called a zonotope.
  27. Zonotopes: visualizing the entire design space. $d = 25$. The projection of a $d$-dimensional hypercube onto a 2D plane results in a domain called a zonotope.
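For intuition, a minimal numpy sketch (my own construction, not from the talk) that traces the zonotope boundary: the zonotope is the Minkowski sum of the segments $\pm\mathbf{m}_i$, so its vertices follow from sorting the 2D generators by angle:

```python
import numpy as np

def zonotope_vertices(M):
    """Boundary vertices of {M^T x : x in [-1, 1]^d}, for a d x 2 matrix M."""
    G = M.astype(float).copy()  # one 2D generator per hypercube dimension
    G[G[:, 1] < 0] *= -1.0      # flipping a generator leaves its segment unchanged
    order = np.argsort(np.arctan2(G[:, 1], G[:, 0]))
    lowest = -G.sum(axis=0)     # vertex where every generator contributes -1
    # Walk counterclockwise: flipping each generator from -1 to +1 adds 2 g_i.
    half = np.vstack([lowest, lowest + 2.0 * np.cumsum(G[order], axis=0)])
    return np.vstack([half, -half])  # central symmetry gives the other half

vertices = zonotope_vertices(np.random.randn(25, 2))  # d = 25 example
```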
  28. Naturally, we want to fill in the blanks. Inverse maps: going from the low-dimensional space back to the high-dimensional one.
  29. Inverse maps. Forward map (1:1): from the full space (25 dimensions) to the efficiency active subspace (2 dimensions), $\mathbf{y} = \mathbf{M}^T \mathbf{x}$. Inverse map (1:many): given $\mathbf{y} = \mathbf{M}^T \mathbf{x}$, minimize $\mathbf{0}^T \mathbf{x}$ subject to $-\mathbf{1} \leq \mathbf{x} \leq \mathbf{1}$, $\mathbf{x} \in \mathbb{R}^{25}$.
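A minimal sketch of this inverse map with scipy's linear programming routine (the function name is my own; the slides do not prescribe a solver):

```python
import numpy as np
from scipy.optimize import linprog

def inverse_map(M, y):
    """Find one x in [-1, 1]^d with M^T x = y (a 1:many inverse; this picks one)."""
    d = M.shape[0]
    res = linprog(c=np.zeros(d),             # minimize 0^T x: pure feasibility
                  A_eq=M.T, b_eq=y,          # pin the active coordinates
                  bounds=[(-1.0, 1.0)] * d)  # stay inside the hypercube
    if not res.success:
        raise ValueError('y lies outside the zonotope')
    return res.x
```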
  30. Inverse maps. [Schematic: efficiency active subspace (2 dimensions) ↔ full space (25 dimensions) via inverse and forward maps, mapped down to flow capacity (1 dimension): flow capacity CFD values against the flow capacity active variable $\mathbf{M}^T \mathbf{x}$.]
  31. Inverse maps, for flow capacity. [Schematic as on the previous slide.] This tells us that if we seek to increase the cruise efficiency, there will be a loss in maximum flow capacity at 105% shaft speed. *0.8% MRTP change between successive flow capacity contours.
  32. Inverse maps, for pressure ratio. [Plots: pressure ratio and its quadratic approximation; *PR difference of 0.002 between successive contour lines.] The surface may be well approximated by a quadratic response (little noise). Most of the variation is along the first eigenvector.
  33. Inverse maps, for pressure ratio. [Schematic: efficiency active subspace (2 dimensions) ↔ full space (25 dimensions) via inverse and forward maps, mapped down to pressure ratio (2 dimensions).]
  34. Inverse maps, for pressure ratio. [Schematic as above.] This tells us that if we seek to increase the cruise efficiency, there will be a loss in pressure ratio.
  35. Putting the pieces together: the entire design space in 2D. [Zonotope plot: datum design and design cruise PR marked; contours of decreasing cruise PR (+0.005 to −0.025) and decreasing max. climb capacity.]
  36. Talk overview: a potpourri of theory, methods and applications. 1. Finding active subspaces. 2. Exploiting active subspaces. 3. Ridge approximations. 4. Going beyond ridge approximations.
  37. Ridge approximations. $f(\mathbf{x}) \approx g(\mathbf{M}^T \mathbf{x})$, where $\mathbf{M}^T : \mathbb{R}^d \to \mathbb{R}^n$ and $g : \mathbb{R}^n \to \mathbb{R}$. Key questions: How do I compute $\mathbf{M}$? What is $g$? What is the approximation error?
  38. Ridge approximations. $f(\mathbf{x}) \approx g(\mathbf{M}^T \mathbf{x})$, where $\mathbf{M}^T : \mathbb{R}^d \to \mathbb{R}^n$ and $g : \mathbb{R}^n \to \mathbb{R}$. Literature: Friedman and Stuetzle (1981) Projection pursuit regression. Li (1991) Sliced inverse regression. Cook and Weisberg (1991) Sliced average variance estimation. Li (1992) Principal Hessian directions. Xia et al. (2002) Minimum average variance estimation. Li & Wang (2002) Directional regression. Fornasier et al. (2012) Sparse ridge approximations. Tyagi & Cevher (2015) Low-rank strategies for ridge approximations. Constantine (2015) Active subspaces. Pinkus (2015) Generalized ridge functions. Liu & Guillas (2015) Gradient-based kernel dimension reduction. Hokanson & Constantine (2018) Polynomial variable projection.
  39. Active subspaces. Constantine (2015), Active subspaces. Given the function, its gradient vector and a weight function, $f = f(\mathbf{x})$, $\mathbf{x} \in \mathbb{R}^d$, $\nabla f(\mathbf{x})$, $\rho : \mathbb{R}^d \to \mathbb{R}_+$, form the average outer product of the gradient and its eigendecomposition: $\mathbf{C} = \int \nabla_{\mathbf{x}} f(\mathbf{x}) \, \nabla_{\mathbf{x}} f(\mathbf{x})^T \rho(\mathbf{x}) \, d\mathbf{x} = \mathbf{W} \Lambda \mathbf{W}^T$. The eigenvalues measure ridge structure along the eigenvectors: $\lambda_i = \int \left( \mathbf{w}_i^T \nabla f(\mathbf{x}) \right)^2 \rho(\mathbf{x}) \, d\mathbf{x}$, for $i = 1, \ldots, d$; each eigenvalue is the averaged, squared, directional derivative along its eigenvector.
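With gradients in hand, $\mathbf{C}$ is typically estimated by Monte Carlo. A minimal numpy sketch on the toy function from the opening example (my own construction):

```python
import numpy as np

# Toy function from the opening example, and its analytic gradient.
f      = lambda x: np.log(x[0] + x[1] + x[2])
grad_f = lambda x: np.ones(3) / (x[0] + x[1] + x[2])

# Monte Carlo estimate of C = E[grad f grad f^T] under rho = U[0, 1]^3.
X = np.random.uniform(0.0, 1.0, size=(5000, 3))
G = np.array([grad_f(x) for x in X])
C = G.T @ G / len(X)

eigvals, W = np.linalg.eigh(C)  # C = W Lambda W^T
print(eigvals[::-1])            # one dominant eigenvalue: a 1D ridge
print(W[:, -1])                 # ~ [1, 1, 1]/sqrt(3) (up to sign)
```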
  40. Active subspaces. The eigenvalues measure the approximation error: $\left\| f(\mathbf{x}) - \mathbb{E}\left[ f \mid \mathbf{W}_1^T \mathbf{x} \right] \right\|_{L^2(\rho)} \leq c \left( \lambda_{n+1} + \ldots + \lambda_d \right)^{1/2}$, where $\mathbb{E}[f \mid \mathbf{W}_1^T \mathbf{x}]$ is the conditional expectation over the active subspace $\mathbf{W}_1$ and $c$ is a Poincaré constant (Constantine, 2015). In practice, computing the active subspace requires gradient evaluations, which are not always available.
  41. Ridge approximations. $f(\mathbf{x}) \approx g(\mathbf{M}^T \mathbf{x})$, where $\mathbf{M}^T : \mathbb{R}^d \to \mathbb{R}^n$ and $g : \mathbb{R}^n \to \mathbb{R}$. Define the residual function: $r(\mathbf{M}) = \frac{1}{2} \int \left( f(\mathbf{x}) - \mathbb{E}\left[ f \mid \mathbf{M}^T \mathbf{x} \right] \right)^2 \rho(\mathbf{x}) \, d\mathbf{x}$. Minimize the residual for the best approximation*: minimize $r(\mathbf{M})$ subject to $\mathbf{M} \in \mathrm{St}(m, d)$, the Stiefel manifold of $m$-dimensional subspaces.
  42. Gaussian ridge approximations. Seshadri et al. (2018), Dimension reduction via Gaussian ridge functions. Recipe: 1. Draw $N$ training and $M$ testing samples: $\hat{\mathbf{x}}_j \sim \rho$, $\tilde{\mathbf{x}}_k \sim \rho$. 2. For the samples, evaluate the qois: $\hat{f}_j = f(\hat{\mathbf{x}}_j)$, $\tilde{f}_k = f(\tilde{\mathbf{x}}_k)$. 3. Model $f \sim \mathcal{GP}\left( \mu_{\theta}(\mathbf{M}^T \mathbf{x}), \, k_{\theta}(\mathbf{M}^T \mathbf{x}, \mathbf{M}^T \mathbf{x}') \right)$ and alternate between: i) minimizing the negative log marginal likelihood over the hyperparameters $\theta$, using the training data; ii) minimizing $\sum_{k=1}^{M} \left( \tilde{f}_k - \mu_{\theta}(\mathbf{M}^T \tilde{\mathbf{x}}_k) \right)^2$ over $\mathbf{M}$, using the testing data.
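A crude, minimal sketch of this alternation, assuming scikit-learn for the GP and a QR factorization as a cheap stand-in for a proper Stiefel-manifold update (the paper's optimizer is more sophisticated; all names here are my own). The alternation is implicit: every trial $\mathbf{M}$ refits the hyperparameters.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gaussian_ridge(X_train, f_train, X_test, f_test, n=2):
    """Alternate GP hyperparameter training (i) with subspace refinement (ii)."""
    d = X_train.shape[1]
    M0 = np.linalg.qr(np.random.randn(d, n))[0]  # random start on St(n, d)

    def test_residual(m_flat):
        # QR re-orthonormalization keeps the iterate on the manifold.
        Mc = np.linalg.qr(m_flat.reshape(d, n))[0]
        gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
        gp.fit(X_train @ Mc, f_train)      # step (i): hyperparameters via MLE
        mu = gp.predict(X_test @ Mc)       # posterior mean on the test set
        return np.sum((f_test - mu) ** 2)  # step (ii) objective

    res = minimize(test_residual, M0.ravel(), method='Nelder-Mead',
                   options={'maxiter': 500})
    return np.linalg.qr(res.x.reshape(d, n))[0]
```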
  43. Gaussian ridge approximations, $\mathbf{x} \in \mathbb{R}^{25}$. Seshadri et al. (2018), Dimension reduction via Gaussian ridge functions. Set 150 training and 150 testing samples and use a squared exponential kernel. Project the data onto the active subspace and set $u_1 = \mathbf{m}_1^T \mathbf{x}$ and $u_2 = \mathbf{m}_2^T \mathbf{x}$, the first and second column vectors of $\mathbf{M}^*$. [Contour plot of the fitted ridge over $(u_1, u_2)$, low to high.]
  44. Gaussian ridge approximations. Can generate additional samples by constraining the active coordinates: $\mathbf{x} = \mathbf{M}\mathbf{M}^T \mathbf{x} + \mathbf{N}\mathbf{N}^T \mathbf{x}$ (important variables plus unimportant variables), where $\mathbf{N} = \mathrm{null}(\mathbf{M}^T)$. Fixing the important variables and sampling the unimportant ones leads to a better estimate for the posterior variance.
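A minimal numpy/scipy sketch of this splitting, fixing the active coordinates and rejection-sampling the inactive ones (the subspace here is a random stand-in; the names are my own):

```python
import numpy as np
from scipy.linalg import null_space

d, n = 25, 2
M = np.linalg.qr(np.random.randn(d, n))[0]  # stand-in for a computed subspace
N = null_space(M.T)                         # d x (d - n): the inactive directions

# Fix the active coordinates u; every accepted x satisfies M^T x = u.
u = np.array([0.3, -0.1])                   # a chosen point on the summary plot
samples = []
while len(samples) < 100:                   # rejection sampling; can be slow in
    w = np.random.uniform(-1.0, 1.0, size=d - n)  # high dimensions
    x = M @ u + N @ w                       # x = M M^T x + N N^T x by construction
    if np.all(np.abs(x) <= 1.0):            # keep only points in the hypercube
        samples.append(x)
```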
  45. Gaussian ridge approximations. Can generate additional samples by constraining the active coordinates: $\mathbf{x} = \mathbf{M}\mathbf{M}^T \mathbf{x} + \mathbf{N}\mathbf{N}^T \mathbf{x}$, where $\mathbf{N} = \mathrm{null}(\mathbf{M}^T)$, leading to a better estimate for the posterior variance. Interestingly, this is also a recipe for investigating sensitivity to manufacturing tolerances. [Plot: PDF of normalised efficiency over (−) to (+) perturbations.]
  46. Talk overview: a potpourri of theory, methods and applications. 1. Finding active subspaces. 2. Exploiting active subspaces. 3. Ridge approximations. 4. Going beyond ridge approximations.
  47. Many ridges. After all, what's better than one ridge approximation? Whether a blade or an airfoil, in the narrative thus far we do not really leverage the rich spatial information that CFD affords us. Our interest lies solely with certain output quantities of interest. But are we missing something?
  48. Many ridges. After all, what's better than one ridge approximation? Consider a vector-valued ridge function: $\mathbf{f} \approx \begin{bmatrix} g_1(\mathbf{M}_1^T \mathbf{x}) \\ \vdots \\ g_m(\mathbf{M}_m^T \mathbf{x}) \end{bmatrix}$, where $\mathbf{M}_i^T : \mathbb{R}^d \to \mathbb{R}^n$ and $g_i : \mathbb{R}^n \to \mathbb{R}$ for $i = 1, \ldots, m$. Here each element could represent the isentropic Mach number distribution, or even each node within a CFD computation.
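Sketching the idea in numpy, reusing the `active_subspace_quadratic` helper from the earlier recipe and a simple polynomial for each profile $g_i$ (`P`, an N × m array of nodal values, is hypothetical):

```python
import numpy as np

# X: (N, d) design-of-experiments inputs; P: (N, m) field values, one column
# per node. Fit an independent ridge subspace and profile for every node.
def embedded_ridges(X, P, n_active=1):
    subspaces, profiles = [], []
    for i in range(P.shape[1]):
        _, M_i = active_subspace_quadratic(X, P[:, i], n_active)  # from earlier
        u = (X @ M_i).ravel()                       # active coordinate per sample
        profiles.append(np.polyfit(u, P[:, i], 2))  # a quadratic profile g_i
        subspaces.append(M_i)
    return subspaces, profiles

# Predicting node i of the field for a new geometry x_new:
# p_i = np.polyval(profiles[i], (x_new @ subspaces[i]).ravel())
```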
  49. Many ridges. After all, what's better than one ridge approximation? We run a design of experiments with N = 400 data points, each a different 2D airfoil profile at fixed boundary conditions, $\mathbf{x} \in \mathbb{R}^{50}$. Each experiment is an evaluation of a RANS solver. [Figure: airfoil profiles $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$.]
  50. Embedded ridge approximations: fitting a ridge approximation at each node for static pressure. [Plots: local ridge approximation at node #1, $p_1$ against $\mathbf{M}_1^T \mathbf{x}$; local ridge approximation at node #183, $p_{183}$ against $\mathbf{M}_{183}^T \mathbf{x}$.]
  51. Embedded ridge approximations: fitting a ridge approximation at each node for static pressure. Can build an entire flow-field by using the individual ridge subspaces $\{\mathbf{M}_1, \mathbf{M}_2, \ldots, \mathbf{M}_m\}$ and their profiles $\{g_1, g_2, \ldots, g_m\}$ over all the nodes. Can generate a pressure flow-field for a new geometry. [Plots: embedded ridge approximation static pressure against truth static pressure.]
  52. Embedded ridge approximations: fitting a ridge approximation at each node for static pressure. Can compress storage of the flow-field by interpolating ridge approximations between neighboring nodes. Accurate recovery is achieved by storing only 60% of the total number of nodes. [Figure: stored nodes and reconstructed nodes.]
  53. Conclusions and outlook. 1. Dimension reduction affords powerful inference and informs us which parameters are relevant. 2. While many ideas and algorithms exist in the literature, there is a need for more, e.g., for discrete parameters, strings, and multiple subspaces. 3. Working closely with industry, academics need to "lift" these methods to real design problems. Developed workflows must be theoretically sound. 4. Interpretation remains a challenge! Subspaces need not conform to well-understood engineering parameterizations.
  54. Papers: a tale of two interpretations on dimension reduction. 1. Scalar-valued dimension reduction: Seshadri, Yuchi, Parks (2020), "Dimension reduction via Gaussian ridge functions", [accepted] SIAM/ASA J. on Uncertainty Quantification (special section on approaches to reduce the dimension of model parameters). 2. Vector-valued dimension reduction: Wong, Seshadri, Parks, Girolami (2019), "Embedded ridge approximations: Constructing ridge approximations over scalar fields for improved simulation-centric dimension reduction", [under review] SIAM/ASA J. on Uncertainty Quantification.
  55. Thank you! Pranay Seshadri, University of Cambridge | The Alan Turing Institute, www.psesh.com. Research funded by the Lloyd's Register Foundation Programme for Data-Centric Engineering, the UKRI Strategic Priorities Fund under EPSRC Grant EP/T001569/1, and Rolls-Royce plc.