
Data-Driven Dimension Reduction

An introduction to PCA, active subspaces, and ridge approximations

equadratures

July 10, 2023

Transcript

1. DATA-DRIVEN DIMENSION REDUCTION ACKNOWLEDGEMENTS FOLKS WHO HAVE CONTRIBUTED Much of the code was written by Henry Yuchi and Chun Yui Wong, with further edits made by Ashley Scillitoe. We are grateful to Paul Constantine, for without him this deck wouldn't exist.
2. DATA-DRIVEN DIMENSION REDUCTION INTRODUCTION A FEW WORDS The trouble is that high-dimensional problems are unavoidable. Thankfully, equadratures has a few utilities that can help. But before delving into them, it is worth discussing ideas like principal components analysis (PCA) and, more generally, the singular value decomposition.
3. DATA-DRIVEN DIMENSION REDUCTION OVERVIEW 1. PRINCIPAL COMPONENTS 2. A SIMPLE EXPERIMENT 3. PROJECTIONS & ZONOTOPES 4. ACTIVE SUBSPACES 5. RIDGE APPROXIMATIONS
4. DATA-DRIVEN DIMENSION REDUCTION PRINCIPAL COMPONENTS A FEW WORDS Consider a scenario where we have a matrix of measurements (say CMM data) and we wish to determine which modes of manufacturing variability are the greatest. We create a covariance matrix using the observed data and then find its eigendecomposition. [Figure labels: observations, dimensions, eigenvectors, eigenvalues.]
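To make this recipe concrete, here is a minimal NumPy sketch (the data matrix is a random placeholder and the names are illustrative): centre the observations, form the sample covariance across dimensions, and eigendecompose it to obtain the dominant modes of variability.

import numpy as np

# Placeholder data matrix: 300 observations (rows) of 5 dimensions (columns).
X = np.random.rand(300, 5)

Xc = X - X.mean(axis=0)                        # centre each dimension
C = (Xc.T @ Xc) / (Xc.shape[0] - 1)            # sample covariance matrix (5 x 5)
eigenvalues, eigenvectors = np.linalg.eigh(C)  # symmetric eigendecomposition (ascending)

# Sort the modes by the variance they explain, largest first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]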
5. DATA-DRIVEN DIMENSION REDUCTION PRINCIPAL COMPONENTS SUMMARISING THOUGHTS Rather than apply principal components to a single image, we can apply it to numerous images to find dominant linear directions across all the images. Its utility is not restricted to images and videos; it extends to any matrix or vector database. More robust and nonlinear variants also exist.* However, we can't really use this if we are interested in input-output pairs. For instance, computer experiments are usually carried out via a uniform design of experiment; naïvely applying PCA to the inputs does not facilitate output-driven dimension reduction. *Candès, Li, Ma, Wright (2011). Robust principal component analysis? J. ACM. *Schölkopf, Smola, Müller (1997). Kernel principal component analysis. Springer.
6. DATA-DRIVEN DIMENSION REDUCTION OVERVIEW 1. PRINCIPAL COMPONENTS 2. A SIMPLE EXPERIMENT 3. PROJECTIONS & ZONOTOPES 4. ACTIVE SUBSPACES 5. RIDGE APPROXIMATIONS
7. DATA-DRIVEN DIMENSION REDUCTION OUTPUT-BASED DIMENSION REDUCTION A SIMPLE EXPERIMENT Generate 3 x 300 uniformly distributed random numbers in [0, 1]. For each set of numbers, evaluate the chosen test function. [Table of sample values omitted.]
8. DATA-DRIVEN DIMENSION REDUCTION OUTPUT-BASED DIMENSION REDUCTION A SIMPLE EXPERIMENT Generate 3 x 300 uniformly distributed random numbers in [0, 1]. For each set of numbers, evaluate the chosen test function. Plot the output against a candidate linear combination of the inputs.
9. DATA-DRIVEN DIMENSION REDUCTION OUTPUT-BASED DIMENSION REDUCTION A SIMPLE EXPERIMENT Generate 3 x 300 uniformly distributed random numbers in [0, 1]. For each set of numbers, evaluate the chosen test function. Plot the output against a candidate linear combination of the inputs.
10. DATA-DRIVEN DIMENSION REDUCTION OUTPUT-BASED DIMENSION REDUCTION A SIMPLE EXPERIMENT Generate 3 x 300 uniformly distributed random numbers in [0, 1]. For each set of numbers, evaluate the chosen test function. Plot the output against a candidate linear combination of the inputs.
11. DATA-DRIVEN DIMENSION REDUCTION OUTPUT-BASED DIMENSION REDUCTION A SIMPLE EXPERIMENT Generate 3 x 300 uniformly distributed random numbers in [0, 1]. For each set of numbers, evaluate the chosen test function. Plot the output against a candidate linear combination of the inputs. We refer to this direction as our dimension-reducing subspace, and the resulting plot is a sufficient summary plot!
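The test function and the plotted quantities are not reproduced in this transcript, so the sketch below uses a hypothetical ridge function f(x) = sin(pi * u^T x) purely for illustration; the procedure (uniform samples, function evaluations, sufficient summary plot against u^T x) follows the slides.

import numpy as np
import matplotlib.pyplot as plt

# 3 x 300 uniformly distributed random numbers in [0, 1], stored as 300 samples in 3 dimensions.
X = np.random.rand(300, 3)

# Hypothetical dimension-reducing direction and ridge function (illustration only).
u = np.array([0.7, -0.7, 0.1])
u /= np.linalg.norm(u)
y = np.sin(np.pi * (X @ u))                  # evaluate the function at every sample

# Sufficient summary plot: output against the projected coordinate u^T x.
plt.scatter(X @ u, y, s=10)
plt.xlabel('projected coordinate')
plt.ylabel('function output')
plt.show()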
12. DATA-DRIVEN DIMENSION REDUCTION OUTPUT-BASED DIMENSION REDUCTION A SIMPLE EXPERIMENT Once this subspace is available, it is relatively easy to construct a polynomial approximation over the reduced coordinate, as sketched below. [Figure: polynomial fit.]
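Continuing the hypothetical example above, a least-squares polynomial fit over the single reduced coordinate might look like this; equadratures' get_subspace_polynomial, shown later in this deck, plays a similar role.

import numpy as np

# Hypothetical reduced coordinate t = u^T x and outputs y (as in the sketch above).
X = np.random.rand(300, 3)
u = np.array([0.7, -0.7, 0.1])
u /= np.linalg.norm(u)
t = X @ u
y = np.sin(np.pi * t)

# Fit a low-order polynomial g(t) over the reduced coordinate and check the fit.
g = np.poly1d(np.polyfit(t, y, deg=5))
print('max absolute fit error:', np.max(np.abs(g(t) - y)))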
13. DATA-DRIVEN DIMENSION REDUCTION OUTPUT-BASED DIMENSION REDUCTION ANALOGOUS TO A SINGLE-LAYER NEURAL NETWORK [Diagram: inputs, hidden layer, output.] NOTE: In a classical neural network structure, we have no a priori notion of how many neurons we need per hidden layer, nor of the relationships between them.
14. DATA-DRIVEN DIMENSION REDUCTION OVERVIEW 1. PRINCIPAL COMPONENTS 2. A SIMPLE EXPERIMENT 3. PROJECTIONS & ZONOTOPES 4. ACTIVE SUBSPACES 5. RIDGE APPROXIMATIONS
15. DATA-DRIVEN DIMENSION REDUCTION PROJECTIONS & ZONOTOPES SUBSPACE-BASED [LINEAR] PROJECTIONS Consider the d-dimensional input observations to some computational model.
16. DATA-DRIVEN DIMENSION REDUCTION PROJECTIONS & ZONOTOPES SUBSPACE-BASED [LINEAR] PROJECTIONS Human: a 3D object. Shadow: a 2D projection of a 3D object. Be aware: understanding high-dimensional spaces is very important!
17. DATA-DRIVEN DIMENSION REDUCTION PROJECTIONS & ZONOTOPES SUBSPACE-BASED [LINEAR] PROJECTIONS Human: a 3D object. Shadow: a 2D projection of a 3D object. Consider the projection of a cube onto a plane: it can be either a square or a hexagon. Be aware: understanding high-dimensional spaces is very important!
18. DATA-DRIVEN DIMENSION REDUCTION PROJECTIONS & ZONOTOPES SUBSPACE-BASED [LINEAR] PROJECTIONS Generate random samples within a d = 3 dimensional cube and project using a random orthogonal projection. [Figure: “zonotope”, the projection of the hypercube onto a plane.]
19. DATA-DRIVEN DIMENSION REDUCTION PROJECTIONS & ZONOTOPES SUBSPACE-BASED [LINEAR] PROJECTIONS Generate random samples within a d = 10 dimensional cube and project using a random orthogonal projection. [Figure: “zonotope”, the projection of the hypercube onto a plane.]
20. DATA-DRIVEN DIMENSION REDUCTION PROJECTIONS & ZONOTOPES SUBSPACE-BASED [LINEAR] PROJECTIONS Generate random samples within a d = 50 dimensional cube and project using a random orthogonal projection. [Figure: “zonotope”, the projection of the hypercube onto a plane.]
21. DATA-DRIVEN DIMENSION REDUCTION PROJECTIONS & ZONOTOPES SUBSPACE-BASED [LINEAR] PROJECTIONS Generate random samples within a d = 300 dimensional cube and project using a random orthogonal projection. [Figure: “zonotope”, the projection of the hypercube onto a plane.]
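A sketch of how these pictures can be generated, assuming the random orthogonal projection is taken as the Q factor of a random Gaussian matrix (the dimension and sample count are illustrative):

import numpy as np
import matplotlib.pyplot as plt

d = 50                                         # cube dimension (3, 10, 50, 300 in the slides)
X = np.random.uniform(-1, 1, size=(5000, d))   # random samples within the d-dimensional cube

# Random d x 2 matrix with orthonormal columns, via a QR factorisation.
Q, _ = np.linalg.qr(np.random.randn(d, 2))

Y = X @ Q                                      # project every sample onto the plane
plt.scatter(Y[:, 0], Y[:, 1], s=2)
plt.title('Projection of a {}-dimensional hypercube onto a plane'.format(d))
plt.show()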
22. DATA-DRIVEN DIMENSION REDUCTION OVERVIEW 1. PRINCIPAL COMPONENTS 2. A SIMPLE EXPERIMENT 3. PROJECTIONS & ZONOTOPES 4. ACTIVE SUBSPACES 5. RIDGE APPROXIMATIONS
23. DATA-DRIVEN DIMENSION REDUCTION ACTIVE SUBSPACES Given a function, its gradient vector, and a weight function, form the average outer product of the gradient and take its eigendecomposition; the eigenvalues sit in a diagonal matrix.
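The equations on this slide do not survive the transcript; in standard active-subspace notation (a reconstruction, following Constantine) they read

C = \int \nabla f(\mathbf{x}) \, \nabla f(\mathbf{x})^{T} \rho(\mathbf{x}) \, d\mathbf{x} = W \Lambda W^{T},

where rho is the weight function, Lambda = diag(lambda_1, ..., lambda_d) with lambda_1 >= ... >= lambda_d >= 0, and the columns of W are the orthonormal eigenvectors.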
24. DATA-DRIVEN DIMENSION REDUCTION ACTIVE SUBSPACES Partition the eigenvectors and eigenvalues. The eigenvalues measure ridge structure along the corresponding eigenvectors: each eigenvalue is the averaged, squared directional derivative along its eigenvector. [Figure: eigenvalues on a log scale.]
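In the same reconstructed notation, the partition and the ridge-structure statement are

W = [\, W_1 \;\; W_2 \,], \qquad \Lambda = \mathrm{diag}(\Lambda_1, \Lambda_2), \qquad \lambda_i = \mathbf{w}_i^{T} C \, \mathbf{w}_i = \int \left( \nabla f(\mathbf{x})^{T} \mathbf{w}_i \right)^{2} \rho(\mathbf{x}) \, d\mathbf{x},

so lambda_i is exactly the averaged, squared directional derivative of f along the eigenvector w_i, and W_1 holds the first n (dominant) eigenvectors.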
25. DATA-DRIVEN DIMENSION REDUCTION ACTIVE SUBSPACES The non-dominant eigenvalues bound the approximation error of the conditional expectation over the active subspace, up to a Poincaré constant. In practice, computing the active subspace requires gradient evaluations (or approximations thereof), which is not always feasible.
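The error statement referred to here is, in the same reconstructed notation, Constantine's active-subspace bound

\mathbb{E}\left[ \left( f(\mathbf{x}) - g(W_1^{T}\mathbf{x}) \right)^{2} \right] \le C_{P} \left( \lambda_{n+1} + \dots + \lambda_{d} \right),

where g(W_1^T x) is the conditional expectation of f over the inactive subspace and C_P is a Poincaré constant.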
26. DATA-DRIVEN DIMENSION REDUCTION ACTIVE SUBSPACES ASIDE Note that the active subspace method is NOT principal component analysis (PCA)! [Figure: active subspaces vs. PCA.]
27. DATA-DRIVEN DIMENSION REDUCTION ACTIVE SUBSPACES This identification of the subspace is important! Consider the following split into an active subspace and an inactive subspace.
28. DATA-DRIVEN DIMENSION REDUCTION ACTIVE SUBSPACES CODE The method constructs a quadratic global polynomial model to estimate gradients.

from equadratures import *
space = Subspaces(method='active-subspace', sample_points=X, sample_outputs=y)
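The note about the quadratic global model can be unpacked with the following NumPy sketch; this is not the library's exact implementation, just the idea: fit y ≈ c + g^T x + x^T B x by least squares, evaluate the model's gradients at the samples, and eigendecompose their average outer product.

import numpy as np

def active_subspace_from_quadratic(X, y, n=1):
    # Fit a global quadratic model by least squares.
    N, d = X.shape
    iu = np.triu_indices(d)
    quad = np.array([np.outer(x, x)[iu] for x in X])   # x_i * x_j terms, i <= j
    A = np.hstack([np.ones((N, 1)), X, quad])          # [1, linear, quadratic] design matrix
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

    # Recover the linear part g and the symmetric quadratic part B.
    g = coeffs[1:d + 1]
    B = np.zeros((d, d))
    B[iu] = coeffs[d + 1:]
    B = 0.5 * (B + B.T)

    # Model gradients at each sample, then the estimated C matrix and its eigenpairs.
    grads = g + 2.0 * (X @ B)
    C = grads.T @ grads / N
    evals, evecs = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order][:, :n]   # eigenvalues and active directions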
29. DATA-DRIVEN DIMENSION REDUCTION OVERVIEW 1. PRINCIPAL COMPONENTS 2. A SIMPLE EXPERIMENT 3. PROJECTIONS & ZONOTOPES 4. ACTIVE SUBSPACES 5. RIDGE APPROXIMATIONS
30. DATA-DRIVEN DIMENSION REDUCTION RIDGE APPROXIMATION Consider the task of approximating the function with a polynomial expansion, i.e., solving a linear system for the expansion coefficients. This can be done via least squares. One can also consider solving it in other norms, leading to solutions via compressed sensing, or some combination thereof.
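The expansion and the system being solved are not legible in this transcript; the standard least-squares formulation for the coefficients (a reconstruction) reads

\underset{\boldsymbol{\alpha}}{\text{minimize}} \;\; \left\| \mathbf{A}\boldsymbol{\alpha} - \mathbf{f} \right\|_{2}^{2}, \qquad A_{ij} = \phi_{j}(\mathbf{x}_{i}), \quad f_{i} = f(\mathbf{x}_{i}),

where the phi_j are the polynomial basis terms; swapping in a 1-norm penalty on the coefficients yields the compressed-sensing route mentioned above.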
31. DATA-DRIVEN DIMENSION REDUCTION RIDGE APPROXIMATION Now, we consider the ridge approximation problem. Hokanson & Constantine (2018). SIAM Journal on Scientific Computing. Golub & Pereyra (2003). Inverse Problems.
32. DATA-DRIVEN DIMENSION REDUCTION RIDGE APPROXIMATION Now, we consider the approximation problem over both the polynomial coefficients and the subspace. The key insight is to re-write this as a separable least-squares problem: the coefficients are eliminated using the matrix pseudoinverse, which leaves only an optimisation problem over subspaces!
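Written out in the notation of the cited variable-projection papers (a reconstruction, since the slide's equations are not in the transcript), the re-write is

\underset{\mathbf{M},\,\boldsymbol{\alpha}}{\text{minimize}} \;\; \left\| \mathbf{f} - \mathbf{A}(\mathbf{M})\boldsymbol{\alpha} \right\|_{2}^{2}
\quad\Longrightarrow\quad
\boldsymbol{\alpha}^{\star}(\mathbf{M}) = \mathbf{A}(\mathbf{M})^{+}\mathbf{f}
\quad\Longrightarrow\quad
\underset{\mathbf{M}}{\text{minimize}} \;\; \left\| \left( \mathbf{I} - \mathbf{A}(\mathbf{M})\mathbf{A}(\mathbf{M})^{+} \right)\mathbf{f} \right\|_{2}^{2},

where A(M) holds the polynomial basis evaluated at the projected points M^T x_i and the superscript + denotes the pseudoinverse, so the remaining optimisation is over subspaces M only.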
33. DATA-DRIVEN DIMENSION REDUCTION RIDGE APPROXIMATION
active subspaces
•Require gradients or gradient estimates (e.g., adjoints).
•May require many model evaluations.
•The “reduced” dimension is easy to gauge based on eigenvalue decay.
ridge approximations
•Do not require gradients or gradient estimates (e.g., adjoints).
•Require a number of model evaluations that scales with the “reduced” dimension.
•The “reduced” dimension may need to be estimated.
34. DATA-DRIVEN DIMENSION REDUCTION RIDGE APPROXIMATIONS CODE

from equadratures import *
space = Subspaces(method='variable-projection', sample_points=X, sample_outputs=y)
M = space.get_subspace()
subspace_poly = space.get_subspace_polynomial()
sobol_indices = subspace_poly.get_sobol_indices()
moments = subspace_poly.get_mean_and_variance()
space.plot_2D_contour_zonotope()