Kernel mean embedding as a unifying theory for distributional data

Michiel Stock
July 16, 2022

Transcript

  1. KERNEL MEAN EMBEDDING AS A UNIFYING THEORY FOR DISTRIBUTIONAL DATA

    Michiel Stock, Jasper De Landsheere & Daan Van Hauwermeiren (KERMIT). Photo by Adrien Olichon on Unsplash.
  2. DISTRIBUTIONS IN PHARMACEUTICAL PROCESSES

    ConsiGma™ continuous powder-to-tablet line. The unit processes are often modelled using population balance modelling (PBM) by means of a balance equation:

    $\frac{d}{dt} \int_{\Omega_x(t)} dV_x \int_{\Omega_r(t)} dV_r \, f(x, r, t) = \int_{\Omega_x(t)} dV_x \int_{\Omega_r(t)} dV_r \, h(x, r, Y, t)$

    However, PBMs are complex and slow, making them hard to use as process analytical technology in Industry 4.0. There is a need for efficient (in data and time) and general data-driven methods for distributions.
  3. DISTRIBUTIONS ON A METRIC

    We work with (discrete) probability distributions where there is a metric (features, similarity, …) associated with the objects.

    [Figure: three examples (a histogram-like distribution, a sample from a distribution, and a bar-like distribution over K features) with an associated similarity or kernel matrix; the vertical axis shows probability/density.]
  4. LEARNING PROBLEMS FOR DISTRIBUTIONS ON METRICS

    - compare: quantify how similar distributions are
    - representation: generate numerical descriptors of distributions
    - predict: predict distributions from properties, or vice versa
    - inference: get the distribution from a description
  5. KERNEL TRICK

    By replacing a dot product with a kernel function, PCA, support vector machines, linear regression, etc. become nonlinear methods: a feature map sends the input space to a reproducing kernel Hilbert space, where the methods remain linear.
  6. EXAMPLE OF EXTENDING THE FEATURE SPACE

    $\phi : (x_1, x_2) \mapsto (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$

    $\langle \phi(x), \phi(x') \rangle_{\mathcal{F}} = x_1^2 x_1'^2 + 2 x_1 x_2 x_1' x_2' + x_2^2 x_2'^2 = (x_1 x_1' + x_2 x_2')^2$

    So the map is equivalent to the kernel $k(x, x') = \langle x, x' \rangle^2$. The kernel mean embedding builds on this idea: represent distributions by their first moment in the feature space.
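    To make the equivalence concrete, here is a minimal Python check (a sketch, assuming NumPy; the example vectors are arbitrary) that the explicit feature map and the squared dot product give the same inner product:

    ```python
    import numpy as np

    def phi(x):
        """Explicit feature map: (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)."""
        x1, x2 = x
        return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

    def k(x, xp):
        """Degree-2 polynomial kernel: the squared dot product <x, x'>^2."""
        return np.dot(x, xp) ** 2

    x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
    # Both routes give the same value: (1*3 + 2*(-1))^2 = 1
    assert np.isclose(np.dot(phi(x), phi(xp)), k(x, xp))
    ```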
  7. KERNEL MEAN EMBEDDING

    The kernel mean embedding maps a distribution $\mathbb{P}$ to its first moment in the RKHS, $\mu_{\mathbb{P}} = \mathbb{E}_{X \sim \mathbb{P}}[\phi(X)]$. For universal kernels, the mapping is injective and retains all information on the distribution. The error of the empirical estimate scales as $\|\mu_{\mathbb{P}} - \hat{\mu}_{\mathbb{P}}\|_{\mathcal{H}} = \mathcal{O}(1/\sqrt{n})$ (regularization might improve this).
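    A minimal sketch of how this is used in practice: the empirical embedding $\hat{\mu}_{\mathbb{P}} = \frac{1}{n} \sum_i \phi(x_i)$ is never formed explicitly, since inner products between embeddings reduce to averages of kernel evaluations. The RBF kernel and its gamma parameter below are illustrative assumptions, not choices stated in the deck:

    ```python
    import numpy as np

    def rbf(X, Y, gamma=1.0):
        """Gaussian RBF kernel matrix between the rows of X and the rows of Y."""
        sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * sq_dists)

    def embedding_inner(X, Y, kernel=rbf):
        """<mu_P_hat, mu_Q_hat>_H reduces to the mean kernel value over all pairs."""
        return kernel(X, Y).mean()
    ```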
  8. JOINT AND CONDITIONAL DISTRIBUTIONS

    One can model joint distributions using the cross-covariance operator:

    $\mathcal{C}_{YX} := \mathbb{E}[\varphi(Y) \otimes \phi(X)] = \mu_{\mathbb{P}_{XY}}$

    Joint embeddings allow for versions of Bayes' rule and the sum rule in the RKHS.
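    As an illustration, conditional embeddings are typically estimated with a regularized matrix inverse. The sketch below uses the standard estimator from the kernel embedding literature, which is an assumption here rather than the talk's exact method; `KX` is the Gram matrix of the training inputs and `kx` the kernel vector between a query point and the training inputs:

    ```python
    import numpy as np

    def conditional_weights(KX, kx, lam=1e-3):
        """Weights alpha such that mu_{Y|X=x} ~ sum_i alpha_i * phi(y_i),
        via the regularized estimator alpha = (K_X + n*lambda*I)^{-1} k_X(x)."""
        n = KX.shape[0]
        return np.linalg.solve(KX + n * lam * np.eye(n), kx)
    ```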
  9. LEARNING PROBLEMS FOR DISTRIBUTIONS ON METRICS

    - compare (quantify how similar distributions are): maximum mean discrepancy, i.e. the norm $\|\mu_{\mathbb{P}} - \mu_{\mathbb{Q}}\|_{\mathcal{H}}$ (see the sketch below)
    - representation (generate numerical descriptors of distributions): kernel principal component analysis, i.e. the eigenvalues from $\mathrm{svd}(K)$
    - predict (predict distributions from properties, or vice versa): kernel ridge regression, $W = (K + \lambda I)^{-1} \Phi$
    - inference (get the distribution from a description): fitting a parametrized distribution, $\min_{\theta} \|\hat{\mu}_{\mathbb{P}} - \mu_{\mathbb{P}_{\theta}}\|^2_{\mathcal{H}} + R(\mathbb{P}_{\theta})$
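    For instance, the MMD from the compare item has a simple (biased, V-statistic) empirical estimator needing only Gram matrices; a minimal sketch, with `KXX`, `KXY`, and `KYY` assumed to be precomputed kernel matrices:

    ```python
    import numpy as np

    def mmd2(KXX, KXY, KYY):
        """Biased (V-statistic) estimate of the squared MMD from Gram matrices:
        ||mu_P - mu_Q||^2_H ~ mean(KXX) - 2*mean(KXY) + mean(KYY)."""
        return KXX.mean() - 2.0 * KXY.mean() + KYY.mean()
    ```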
  10. PREDICTING PHARMACEUTICAL DISTRIBUTIONS

    Predict the particle size distribution from machine settings and powder composition. The dataset consists of 399 distributions with features. Goal: predict a histogram with 35 bins, logarithmically spaced between 8.46 and 6765.36 µm. We used a linear and an RBF kernel for the process settings and blending ratios; these were combined by addition or element-wise multiplication. Evaluation was done using leave-one-out cross-validation with MSE, MMD, and the Kullback-Leibler divergence as performance measures. A kernel ridge regression sketch for this setup follows below.
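    A minimal sketch of how such a predictor could look, using the kernel ridge regression weights $W = (K + \lambda I)^{-1} \Phi$ from the earlier slide with the histogram matrix substituted for $\Phi$; the clipping and renormalization of predictions is an assumption of this sketch, not stated in the deck:

    ```python
    import numpy as np

    # Hypothetical data: Y is an (n, 35) matrix of training histograms that
    # each sum to one; K is a precomputed (n, n) Gram matrix on the features.
    def fit_krr(K, Y, lam=1.0):
        """Kernel ridge regression: solve (K + lambda*I) W = Y for the weights W."""
        n = K.shape[0]
        return np.linalg.solve(K + lam * np.eye(n), Y)

    def predict_histograms(K_test_train, W):
        """Predict histograms for test points; clip negatives and renormalize
        so each predicted histogram sums to one (an assumption of this sketch)."""
        Y_hat = np.clip(K_test_train @ W, 0.0, None)
        return Y_hat / Y_hat.sum(axis=1, keepdims=True)
    ```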
  11. CONCLUSIONS & PROSPECTS

    Michiel Stock, UGent postdoc: @michielstock, [email protected], https://michielstock.github.io/
    Daan Van Hauwermeiren, UGent postdoc and co-founder of elegent: https://www.ele.gent/
    Jasper De Landsheere, former UGent master student

    Kernel mean embedding is a powerful and general method for manipulating and modeling distributions. It is both extremely simple to implement (35 lines of code) and blazingly fast (one second to fit and validate).