Slide 1

Slide 1 text

KERNEL MEAN EMBEDDING AS A UNIFYING THEORY FOR COMPOSITIONAL DATA
Michiel Stock, Jasper De Landsheere & Daan Van Hauwermeiren
KERMIT
Photo by Adrien Olichon on Unsplash

Slide 2

Slide 2 text

DISTRIBUTIONS IN PHARMACEUTICAL PROCESSES

ConsiGma™ continuous powder-to-tablet line

The unit processes are often modelled using population balance modelling (PBM) by means of a balance equation:

$$\frac{\mathrm{d}}{\mathrm{d}t} \int_{\Omega_x(t)} \mathrm{d}V_x \int_{\Omega_r(t)} \mathrm{d}V_r \, f(x, r, t) = \int_{\Omega_x(t)} \mathrm{d}V_x \int_{\Omega_r(t)} \mathrm{d}V_r \, h(x, r, Y, t)$$

However, PBMs are complex and slow, making them hard to use as process analytical technology in Industry 4.0. There is a need for efficient (in data and time) and general data-driven methods for distributions.

Slide 3

Slide 3 text

DISTRIBUTIONS ON A METRIC

We work with (discrete) probability distributions where there is a metric (features, similarity…) associated with the objects.

[Figure: histogram-like, bar-like, and sampled distributions over objects x1, x2, x3; each object has K features and a similarity (kernel) matrix; y-axis: probability/density]

Slide 4

Slide 4 text

DISTRIBUTIONS WITH A METRIC ARE EVERYWHERE

food compositions, ecosystems (e.g. microbiome), cellular deconvolution

Slide 5

Slide 5 text

LEARNING PROBLEMS FOR DISTRIBUTIONS ON METRICS

- compare: quantify how similar distributions are
- representation: generate numerical descriptors of distributions (e.g. PC1, PC2)
- predict: predict distributions from properties (X → y) or vice versa
- inference: get the distribution from a description

Slide 6

Slide 6 text

KERNEL TRICK

By replacing a dot product with a kernel function, PCA, support vector machines, linear regression, etc. become nonlinear methods.

[Figure: mapping from the input space to a reproducing kernel Hilbert space]
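As a minimal sketch (Python/NumPy; an illustration, not code from the slides), the kernel trick lets us compute geometry in the RKHS, such as distances between mapped points, using kernel evaluations only:

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    # Gaussian RBF kernel; its implicit feature space is infinite-dimensional
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.exp(-gamma * np.sum((x - y) ** 2))

def rkhs_distance(x, y, k=rbf):
    # distance between phi(x) and phi(y) without ever computing phi:
    # ||phi(x) - phi(y)||^2 = k(x, x) - 2 k(x, y) + k(y, y)
    return np.sqrt(max(k(x, x) - 2 * k(x, y) + k(y, y), 0.0))
```

Any algorithm that touches the data only through such dot products (PCA, SVMs, ridge regression) can be kernelized this way.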

Slide 7

Slide 7 text

EXAMPLE OF EXTENDING THE FEATURE SPACE

$$\phi : (x_1, x_2) \mapsto (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$$

$$\langle \phi(x), \phi(x') \rangle_\mathcal{F} = x_1^2 x_1'^2 + 2 x_1 x_2 x_1' x_2' + x_2^2 x_2'^2 = (x_1 x_1' + x_2 x_2')^2$$

So the map is equivalent to the kernel $k(x, x') = \langle x, x' \rangle^2$.

→ represent distributions by their first moment
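The identity above can be checked numerically; a sketch in Python/NumPy, where `phi` and `k` are just the slide's explicit map and polynomial kernel:

```python
import numpy as np

def phi(x):
    # explicit degree-2 feature map for 2D inputs
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, xp):
    # the equivalent polynomial kernel <x, x'>^2
    return np.dot(x, xp) ** 2

x, xp = np.array([1.0, 2.0]), np.array([3.0, 0.5])
# <phi(x), phi(x')> and k(x, x') give the same number
```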

Slide 8

Slide 8 text

KERNEL MEAN EMBEDDING

The kernel mean embedding of a distribution $\mathbb{P}$ is the mean of the feature map, $\mu_\mathbb{P} = \mathbb{E}_{X \sim \mathbb{P}}[\phi(X)]$, estimated empirically as $\hat{\mu}_\mathbb{P} = \frac{1}{n} \sum_{i=1}^n \phi(x_i)$.

For universal kernels, the mapping is injective and retains all information on the distribution.

The error of the empirical estimate scales as $\|\mu_\mathbb{P} - \hat{\mu}_\mathbb{P}\|_\mathcal{H} \in \mathcal{O}(1/\sqrt{n})$ (regularization might improve this).
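A sketch of the empirical mean map (Python/NumPy with an assumed RBF kernel; not the authors' code): the embedding of a sample, evaluated at query points $z$, is $\hat{\mu}(z) = \frac{1}{n} \sum_i k(x_i, z)$.

```python
import numpy as np

def rbf_matrix(X, Z, gamma=0.5):
    # kernel matrix between rows of X (the sample) and Z (query points)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def empirical_embedding(sample, gamma=0.5):
    # mu_hat evaluated at query points Z: (1/n) sum_i k(x_i, z)
    return lambda Z: rbf_matrix(sample, Z, gamma).mean(axis=0)

rng = np.random.default_rng(42)
mu_small = empirical_embedding(rng.normal(size=(50, 1)))
mu_large = empirical_embedding(rng.normal(size=(5000, 1)))
Z = np.linspace(-3, 3, 7)[:, None]
# both estimate the same embedding; they agree up to the O(1/sqrt(n)) error
gap = np.abs(mu_small(Z) - mu_large(Z)).max()
```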

Slide 9

Slide 9 text

JOINT AND CONDITIONAL DISTRIBUTIONS

One can model joint distributions using the cross-covariance operator:

$$\mathcal{C}_{YX} := \mathbb{E}[\varphi(Y) \otimes \phi(X)] = \mu_{\mathbb{P}_{XY}}$$

Joint embeddings allow for versions of Bayes' rule and the sum rule in the RKHS.
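One standard estimator these operators enable is the empirical conditional mean embedding; a sketch assuming a Gram matrix `Kx` on the training inputs (names are illustrative, not from the slides):

```python
import numpy as np

def conditional_mean_weights(Kx, kx_new, lam=1e-2):
    # mu_{Y|X=x} ~= sum_i alpha_i phi(y_i), with weights obtained from a
    # regularized inverse of the input covariance:
    # alpha = (Kx + n*lam*I)^{-1} kx(x)
    n = Kx.shape[0]
    return np.linalg.solve(Kx + n * lam * np.eye(n), kx_new)
```

The resulting weights re-weight the training outputs in the RKHS, giving a nonparametric estimate of the conditional distribution's embedding.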

Slide 10

Slide 10 text

LEARNING PROBLEMS FOR DISTRIBUTIONS ON METRICS

- compare: quantify how similar distributions are → maximum mean discrepancy (i.e. a norm): $\|\mu_\mathbb{P} - \mu_\mathbb{Q}\|_\mathcal{H}$
- representation: generate numerical descriptors of distributions → kernel principal component analysis (i.e. eigenvalues): $\mathrm{svd}(K)$
- predict: predict distributions from properties or vice versa → kernel ridge regression: $W = (K + \lambda I)^{-1} \Phi$
- inference: get the distribution from a description → fitting a parametrized distribution: $\min_\theta \|\hat{\mu}_\mathbb{P} - \mu_{\mathbb{P}_\theta}\|_\mathcal{H}^2 + R(\mathbb{P}_\theta)$
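For the "compare" task, the MMD has a simple plug-in estimator; a sketch in Python/NumPy (assumed RBF kernel and toy Gaussian data, not from the slides):

```python
import numpy as np

def rbf_gram(X, Y, gamma=1.0):
    # RBF kernel matrix between rows of X and Y
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    # (biased) plug-in estimate of ||mu_P - mu_Q||_H^2 from samples X ~ P, Y ~ Q
    return rbf_gram(X, X, gamma).mean() - 2 * rbf_gram(X, Y, gamma).mean() \
        + rbf_gram(Y, Y, gamma).mean()

rng = np.random.default_rng(0)
A = rng.normal(0, 1, size=(300, 1))
B = rng.normal(0, 1, size=(300, 1))  # same distribution as A
C = rng.normal(3, 1, size=(300, 1))  # shifted distribution
```

Samples from the same distribution give a small MMD; the shifted sample gives a clearly larger one.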

Slide 11

Slide 11 text

KERNEL PCA ON COCKTAIL COMPOSITIONS

Data extracted from Liquid Intelligence by Dave Arnold

Slide 12

Slide 12 text

KERNEL PCA TO MONITOR THE PHARMACEUTICAL PROCESS

[Figure: kernel PCA score plots (PC1 vs PC2) of the process]

Slide 13

Slide 13 text

PREDICTING PHARMACEUTICAL DISTRIBUTIONS

Predict particle size distribution from machine settings and powder composition.

The dataset consists of 399 distributions with features. Goal: predict a histogram with 35 bins, logarithmically spaced between 8.46 and 6765.36 µm.

We used a linear and an RBF kernel for the process settings and blending ratios. These were combined by addition or element-wise multiplication.

Evaluation was done using leave-one-out cross-validation with MSE, MMD, and Kullback-Leibler divergence as performance measures.
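A sketch of what such a prediction could look like (Python/NumPy; the slides give $W = (K + \lambda I)^{-1}\Phi$, here $\Phi$ is taken to be the matrix of training histograms; the clipping and renormalization step is our own assumption to keep the output a valid histogram):

```python
import numpy as np

def fit_histogram_krr(K, H, lam=1e-2):
    # kernel ridge regression weights: W = (K + lam*I)^{-1} H,
    # with H an (n, n_bins) matrix of training histograms
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), H)

def predict_histogram(k_new, W):
    # k_new: kernel evaluations between a new setting and the n training settings
    p = k_new @ W
    p = np.clip(p, 0.0, None)   # histograms cannot be negative
    return p / p.sum()          # renormalize to sum to one
```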

Slide 14

Slide 14 text

PREDICTION OF PARTICLE SIZE DISTRIBUTIONS

Slide 15

Slide 15 text

CONCLUSIONS & PROSPECTS

Kernel mean embedding is a powerful and general method for manipulating and modeling distributions. It is both extremely simple to implement (35 lines of code) and blazingly fast (1 second to fit and validate).

Michiel Stock, UGent postdoc, @michielstock, michiel.stock@ugent.be, https://michielstock.github.io/
Daan Van Hauwermeiren, UGent postdoc / co-founder of elegent, https://www.ele.gent/
Jasper De Landsheere, former UGent master student