Kernel mean embedding as a unifying theory for distributional data

Michiel Stock
July 16, 2022

Transcript

  1. KERNEL MEAN EMBEDDING AS A UNIFYING THEORY FOR DISTRIBUTIONAL DATA

    Michiel Stock, Jasper De Landsheere & Daan Van Hauwermeiren (KERMIT). Photo by Adrien Olichon on Unsplash.
  2. DISTRIBUTIONS IN PHARMACEUTICAL PROCESSES

    ConsiGma™ continuous powder-to-tablet line. The unit processes are often modelled using population balance modelling (PBM) by means of a balance equation:

    $\frac{d}{dt} \int_{\Omega_x(t)} dV_x \int_{\Omega_r(t)} dV_r \, f(x, r, t) = \int_{\Omega_x(t)} dV_x \int_{\Omega_r(t)} dV_r \, h(x, r, Y, t)$

    However, PBMs are complex and slow, making them hard to use as process analytical technology in Industry 4.0. There is a need for efficient (in data and time) and general data-driven methods for distributions.
  3. DISTRIBUTIONS ON A METRIC

    We work with (discrete) probability distributions where there is a metric (features, similarity, …) associated with the objects.

    [Figure: three examples (a histogram-like distribution, a sample from a distribution, and a bar-like distribution over K features) with an associated similarity or kernel matrix; the vertical axis shows probability/density.]
  4. LEARNING PROBLEMS FOR DISTRIBUTIONS ON METRICS

    - compare: quantify how similar distributions are
    - representation: generate numerical descriptors of distributions
    - predict: predict distributions from properties, or vice versa
    - inference: get the distribution from a description
  5. KERNEL TRICK

    By replacing a dot product with a kernel function, PCA, support vector machines, linear regression, etc. become nonlinear methods: a feature map sends the input space to a reproducing kernel Hilbert space, where the methods remain linear.
  6. EXAMPLE OF EXTENDING THE FEATURE SPACE

    $\phi : (x_1, x_2) \mapsto (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$

    $\langle \phi(x), \phi(x') \rangle_{\mathcal{F}} = x_1^2 x_1'^2 + 2 x_1 x_2 x_1' x_2' + x_2^2 x_2'^2 = (x_1 x_1' + x_2 x_2')^2$

    So the map is equivalent to the kernel $k(x, x') = \langle x, x' \rangle^2$. The kernel mean embedding builds on this idea: represent distributions by their first moment in the feature space.
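    To make the equivalence concrete, here is a minimal Python check (a sketch, assuming NumPy; the example vectors are arbitrary) that the explicit feature map and the squared dot product give the same inner product:

    ```python
    import numpy as np

    def phi(x):
        """Explicit feature map: (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)."""
        x1, x2 = x
        return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

    def k(x, xp):
        """Degree-2 polynomial kernel: the squared dot product <x, x'>^2."""
        return np.dot(x, xp) ** 2

    x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
    # Both routes give the same value: (1*3 + 2*(-1))^2 = 1
    assert np.isclose(np.dot(phi(x), phi(xp)), k(x, xp))
    ```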
  7. KERNEL MEAN EMBEDDING

    The kernel mean embedding maps a distribution $\mathbb{P}$ to its first moment in the RKHS, $\mu_{\mathbb{P}} = \mathbb{E}_{X \sim \mathbb{P}}[\phi(X)]$. For universal kernels, the mapping is injective and retains all information on the distribution. The error of the empirical estimate scales as $\|\mu_{\mathbb{P}} - \hat{\mu}_{\mathbb{P}}\|_{\mathcal{H}} = \mathcal{O}(1/\sqrt{n})$ (regularization might improve this).
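    A minimal sketch of how this is used in practice: the empirical embedding $\hat{\mu}_{\mathbb{P}} = \frac{1}{n} \sum_i \phi(x_i)$ is never formed explicitly, since inner products between embeddings reduce to averages of kernel evaluations. The RBF kernel and its gamma parameter below are illustrative assumptions, not choices stated in the deck:

    ```python
    import numpy as np

    def rbf(X, Y, gamma=1.0):
        """Gaussian RBF kernel matrix between the rows of X and the rows of Y."""
        sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * sq_dists)

    def embedding_inner(X, Y, kernel=rbf):
        """<mu_P_hat, mu_Q_hat>_H reduces to the mean kernel value over all pairs."""
        return kernel(X, Y).mean()
    ```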
  8. JOINT AND CONDITIONAL DISTRIBUTIONS

    One can model joint distributions using the cross-covariance operator:

    $\mathcal{C}_{YX} := \mathbb{E}[\varphi(Y) \otimes \phi(X)] = \mu_{\mathbb{P}_{XY}}$

    Joint embeddings allow for versions of Bayes' rule and the sum rule in the RKHS.
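    As an illustration, conditional embeddings are typically estimated with a regularized matrix inverse. The sketch below uses the standard estimator from the kernel embedding literature, which is an assumption here rather than the talk's exact method; `KX` is the Gram matrix of the training inputs and `kx` the kernel vector between a query point and the training inputs:

    ```python
    import numpy as np

    def conditional_weights(KX, kx, lam=1e-3):
        """Weights alpha such that mu_{Y|X=x} ~ sum_i alpha_i * phi(y_i),
        via the regularized estimator alpha = (K_X + n*lambda*I)^{-1} k_X(x)."""
        n = KX.shape[0]
        return np.linalg.solve(KX + n * lam * np.eye(n), kx)
    ```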
  9. LEARNING PROBLEMS FOR DISTRIBUTIONS ON METRICS

    - compare (quantify how similar distributions are): maximum mean discrepancy, i.e. the norm $\|\mu_{\mathbb{P}} - \mu_{\mathbb{Q}}\|_{\mathcal{H}}$ (see the sketch below)
    - representation (generate numerical descriptors of distributions): kernel principal component analysis, i.e. the eigenvalues from $\mathrm{svd}(K)$
    - predict (predict distributions from properties, or vice versa): kernel ridge regression, $W = (K + \lambda I)^{-1} \Phi$
    - inference (get the distribution from a description): fitting a parametrized distribution, $\min_{\theta} \|\hat{\mu}_{\mathbb{P}} - \mu_{\mathbb{P}_{\theta}}\|^2_{\mathcal{H}} + R(\mathbb{P}_{\theta})$
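    For instance, the MMD from the compare item has a simple (biased, V-statistic) empirical estimator needing only Gram matrices; a minimal sketch, with `KXX`, `KXY`, and `KYY` assumed to be precomputed kernel matrices:

    ```python
    import numpy as np

    def mmd2(KXX, KXY, KYY):
        """Biased (V-statistic) estimate of the squared MMD from Gram matrices:
        ||mu_P - mu_Q||^2_H ~ mean(KXX) - 2*mean(KXY) + mean(KYY)."""
        return KXX.mean() - 2.0 * KXY.mean() + KYY.mean()
    ```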
  10. PREDICTING PHARMACEUTICAL DISTRIBUTIONS

    Predict the particle size distribution from machine settings and powder composition. The dataset consists of 399 distributions with features. Goal: predict a histogram with 35 bins, logarithmically spaced between 8.46 and 6765.36 µm. We used a linear and an RBF kernel for the process settings and blending ratios; these were combined by addition or element-wise multiplication. Evaluation was done using leave-one-out cross-validation with MSE, MMD, and the Kullback-Leibler divergence as performance measures. A kernel ridge regression sketch for this setup follows below.
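    A minimal sketch of how such a predictor could look, using the kernel ridge regression weights $W = (K + \lambda I)^{-1} \Phi$ from the earlier slide with the histogram matrix substituted for $\Phi$; the clipping and renormalization of predictions is an assumption of this sketch, not stated in the deck:

    ```python
    import numpy as np

    # Hypothetical data: Y is an (n, 35) matrix of training histograms that
    # each sum to one; K is a precomputed (n, n) Gram matrix on the features.
    def fit_krr(K, Y, lam=1.0):
        """Kernel ridge regression: solve (K + lambda*I) W = Y for the weights W."""
        n = K.shape[0]
        return np.linalg.solve(K + lam * np.eye(n), Y)

    def predict_histograms(K_test_train, W):
        """Predict histograms for test points; clip negatives and renormalize
        so each predicted histogram sums to one (an assumption of this sketch)."""
        Y_hat = np.clip(K_test_train @ W, 0.0, None)
        return Y_hat / Y_hat.sum(axis=1, keepdims=True)
    ```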
  11. CONCLUSIONS & PROSPECTS

    Michiel Stock, UGent postdoc: @michielstock, [email protected], https://michielstock.github.io/
    Daan Van Hauwermeiren, UGent postdoc and co-founder of elegent: https://www.ele.gent/
    Jasper De Landsheere, former UGent master student

    Kernel mean embedding is a powerful and general method for manipulating and modeling distributions. It is both extremely simple to implement (35 lines of code) and blazingly fast (one second to fit and validate).