Slide 1

Slide 1 text

Gaussian Process Latent Variable Model

[1] Neil Lawrence. "Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data." NIPS, 2003.
[2] Neil Lawrence. "Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models." JMLR, 2005.

Slide 2

Slide 2 text

Contents

Theory
・What Gaussian process regression is
・Gaussian process latent variable model
・Examples: an easy experiment on the oil flow dataset, with source code
・GPLVM as a generative model
・Phase transition related to the hyperparameters of the GP

Related models
・Infinite Warped Mixture Model (iWMM)
・Gaussian Process Dynamical Model (GPDM)

Slide 3

Slide 3 text

What Gaussian process regression is

Dataset: $\mathcal{D} = \{(x_n, y_n) \mid n = 1, \dots, N\}$

Linear regression (the basis functions $\phi_m(\cdot)$ have to be given):
$y = \sum_{m=1}^{M} w_m \phi_m(x)$, estimated by minimizing the MSE with the design matrix $\Phi_{nm} = \phi_m(x_n)$:
$\hat{w} = (\Phi^{\mathrm{T}} \Phi)^{-1} \Phi^{\mathrm{T}} y$

Gaussian process regression (a kernel function has to be given):
We introduce a prior distribution $w \sim \mathcal{N}(0, \lambda^2 I)$. Then $y = \Phi w$ follows a Gaussian distribution:
$y \sim \mathcal{N}(0, \lambda^2 \Phi \Phi^{\mathrm{T}}) \equiv \mathcal{N}(0, K)$, with $K_{nn'} = k(x_n, x_{n'}) = \lambda^2 \phi(x_n)^{\mathrm{T}} \phi(x_{n'})$.

Jointly, $\begin{pmatrix} y \\ y_* \end{pmatrix} \sim \mathcal{N}\left(0, \begin{pmatrix} K & k_* \\ k_*^{\mathrm{T}} & k_{**} \end{pmatrix}\right)$, which gives the predictive distribution
$p(y_* \mid x_*, \mathcal{D}) = \mathcal{N}\left(k_*^{\mathrm{T}} K^{-1} y,\ k_{**} - k_*^{\mathrm{T}} K^{-1} k_*\right)$,
where $k_* = (k(x_*, x_1), \dots, k(x_*, x_N))^{\mathrm{T}}$ and $k_{**} = k(x_*, x_*)$.

Example, the RBF kernel: $k(x, x') = \theta_1 \exp\left(-\frac{1}{2} \|x - x'\|^2\right)$
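To make the predictive formulas above concrete, here is a minimal NumPy sketch of GP regression. It is not from the slides: the hyperparameters theta1 and theta2, the noise level, and the toy sine data are illustrative assumptions.

import numpy as np

def rbf_kernel(A, B, theta1=1.0, theta2=1.0):
    """k(x, x') = theta1 * exp(-||x - x'||^2 / (2 * theta2))."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return theta1 * np.exp(-0.5 * sq / theta2)

def gp_predict(X, y, X_star, noise=1e-2):
    """Predictive mean k_*^T K^{-1} y and variance k_** - k_*^T K^{-1} k_*."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))  # K + sigma^2 I
    k_star = rbf_kernel(X, X_star)                 # N x N* cross-kernel
    alpha = np.linalg.solve(K, y)                  # K^{-1} y
    mean = k_star.T @ alpha
    cov = rbf_kernel(X_star, X_star) - k_star.T @ np.linalg.solve(K, k_star)
    return mean, np.diag(cov)

# Toy usage: noisy sine data, predictions with error bars on a grid.
X = np.linspace(0, 2 * np.pi, 20)[:, None]
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(20)
mean, var = gp_predict(X, y, np.linspace(0, 2 * np.pi, 100)[:, None])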

Slide 4

Slide 4 text

We use the idea of Gaussian process regression for unsupervised learning.

Slide 5

Slide 5 text

Introduction of GPLVM

Dataset: $\mathcal{D} = \{y_n \mid n = 1, \dots, N\}$ — the outputs $Y$ are given, the inputs are unknown.

$y_n = (y_n^{(1)}, y_n^{(2)}, \dots, y_n^{(D)})^{\mathrm{T}}$, stacked into the matrix $Y = (y_1, \dots, y_N)^{\mathrm{T}} \in \mathbb{R}^{N \times D}$ with columns $y^{(d)} = (y_1^{(d)}, \dots, y_N^{(d)})^{\mathrm{T}}$.

Assumption: every column $y^{(d)}$ is generated from common unknown $N$ inputs $X = (x_1, \dots, x_N)^{\mathrm{T}}$ by Gaussian process regression:
$y^{(d)} \sim \mathcal{N}(0, K + \sigma^2 I), \quad d = 1, \dots, D$
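A minimal sketch of this generative assumption: all D output columns share one latent X, and each column is one GP draw over those shared inputs. The sizes (N, Q, D), the unit-lengthscale RBF kernel, and the noise level are illustrative choices, not from the slides.

import numpy as np

N, Q, D, sigma2 = 100, 2, 12, 0.01
rng = np.random.default_rng(0)

X = rng.standard_normal((N, Q))                    # latent inputs x_1..x_N
sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
K = np.exp(-0.5 * sq) + sigma2 * np.eye(N)         # RBF Gram matrix + noise

# Each of the D columns of Y is one GP sample over the *same* inputs X.
Y = rng.multivariate_normal(np.zeros(N), K, size=D).T   # Y in R^{N x D}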

Slide 6

Slide 6 text

Introduction of GPLVM

$y^{(d)} \sim \mathcal{N}(0, K + \sigma^2 I)$ — but $K$ is built from the unknown inputs. How should we know $X$? Let us call $X$ a latent variable and put a prior on it:

$p(Y, X) = p(Y \mid X)\, p(X) = \prod_{d=1}^{D} p(y^{(d)} \mid X)\, p(X) = \prod_{d=1}^{D} \mathcal{N}\!\left(y^{(d)} \mid 0, K + \sigma^2 I\right) \prod_{n=1}^{N} \mathcal{N}(x_n \mid 0, I)$

Let us find the $X$ which maximizes it.

Why may we assume such a low-dimensional $X$ exists? No rigorous reason; cf. the manifold hypothesis.

Slide 7

Slide 7 text

Introduction of GPLVM

$p(Y, X) = \prod_{d=1}^{D} \mathcal{N}\!\left(y^{(d)} \mid 0, K + \sigma^2 I\right) \prod_{n=1}^{N} \mathcal{N}(x_n \mid 0, I)$

$= \prod_{d=1}^{D} \frac{1}{(2\pi)^{N/2} |K|^{1/2}} \exp\!\left(-\frac{1}{2} y^{(d)\mathrm{T}} K^{-1} y^{(d)}\right) \prod_{n=1}^{N} \mathcal{N}(x_n \mid 0, I)$

$= \frac{1}{(2\pi)^{ND/2} |K|^{D/2}} \exp\!\left(-\frac{1}{2} \sum_{d=1}^{D} y^{(d)\mathrm{T}} K^{-1} y^{(d)}\right) \prod_{n=1}^{N} \mathcal{N}(x_n \mid 0, I)$

$= \frac{1}{(2\pi)^{ND/2} |K|^{D/2}} \exp\!\left(-\frac{1}{2} \operatorname{tr}\!\left(K^{-1} Y Y^{\mathrm{T}}\right)\right) \prod_{n=1}^{N} \mathcal{N}(x_n \mid 0, I)$

(here $K$ is shorthand for $K + \sigma^2 I$)

Intuition: the inner product of matrices $A$ and $B$ is $\operatorname{tr}(A^{\mathrm{T}} B)$, and $\operatorname{tr}(A^{\mathrm{T}} B)$ is larger the more similar $A$ and $B$ are. So when $p(Y, X)$ is large, $K$ resembles the correlation matrix of the observed data, $Y Y^{\mathrm{T}}$.
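As a sketch of "find the X which maximizes it": the negative log of the joint above can be handed to a generic optimizer. This toy uses scipy with numerical gradients, which is far from how real GPLVM implementations optimize (they use analytic gradients and careful initialization), but it shows the objective. The unit-lengthscale RBF kernel and the random toy Y are assumptions.

import numpy as np
from scipy.optimize import minimize

def neg_log_joint(x_flat, Y, Q, sigma2=0.01):
    """-log p(Y, X) up to constants, for X flattened into x_flat."""
    N, D = Y.shape
    X = x_flat.reshape(N, Q)
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    K = np.exp(-0.5 * sq) + sigma2 * np.eye(N)          # K + sigma^2 I
    _, logdet = np.linalg.slogdet(K)
    # D/2 log|K| + 1/2 tr(K^{-1} Y Y^T) + 1/2 sum_n ||x_n||^2
    return (0.5 * D * logdet
            + 0.5 * np.trace(np.linalg.solve(K, Y @ Y.T))
            + 0.5 * np.sum(X**2))

# Toy usage: any N x D data matrix, Q = 2 latent dimensions.
N, D, Q = 30, 5, 2
Y = np.random.default_rng(1).standard_normal((N, D))
res = minimize(neg_log_joint, np.random.randn(N * Q), args=(Y, Q))
X_map = res.x.reshape(N, Q)                             # MAP latent positions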

Slide 8

Slide 8 text

Example of GPLVM

Let us experiment on the oil flow dataset (its files are split as Trn = train, Vdn = validation, Tst = test). Each sample carries one of three flow-configuration labels, one-hot encoded as [1 0 0], [0 1 0], [0 0 1]. The features come from gamma-ray densitometry: 2 energies × 6 beam directions = 12-dimensional data.

Slide 9

Slide 9 text

Example of GPLVM

Easy to run with GPy! (or GPyTorch, TensorFlow Probability, …)

Compare to PCA: with GPLVM you can also know the confidence, via the predictive variance
$p(y_* \mid x_*, \mathcal{D}) = \mathcal{N}\!\left(k_*^{\mathrm{T}} K^{-1} y,\ k_{**} - k_*^{\mathrm{T}} K^{-1} k_*\right)$
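A hedged sketch of the "easy to run with GPy" claim, assuming `pip install GPy pods`. GPy.models.GPLVM and pods.datasets.oil do exist, but check your installed versions, as signatures have shifted across releases.

import GPy
import pods

data = pods.datasets.oil()          # 1000 samples x 12 dims, 3 flow classes
Y = data['X']                       # observations (pods stores them under 'X')
labels = data['Y'].argmax(axis=1)   # one-hot [1 0 0] etc. -> class index

model = GPy.models.GPLVM(Y, input_dim=2)   # 2-D latent space, default kernel
model.optimize(messages=True, max_iters=1000)

# model.X holds the learned latent coordinates; unlike PCA, the model also
# provides a predictive variance, i.e. a confidence over the mapping.
print(model.X[:5])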

Slide 10

Slide 10 text

Example of an application of GPLVM

A 2-dimensional embedded latent space learned with a scaled GPLVM. From "Style-Based Inverse Kinematics" (2004).

Slide 11

Slide 11 text

GPLVM as a generative model

Slide 12

Slide 12 text

What is the difference between GPLVM and VAE?

Trained VAE: the latent prior is $\mathcal{N}(0, I)$, and each point $z$ in the feature space corresponds to a unique decoded sample, Decoded Data = Decoder($z$). You cannot know the confidence in the feature space, and it might be overfitted.

GPLVM: the latent prior is also $\mathcal{N}(0, I)$, but each point $x_*$ in the latent space corresponds to a Gaussian distribution over outputs,
$p(y_* \mid x_*, \mathcal{D}) = \mathcal{N}\!\left(k_*^{\mathrm{T}} K^{-1} y,\ k_{**} - k_*^{\mathrm{T}} K^{-1} k_*\right)$,
so the decoded output is not unique (samples $y_*^{(1)}, y_*^{(2)}, \dots$): we can extract data by sampling the distribution. You can know the confidence in the latent space, and it will not be overfitted as easily.
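To make "sampling the distribution" concrete: given a fitted GPLVM, any latent point x_* induces the Gaussian predictive above, and a generated observation is one draw from it. A self-contained sketch, with random stand-ins for the fitted latent positions X_map and data Y (in practice these come from the optimization on slide 7 or from GPy):

import numpy as np

def rbf(A, B):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * sq)

def gplvm_generate(X_map, Y, x_star, noise=0.01, rng=np.random.default_rng(0)):
    """Sample one D-dim observation at latent point x_star (shape (Q,))."""
    K = rbf(X_map, X_map) + noise * np.eye(len(X_map))     # K + sigma^2 I
    k_star = rbf(X_map, x_star[None, :])[:, 0]             # k_* in R^N
    alpha = np.linalg.solve(K, k_star)                     # K^{-1} k_*
    mean = Y.T @ alpha                                     # k_*^T K^{-1} y^(d), all d
    var = 1.0 - k_star @ alpha                             # k_** - k_*^T K^{-1} k_*
    return mean + np.sqrt(max(var, 0.0)) * rng.standard_normal(Y.shape[1])

# Toy usage with random stand-ins for a fitted model:
rng = np.random.default_rng(3)
X_map, Y = rng.standard_normal((50, 2)), rng.standard_normal((50, 12))
sample = gplvm_generate(X_map, Y, x_star=np.zeros(2))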

Slide 13

Slide 13 text

Infinite Warped Mixture Model (iWMM)

GPLVM:
$p(Y, X) = p(Y \mid X)\, p(X) = \prod_{d=1}^{D} \mathcal{N}\!\left(y^{(d)} \mid 0, K + \sigma^2 I\right) \prod_{n=1}^{N} \mathcal{N}(x_n \mid 0, I)$

iWMM: we explicitly assume clusters in the latent space, replacing the prior with a Gaussian mixture (GMM):
$p(Y, X) = p(Y \mid X)\, p(X) = \prod_{d=1}^{D} \mathcal{N}\!\left(y^{(d)} \mid 0, K + \sigma^2 I\right) \prod_{n=1}^{N} \sum_{c=1}^{C} \lambda_c\, \mathcal{N}\!\left(x_n \mid \mu_c, R_c^{-1}\right)$

From "Warped Mixtures for Nonparametric Cluster Shapes" (2013), Tomoharu Iwata, David Duvenaud, Zoubin Ghahramani. It is not easy to run; MATLAB code can be found on GitHub.
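Relative to GPLVM, only the latent prior term changes. The finite-C mixture prior shown above can be evaluated as follows; this is a sketch only, since the actual iWMM makes the number of components effectively infinite via a Dirichlet process and infers it, which this toy does not attempt. The two-cluster parameters are illustrative.

import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def gmm_log_prior(X, weights, means, covs):
    """log prod_n sum_c lambda_c N(x_n | mu_c, R_c^{-1})."""
    # Per-component log-densities, stacked to shape (C, N).
    comp = np.stack([multivariate_normal.logpdf(X, m, S)
                     for m, S in zip(means, covs)])
    return np.sum(logsumexp(np.log(weights)[:, None] + comp, axis=0))

# Toy usage: two clusters in a 2-D latent space.
X = np.random.default_rng(2).standard_normal((50, 2))
w = np.array([0.5, 0.5])
mu = [np.zeros(2), np.array([3.0, 0.0])]
cov = [np.eye(2), 0.5 * np.eye(2)]
print(gmm_log_prior(X, w, mu, cov))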

Slide 14

Slide 14 text

Gaussian Process Dynamical Model (GPDM)

$Y = (y_{t=1}, y_{t=2}, \dots, y_{t=N})$: observed variables, developing in time
$X = (x_{t=1}, x_{t=2}, \dots, x_{t=N})$: latent variables, developing in time

$p(Y, X) = p(Y \mid X)\, p(X) = \prod_{d=1}^{D} p\!\left(y^{(d)} \mid X\right) \prod_{t=2}^{N} p(x_t \mid x_{t-1})$

Where the GPLVM prior treats the $x_n$ as independent, the GPDM chains them, $x_{t-1} \to x_t \to x_{t+1}$, with a first-order dynamics prior. From "Gaussian Process Dynamical Models" (2008).
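The structural change is again only in the prior on X. The true GPDM places a Gaussian process on the dynamics x_{t-1} → x_t; as a loud simplification, the sketch below uses a plain Gaussian random walk N(x_t | x_{t-1}, s2 I) instead, just to show where the chained term plugs into the MAP objective from slide 7. The variance s2 is an assumed toy parameter.

import numpy as np

def random_walk_log_prior(X, s2=0.1):
    """log prod_{t=2}^N N(x_t | x_{t-1}, s2 I) for X of shape (N, Q)."""
    diffs = X[1:] - X[:-1]                         # x_t - x_{t-1}
    N1, Q = diffs.shape
    return (-0.5 * np.sum(diffs**2) / s2
            - 0.5 * N1 * Q * np.log(2 * np.pi * s2))

# In a GPDM-flavored MAP objective, this term replaces the GPLVM prior term
# 0.5 * sum_n ||x_n||^2 from the slide-7 sketch.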

Slide 15

Slide 15 text

Conclusion and Discussion

We can use Gaussian processes for unsupervised learning, as the GPLVM:
- Dimensionality reduction
- Clustering
- In fact, GPLVM is a generalization of probabilistic PCA and kernel PCA
- In practice, Bayesian GPLVM is popular (link)

We can use GPLVM as a generative model:
- It will not be overfitted as easily.
- We can see the confidence over the latent space.

There are some advanced models:
- Infinite Warped Mixture Model (iWMM)
- Gaussian Process Dynamical Model (GPDM)
- Discriminative Gaussian Process Latent Variable Model (discriminative GPLVM) (link)
- Supervised Latent Linear Gaussian Process Latent Variable Model (SLLGPLVM) (link)

Research topics:
- The computational complexity is $O(N^3)$: we have to compute the inverse matrix $K^{-1}$.
- Analytical discussion of the generalization gap of Gaussian processes (link)
- Data augmentation from GPLVM?