EM algorithm

Transcript

  1. Example: Gaussian Mixture Models
    Given data points…
    p(x) = \sum_{k=1}^K \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)
    \pi_k: mixing coefficient
    \mathcal{N}(x \mid \mu_k, \Sigma_k): normal distribution, the k-th component of the GMM
    • \mu_k: mean of the k-th component
    • \Sigma_k: variance-covariance matrix of the k-th component
  2. How to fit a GMM to given data points?
    p(x) = \sum_{k=1}^K \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)
    Parameters to be determined: \pi, \mu, \Sigma
    Maximum likelihood estimation: maximize the log likelihood function \ln p(X)
    Bayes' theorem: posterior \propto likelihood \times prior, i.e. p(\theta \mid X) \propto p(X \mid \theta)\, p(\theta)
    Likelihood function: p(X \mid \theta) = \prod_{n=1}^N p(x_n \mid \theta)   (X is the set of given data)
    In this case:
    p(X \mid \pi, \mu, \Sigma) = \prod_{n=1}^N p(x_n \mid \pi, \mu, \Sigma) = \prod_{n=1}^N \sum_{k=1}^K \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)
    \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^N \ln \sum_{k=1}^K \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)
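    A minimal sketch (not part of the original deck) of evaluating this GMM log likelihood with NumPy/SciPy; all variable names are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, pis, mus, Sigmas):
    # X: (N, D) data, pis: (K,), mus: (K, D), Sigmas: (K, D, D)
    # weighted[n, k] = pi_k * N(x_n | mu_k, Sigma_k)
    weighted = np.stack(
        [pi * multivariate_normal.pdf(X, mean=mu, cov=S)
         for pi, mu, S in zip(pis, mus, Sigmas)],
        axis=1,
    )
    # ln p(X | pi, mu, Sigma) = sum_n ln sum_k pi_k N(x_n | mu_k, Sigma_k)
    return np.sum(np.log(weighted.sum(axis=1)))
```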
  3. Solve the maximum likelihood estimation
    \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^N \ln \sum_{k=1}^K \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)
    Set the derivatives to zero:
    \partial \ln p(X \mid \pi, \mu, \Sigma) / \partial \mu_k = 0,   \partial \ln p / \partial \Sigma_k = 0,   \partial \ln p / \partial \pi_k = 0
    \mu_k = \frac{1}{N_k} \sum_{n=1}^N \gamma(z_{nk})\, x_n
    \Sigma_k = \frac{1}{N_k} \sum_{n=1}^N \gamma(z_{nk})\,(x_n - \mu_k)(x_n - \mu_k)^\top
    \pi_k = \frac{N_k}{N},   where N_k = \sum_{n=1}^N \gamma(z_{nk})
    \gamma(z_{nk}) = \frac{\pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}
    \gamma(z_{nk}): the responsibility that component k takes for 'explaining' the observation x_n
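    A minimal sketch (not part of the original deck) of computing the responsibilities \gamma(z_{nk}) defined above; names follow the hypothetical gmm_log_likelihood sketch earlier and are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pis, mus, Sigmas):
    # weighted[n, k] = pi_k * N(x_n | mu_k, Sigma_k)
    weighted = np.stack(
        [pi * multivariate_normal.pdf(X, mean=mu, cov=S)
         for pi, mu, S in zip(pis, mus, Sigmas)],
        axis=1,
    )
    # gamma[n, k] = pi_k N(x_n|mu_k,Sigma_k) / sum_j pi_j N(x_n|mu_j,Sigma_j)
    return weighted / weighted.sum(axis=1, keepdims=True)   # (N, K)
```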
  4. Optimization
    \mu_k = \frac{1}{N_k} \sum_{n=1}^N \gamma(z_{nk})\, x_n
    \Sigma_k = \frac{1}{N_k} \sum_{n=1}^N \gamma(z_{nk})\,(x_n - \mu_k)(x_n - \mu_k)^\top
    \pi_k = \frac{N_k}{N},   where N_k = \sum_{n=1}^N \gamma(z_{nk}) and \gamma(z_{nk}) = \frac{\pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}
    \gamma(z_{nk}), \mu_k, \Sigma_k, and \pi_k depend on each other → no closed-form solution
  5. Optimization (same equations as the previous slide)
    \gamma(z_{nk}), \mu_k, \Sigma_k, and \pi_k depend on each other → no closed-form solution
  6. EM algorithm of GMM
    1. Choose initial values for \mu, \Sigma, \pi
    2. Evaluate \gamma(z_{nk}) by using \mu, \Sigma, \pi = E (expectation) step
    3. Re-estimate \mu, \Sigma, \pi by using \gamma(z_{nk}) = M (maximization) step
       \mu_k = \frac{1}{N_k} \sum_{n=1}^N \gamma(z_{nk})\, x_n
       \Sigma_k = \frac{1}{N_k} \sum_{n=1}^N \gamma(z_{nk})\,(x_n - \mu_k)(x_n - \mu_k)^\top
       \pi_k = \frac{N_k}{N},   where N_k = \sum_{n=1}^N \gamma(z_{nk}),   \gamma(z_{nk}) = \frac{\pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}
       (No closed-form solution, so the two steps are iterated.)
    4. Stop if the log likelihood or the parameters converge
    ※ Each update to the parameters resulting from an E step followed by an M step is guaranteed to increase the log likelihood function. (PRML Sec. 9.4)
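    A minimal sketch (not part of the original deck) of the E/M loop on this slide. It assumes the hypothetical responsibilities and gmm_log_likelihood helpers from the sketches above, and leaves out initialization and numerical safeguards such as covariance regularization.

```python
import numpy as np

def em_gmm(X, pis, mus, Sigmas, n_iter=100, tol=1e-6):
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E step: evaluate gamma(z_nk) with the current parameters
        gamma = responsibilities(X, pis, mus, Sigmas)   # (N, K), sketch above
        Nk = gamma.sum(axis=0)                          # (K,)
        # M step: re-estimate mu_k, Sigma_k, pi_k using the responsibilities
        mus = (gamma.T @ X) / Nk[:, None]
        Sigmas = np.stack([
            ((gamma[:, k, None] * (X - mus[k])).T @ (X - mus[k])) / Nk[k]
            for k in range(len(Nk))])
        pis = Nk / len(X)
        # Stop if the log likelihood converges (gmm_log_likelihood, sketch above)
        ll = gmm_log_likelihood(X, pis, mus, Sigmas)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pis, mus, Sigmas
```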
  7. Redefine GMM: introduce a latent variable
    Given data points…
    p(x) = \sum_{k=1}^K \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)
    \pi_k: mixing coefficient
    \mathcal{N}(x \mid \mu_k, \Sigma_k): normal distribution, the k-th component of the GMM
    • \mu_k: mean of the k-th component
    • \Sigma_k: variance-covariance matrix of the k-th component
  8. Redefine GMM: introduce a latent variable
    p(x) = \sum_{k=1}^K \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)
    Introduce a latent variable z with p(z_k = 1) = \pi_k   (the probability that component k is chosen)
    The latent variable z is…
    • z = (z_1, z_2, \cdots, z_K)
    • z_k \in \{0, 1\}
    • \sum_k z_k = 1   (1-of-K representation: exactly one component is assigned)
    Another representation of the same model:
    p(x) = \sum_z p(z)\, p(x \mid z) = \sum_{k=1}^K p(z_k = 1)\, \mathcal{N}(x \mid \mu_k, \Sigma_k) = \sum_{k=1}^K \pi_k \mathcal{N}(x \mid \mu_k, \Sigma_k)
  9. Redefine GMM: introduce a latent variable
    p(z_k = 1) = \pi_k,   p(x \mid z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k)
    Example: a generative model of a GMM with a latent variable (K = 3)
    1. Probabilistically select z = (0, 1, 0)
    2. Generate a data point from the selected distribution, x \sim \mathcal{N}(\mu_2, \Sigma_2) → component 2
    [Figure: three GMM components, labeled 1, 2, 3]
  10. Redefine GMM: introduce a latent variable
    p(z_k = 1) = \pi_k,   p(x \mid z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k)
    Example: a generative model of a GMM with a latent variable (K = 3)
    1. Probabilistically select z = (1, 0, 0)
    2. Generate a data point from the selected distribution, x \sim \mathcal{N}(\mu_1, \Sigma_1) → component 1
    [Figure: three GMM components, labeled 1, 2, 3]
  11. Redefine GMM: introduce a latent variable
    p(z_k = 1) = \pi_k,   p(x \mid z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k)
    Example: a generative model of a GMM with a latent variable (K = 3)
    1. Probabilistically select z
    2. Generate a data point from the selected distribution
    [Figure: three GMM components, labeled 1, 2, 3]
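    A minimal sketch (not part of the original deck) of the two-step generative process described in slides 9–11; all names are illustrative.

```python
import numpy as np

def sample_gmm(pis, mus, Sigmas, rng=np.random.default_rng()):
    K = len(pis)
    k = rng.choice(K, p=pis)              # 1. probabilistically select a component
    z = np.eye(K)[k]                      # 1-of-K latent vector, e.g. (0, 1, 0)
    x = rng.multivariate_normal(mus[k], Sigmas[k])   # 2. generate a data point
    return z, x
```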
  12. Alternative view of the EM algorithm with latent variables
    Goal of the EM algorithm: to find maximum likelihood solutions for models having latent variables
    Observed data: X = (x_1, x_2, \cdots, x_N)
    All latent variables: Z = (z_1, z_2, \cdots, z_N)
    All model parameters: \theta
    Log likelihood function: \ln p(X \mid \theta) = \ln \left\{ \sum_Z p(X, Z \mid \theta) \right\}
  13. General EM algorithm
    1. Choose an initial setting for the parameters \theta^{old}
    2. E step: Evaluate p(Z \mid X, \theta^{old})
    3. M step: Evaluate \theta^{new} given by \theta^{new} = \arg\max_\theta Q(\theta, \theta^{old}),
       where Q(\theta, \theta^{old}) = \sum_Z p(Z \mid X, \theta^{old}) \ln p(X, Z \mid \theta)
    4. Check for convergence of either the log likelihood or the parameter values. If the convergence criterion is not satisfied, then let \theta^{old} \leftarrow \theta^{new} and return to step 2.
    \ln p(X \mid \theta) = \ln \left\{ \sum_Z p(X, Z \mid \theta) \right\}
    Q(\theta, \theta^{old}) is the expectation of \ln p(X, Z \mid \theta) under the posterior distribution of Z.
    Why the expectation? The complete-data likelihood p(X, Z \mid \theta) can be maximized straightforwardly, but we cannot observe the latent variables Z → maximize its expectation instead.
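    A minimal sketch (not part of the original deck) of the general EM loop above, with the E and M steps passed in as callables; the function names and the convergence callback are illustrative assumptions.

```python
def em(theta, e_step, m_step, n_iter=100, converged=lambda old, new: False):
    for _ in range(n_iter):
        posterior = e_step(theta)         # E step: evaluate p(Z | X, theta_old)
        theta_new = m_step(posterior)     # M step: argmax_theta Q(theta, theta_old)
        if converged(theta, theta_new):   # 4. check convergence
            return theta_new
        theta = theta_new                 # theta_old <- theta_new, return to step 2
    return theta
```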
  14. Revisit GMM
    p(z_k = 1) = \pi_k,   p(x \mid z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k)
    p(z) = \prod_{k=1}^K \pi_k^{z_k},   p(x \mid z) = \prod_{k=1}^K \mathcal{N}(x \mid \mu_k, \Sigma_k)^{z_k}
    z = (z_1, \cdots, z_K)^\top with \sum_k z_k = 1   (1-of-K binary vector)
  15. Solve GMM
    E step: Evaluate p(Z \mid X, \theta^{old})
    M step: Evaluate \theta^{new} given by \theta^{new} = \arg\max_\theta Q(\theta, \theta^{old}),
       where Q(\theta, \theta^{old}) = \sum_Z p(Z \mid X, \theta^{old}) \ln p(X, Z \mid \theta)
    GMM: p(z) = \prod_{k=1}^K \pi_k^{z_k},   p(x \mid z) = \prod_{k=1}^K \mathcal{N}(x \mid \mu_k, \Sigma_k)^{z_k}
    Complete-data likelihood function:
    p(X, Z \mid \mu, \Sigma, \pi) = \prod_{n=1}^N \prod_{k=1}^K \left[ \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right]^{z_{nk}}
    log:
    \ln p(X, Z \mid \mu, \Sigma, \pi) = \sum_{n=1}^N \sum_{k=1}^K z_{nk} \left\{ \ln \pi_k + \ln \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}
    Maximizing:
    • with respect to \mu_k, \Sigma_k → same as for a single Gaussian
    • with respect to \pi_k → \pi_k = \frac{1}{N} \sum_{n=1}^N z_{nk}
    A closed-form solution can be obtained.
  16. Solve GMM
    Bayes' theorem: p(Z \mid X) \propto p(X, Z)
    p(X, Z \mid \mu, \Sigma, \pi) \propto \prod_{n=1}^N \prod_{k=1}^K \left[ \pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right]^{z_{nk}}
    GMM: p(z) = \prod_k \pi_k^{z_k},   p(x \mid z) = \prod_k \mathcal{N}(x \mid \mu_k, \Sigma_k)^{z_k}
    E step: Evaluate p(Z \mid X, \theta^{old})
    M step: Evaluate \theta^{new} = \arg\max_\theta Q(\theta, \theta^{old}),   Q(\theta, \theta^{old}) = \sum_Z p(Z \mid X, \theta^{old}) \ln p(X, Z \mid \theta)
    E[z_{nk}] = \frac{\sum_{z_n} z_{nk} \prod_{k'} \left[ \pi_{k'} \mathcal{N}(x_n \mid \mu_{k'}, \Sigma_{k'}) \right]^{z_{nk'}}}{\sum_{z_n} \prod_{j} \left[ \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j) \right]^{z_{nj}}} = \cdots = \frac{\pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} = \gamma(z_{nk})
    E_Z[\ln p(X, Z \mid \mu, \Sigma, \pi)] = \sum_{n=1}^N \sum_{k=1}^K \gamma(z_{nk}) \left\{ \ln \pi_k + \ln \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}
    cf) \gamma(z_{nk}) = \frac{\pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} is the responsibility
  17. Solve GMM
    Now we have…
    E_Z[\ln p(X, Z \mid \mu, \Sigma, \pi)] = \sum_{n=1}^N \sum_{k=1}^K \gamma(z_{nk}) \left\{ \ln \pi_k + \ln \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}
    \gamma(z_{nk}) = \frac{\pi_k \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}
    General EM — E step: Evaluate p(Z \mid X, \theta^{old}); M step: Evaluate \theta^{new} = \arg\max_\theta Q(\theta, \theta^{old}),   Q(\theta, \theta^{old}) = \sum_Z p(Z \mid X, \theta^{old}) \ln p(X, Z \mid \theta)
    For the GMM:
    E step: Evaluate the responsibilities \gamma(z_{nk}) with the current \mu, \Sigma, \pi
    M step: Keep the responsibilities fixed and maximize E_Z[\ln p(X, Z \mid \mu, \Sigma, \pi)] with respect to \mu, \Sigma, \pi (set the derivatives to 0)
  18. Analogy: k-means
    EM algorithm of GMM:
    \mu_k = \frac{1}{N_k} \sum_{n=1}^N \gamma(z_{nk})\, x_n,   \Sigma_k = \frac{1}{N_k} \sum_{n=1}^N \gamma(z_{nk})\,(x_n - \mu_k)(x_n - \mu_k)^\top,   \pi_k = \frac{N_k}{N}
    k-means: as \Sigma_k = \epsilon I with \epsilon → 0, the responsibility \gamma(z_{nk}) → r_{nk}
    r_{nk}: assignment variable, r_{nk} = 1 if k = \arg\min_j \lVert x_n - \mu_j \rVert^2, 0 otherwise   (⇔ data point x_n is assigned to centroid \mu_k)
    k-means E step: re-assign data points to centroids to minimize the distances between centroids and assigned data points (cf. GMM E step: evaluate \gamma(z_{nk}))
    k-means M step: find the centroids \mu_k that minimize J = \sum_{n=1}^N \sum_{k=1}^K r_{nk} \lVert x_n - \mu_k \rVert^2 (cf. GMM M step: maximize E_Z[\ln p(X, Z \mid \mu, \Sigma, \pi)] with respect to \mu, \Sigma, \pi)
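    A minimal sketch (not part of the original deck) of one k-means iteration as the hard-assignment limit described above; all names are illustrative and empty clusters are not handled.

```python
import numpy as np

def kmeans_step(X, mus):
    # E step: assign each point to its nearest centroid (hard "responsibility" r_nk)
    d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)   # (N, K) squared distances
    r = np.eye(len(mus))[d2.argmin(axis=1)]                     # r_nk in {0, 1}
    # M step: recompute centroids to minimize J = sum_n sum_k r_nk ||x_n - mu_k||^2
    mus = (r.T @ X) / r.sum(axis=0)[:, None]
    return mus, r
```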
  19. Other applications of the EM algorithm: mixtures of Bernoulli distributions
    Binary variables x_i (i = 1, …, D):
    p(x \mid \mu) = \prod_{i=1}^D \mu_i^{x_i} (1 - \mu_i)^{1 - x_i}   (Bernoulli distribution with parameter \mu_i for each variable)
    x = (x_1, x_2, …, x_D)^\top,   \mu = (\mu_1, \mu_2, …, \mu_D)^\top
    Let's consider a finite mixture of these distributions:
    p(x \mid \mu, \pi) = \sum_{k=1}^K \pi_k p(x \mid \mu_k),   where p(x \mid \mu_k) = \prod_{i=1}^D \mu_{ki}^{x_i} (1 - \mu_{ki})^{1 - x_i}
    \mu = (\mu_1, \mu_2, …, \mu_K),   \pi = (\pi_1, \pi_2, …, \pi_K)
  20. Mixtures of Bernoulli distributions
    p(x \mid \mu, \pi) = \sum_{k=1}^K \pi_k p(x \mid \mu_k),   where p(x \mid \mu_k) = \prod_{i=1}^D \mu_{ki}^{x_i} (1 - \mu_{ki})^{1 - x_i}
    \mu = (\mu_1, …, \mu_K),   \pi = (\pi_1, …, \pi_K)
    Let's determine the parameters when data points X = (x_1, x_2, …, x_N) are given.
    Maximize the log likelihood function:
    \ln p(X \mid \mu, \pi) = \sum_{n=1}^N \ln \left\{ \sum_{k=1}^K \pi_k p(x_n \mid \mu_k) \right\}
    No closed-form solution… → EM algorithm!
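    A minimal sketch (not part of the original deck) of evaluating this log likelihood for a Bernoulli mixture; names are illustrative and a small eps guards against log(0).

```python
import numpy as np

def bernoulli_mixture_log_likelihood(X, pis, mus, eps=1e-10):
    # X: binary (N, D) data, pis: (K,), mus: (K, D)
    mus = np.clip(mus, eps, 1 - eps)
    # log p(x_n | mu_k) summed over the D binary variables, for every n and k
    log_px = X @ np.log(mus).T + (1 - X) @ np.log(1 - mus).T    # (N, K)
    # ln p(X | mu, pi) = sum_n ln sum_k pi_k p(x_n | mu_k)
    return np.sum(np.log((pis * np.exp(log_px)).sum(axis=1)))
```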
  21. Introduce latent variables
    Introduce a latent variable z. The latent variable z is…
    • z = (z_1, z_2, \cdots, z_K)
    • z_k \in \{0, 1\}
    • \sum_k z_k = 1
    p(x \mid z, \mu) = \prod_{k=1}^K p(x \mid \mu_k)^{z_k},   p(z \mid \pi) = \prod_{k=1}^K \pi_k^{z_k}
  22. EM algorithm for mixtures of Bernoulli distributions
    E step: Evaluate p(Z \mid X, \theta^{old})
    M step: Evaluate \theta^{new} = \arg\max_\theta Q(\theta, \theta^{old}),   Q(\theta, \theta^{old}) = \sum_Z p(Z \mid X, \theta^{old}) \ln p(X, Z \mid \theta)
    Complete-data log likelihood:
    \ln p(X, Z \mid \mu, \pi) = \sum_{n=1}^N \sum_{k=1}^K z_{nk} \left\{ \ln \pi_k + \sum_{i=1}^D \left[ x_{ni} \ln \mu_{ki} + (1 - x_{ni}) \ln(1 - \mu_{ki}) \right] \right\}
    Expectation of \ln p(X, Z \mid \mu, \pi) under the posterior distribution of Z:
    E_Z[\ln p(X, Z \mid \mu, \pi)] = \sum_{n=1}^N \sum_{k=1}^K \gamma(z_{nk}) \left\{ \ln \pi_k + \sum_{i=1}^D \left[ x_{ni} \ln \mu_{ki} + (1 - x_{ni}) \ln(1 - \mu_{ki}) \right] \right\}
    where \gamma(z_{nk}) = E[z_{nk}] = \frac{\pi_k p(x_n \mid \mu_k)}{\sum_{j=1}^K \pi_j p(x_n \mid \mu_j)}
  23. EM algorithm for mixtures of Bernoulli distributions
    Now we have…
    E_Z[\ln p(X, Z \mid \mu, \pi)] = \sum_{n=1}^N \sum_{k=1}^K \gamma(z_{nk}) \left\{ \ln \pi_k + \sum_{i=1}^D \left[ x_{ni} \ln \mu_{ki} + (1 - x_{ni}) \ln(1 - \mu_{ki}) \right] \right\}
    \gamma(z_{nk}) = E[z_{nk}] = \frac{\pi_k p(x_n \mid \mu_k)}{\sum_{j=1}^K \pi_j p(x_n \mid \mu_j)}
    E step: Evaluate the responsibilities \gamma(z_{nk}) with the current \mu, \pi
    M step: Keep the responsibilities fixed and maximize E_Z[\ln p(X, Z \mid \mu, \pi)] with respect to \mu, \pi (set the derivatives to 0)
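    A minimal sketch (not part of the original deck) of one E step plus M step for the Bernoulli mixture above; all names are illustrative and clipping guards against log(0).

```python
import numpy as np

def em_step_bernoulli(X, pis, mus, eps=1e-10):
    # X: binary (N, D) data, pis: (K,), mus: (K, D)
    mus = np.clip(mus, eps, 1 - eps)
    # E step: log p(x_n | mu_k) = sum_i [x_ni ln mu_ki + (1 - x_ni) ln(1 - mu_ki)]
    log_px = X @ np.log(mus).T + (1 - X) @ np.log(1 - mus).T    # (N, K)
    weighted = pis * np.exp(log_px)                             # pi_k p(x_n | mu_k)
    gamma = weighted / weighted.sum(axis=1, keepdims=True)      # responsibilities
    # M step: maximize E_Z[ln p(X, Z | mu, pi)] with the responsibilities fixed
    Nk = gamma.sum(axis=0)
    mus = (gamma.T @ X) / Nk[:, None]
    pis = Nk / len(X)
    return pis, mus, gamma
```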
  24. Conclusion
    The goal of the EM algorithm is to find maximum likelihood solutions for models having latent variables.
    For example:
    • Gaussian Mixture Models
    • Mixtures of Bernoulli distributions
    If you want to learn more, see PRML Chapter 9.