Slide 1

Slide 1 text

Online estimation of ECD and its Mixture
Component-wise Information Gradient method
ZHOU Jialun (IMS laboratory, University of Bordeaux)
May 19, 2021

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Estimation problem
$\hat{\theta} = \arg\min_{\theta} D(\theta)$ (1)
$\hat{\theta}_{k+1} = F(\hat{\theta}_k, \nabla_\theta D(\theta))$ (2)
Iterative algorithm
1. Offline version, e.g. Expectation Maximisation [dempster1977maximum].
2. Online version, e.g. Stochastic Gradient Descent [saad1998online].
Difficulties
Offline methods → memory and time
Classic stochastic approximation → non-convexity, instability and step size
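A minimal sketch of the online iteration (2), assuming a toy cost that is not taken from the slides: q(θ; x) = ½(x − θ)² with step size 1/(k+1). Under these assumptions the stochastic gradient recursion reproduces the running sample mean while storing only the current estimate. The name `online_mean` is hypothetical.

```python
import numpy as np

def online_mean(samples):
    """Stochastic gradient recursion for q(theta; x) = 0.5 * (x - theta)**2."""
    theta = 0.0
    for k, x in enumerate(samples):
        grad = theta - x          # gradient of q at the current estimate
        step = 1.0 / (k + 1)      # decreasing step size (assumed for this toy case)
        theta = theta - step * grad
    return theta

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=1000)
estimate = online_mean(data)      # one pass over the data, constant memory
```

Each sample is used once and then discarded, which is the memory and time advantage the slide attributes to online methods.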

Slide 4

Slide 4 text

Optimization on Riemannian Manifolds
Absil. "Optimization algorithms on matrix manifolds". 2009.
Bonnabel Silvere. "Stochastic gradient descent on Riemannian manifolds". 2013.
Amari Shun-Ichi. "Natural gradient works efficiently in learning". 1998.
Component-wise Information Gradient
Save memory and time
Easy to select step size
Improve performance

Slide 5

Slide 5 text

Contents
1. ECD
   Introduction and problematic
   Riemannian Geometry
      Optimization on Riemannian manifold
      Component-wise Information Metric and gradient
      Retraction map
   Algorithms
      Global convergence
      CIG for ECD
2. Mixture of ECD
ZHOU Jialun Online estimation of ECD and its Mixture 5 / 39

Slide 6

Slide 6 text

ECD
Density of ECD
$p(x|\theta) = c(\beta)\, |\Sigma|^{-1/2}\, g_\beta\!\left((x - \mu)^\dagger \Sigma^{-1} (x - \mu)\right)$ (3)
where $g_\beta$ is the generator function.

Slide 7

Slide 7 text

Parametric space
$\mu \in \mathbb{R}^m$, $\Sigma \in \mathcal{P}_m$, $\beta \in \mathbb{R}_+$.
Cost function
$D(\theta) = \mathbb{E}[q(\theta; x)]$ where $q = -\log p(\theta; x)$ (4)

Slide 8

Slide 8 text

General update formula on Riemannian manifold
$\theta_{n+1} = \mathrm{Ret}_{\theta_n}\!\left(\gamma_{n+1}\, u(\theta_n, x_{n+1})\right), \quad n = 0, 1, \cdots$ (5)

Slide 9

Slide 9 text

Component-wise Information metric
Difficulty: the information metric does not have a closed form [verdoolaege2012geometry].
Solution: Component-wise Information Metric

Slide 10

Slide 10 text

Information metric for Σ
The analytical expression of $\langle \cdot, \cdot \rangle_\Sigma$ [berkane1997geodesic]:
$\langle U_\Sigma, V_\Sigma \rangle_\Sigma = I_{\Sigma,1}\, \mathrm{tr}\!\left(\Sigma^{-1} U_\Sigma \Sigma^{-1} V_\Sigma\right) + I_{\Sigma,2}\, \mathrm{tr}\!\left(\Sigma^{-1} U_\Sigma\right) \mathrm{tr}\!\left(\Sigma^{-1} V_\Sigma\right)$ (6)

Slide 11

Slide 11 text

Information metric for µ and β
Information metric for µ
The analytical expression of $\langle \cdot, \cdot \rangle_\mu$ [zhou2020riemannian]:
$\langle U_\mu, V_\mu \rangle_\mu = I_\mu\, U_\mu^\dagger \Sigma^{-1} V_\mu$ (7)
Information metric for β
$\langle U_\beta, V_\beta \rangle_\beta = I_\beta\, U_\beta V_\beta$ (8)

Slide 12

Slide 12 text

Component-wise Information Gradient
The component-wise information gradient is the solution of the following equation:
$\langle \nabla_\theta q(\theta; x), v \rangle_\theta = dq(\theta; x)[v]$ (9)
Its abstract expression is given as
$\nabla_\theta q(\theta; x) = \begin{pmatrix} \nabla_\mu q(\theta; x) \\ \nabla_\Sigma q(\theta; x) \\ \nabla_\beta q(\theta; x) \end{pmatrix}$ (10)
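As an illustration of equation (9) for the Σ component, the sketch below solves ⟨∇, V⟩_Σ = tr(G V) for every symmetric V, where G is the Euclidean derivative of q and the inner product is the metric (6). The closed form used here, ∇ = (ΣGΣ − I₂ t Σ)/I₁ with t = tr(GΣ)/(I₁ + m I₂), is a derivation made for this example rather than a formula quoted from the slides, and the coefficient values are arbitrary placeholders. The same reasoning applied to (7) and (8) gives ∇_µ q = Σ ∂q/∂µ / I_µ and ∇_β q = (∂q/∂β) / I_β.

```python
import numpy as np

def metric_sigma(sigma, u, v, i1, i2):
    """Inner product (6) between symmetric tangent matrices u and v at sigma."""
    s_inv = np.linalg.inv(sigma)
    return (i1 * np.trace(s_inv @ u @ s_inv @ v)
            + i2 * np.trace(s_inv @ u) * np.trace(s_inv @ v))

def information_gradient_sigma(sigma, euclid_grad, i1, i2):
    """Solve <grad, V> = tr(euclid_grad @ V) for all symmetric V under (6).

    Requires i1 != 0 and i1 + m * i2 != 0.
    """
    m = sigma.shape[0]
    t = np.trace(euclid_grad @ sigma) / (i1 + m * i2)
    return (sigma @ euclid_grad @ sigma - i2 * t * sigma) / i1

rng = np.random.default_rng(0)
a = rng.normal(size=(3, 3))
sigma = a @ a.T + 3.0 * np.eye(3)        # a symmetric positive definite point
b = rng.normal(size=(3, 3))
euclid_grad = 0.5 * (b + b.T)            # stand-in for the Euclidean derivative of q
c = rng.normal(size=(3, 3))
v = 0.5 * (c + c.T)                      # arbitrary symmetric test direction
i1, i2 = 0.7, -0.1                       # placeholder coefficients, not from the slides

grad = information_gradient_sigma(sigma, euclid_grad, i1, i2)
lhs = metric_sigma(sigma, grad, v, i1, i2)   # left-hand side of (9)
rhs = np.trace(euclid_grad @ v)              # dq[v] for a symmetric derivative
```

The two sides `lhs` and `rhs` agree up to rounding, which is the defining property (9).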

Slide 13

Slide 13 text

Retraction on parametric space
Retraction map
$\mathrm{Ret}_{\theta^{(n)}}\!\left(\eta^{(n)} \nabla_\theta q(\theta^{(n)}; X)\right) = \begin{pmatrix} \mathrm{Exp}_{\mu^{(n)}}\!\left(\eta^{(n)} \nabla_\mu q(\theta^{(n)}; X)\right) \\ \mathrm{Exp}_{\Sigma^{(n)}}\!\left(\eta^{(n)} \nabla_\Sigma q(\theta^{(n)}; X)\right) \\ \mathrm{Exp}_{\beta^{(n)}}\!\left(\eta^{(n)} \nabla_\beta q(\theta^{(n)}; X)\right) \end{pmatrix}$ (11)
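The component-wise structure of (11) can be sketched as follows. The explicit exponential maps are assumptions made for this example, since the slides do not spell them out: a straight line for µ, the affine-invariant form Σ^{1/2} exp(Σ^{-1/2} V Σ^{-1/2}) Σ^{1/2} for Σ, and β exp(v/β) on the positive reals, chosen to be consistent with the logarithmic distances (23b) and (23c). All function names are hypothetical.

```python
import numpy as np

def sym_fun(mat, fun):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fun(vals)) @ vecs.T

def exp_mu(mu, v):
    return mu + v                                  # straight line in R^m

def exp_sigma(sigma, v):
    root = sym_fun(sigma, np.sqrt)
    inv_root = sym_fun(sigma, lambda w: 1.0 / np.sqrt(w))
    inner = inv_root @ v @ inv_root
    inner = 0.5 * (inner + inner.T)                # remove rounding asymmetry
    return root @ sym_fun(inner, np.exp) @ root    # stays positive definite

def exp_beta(beta, v):
    return beta * np.exp(v / beta)                 # stays positive

def retraction(theta, tangent, step):
    """Apply one exponential map per component, as in (11)."""
    mu, sigma, beta = theta
    v_mu, v_sigma, v_beta = tangent
    return (exp_mu(mu, step * v_mu),
            exp_sigma(sigma, step * v_sigma),
            exp_beta(beta, step * v_beta))

theta = (np.zeros(2), np.array([[2.0, 0.5], [0.5, 1.0]]), 1.5)
tangent = (np.array([1.0, -1.0]), np.array([[-5.0, 1.0], [1.0, -3.0]]), -4.0)
moved = retraction(theta, tangent, 1.0)
same = retraction(theta, tangent, 0.0)             # zero step returns theta
```

Even with a large negative tangent, the Σ and β components remain inside their constraint sets, which is the practical reason for using a retraction instead of a plain additive update.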

Slide 14

Slide 14 text

Global convergence
Geodesic α-strong convexity
A function $D : \Theta \to \mathbb{R}$ is said to be geodesically α-strongly convex if for any $\theta_1, \theta_2 \in \Theta$,
$D(\theta_2) \geq D(\theta_1) + \left\langle \mathrm{grad}_{\theta_1} D(\theta_1),\ \mathrm{Exp}^{-1}_{\theta_1}(\theta_2) \right\rangle_{\theta_1} + \frac{\alpha}{2}\, d^2(\theta_1, \theta_2)$ (12)
In the following cases, global convergence is guaranteed, i.e. for every $\theta_0 \in \Theta$, we always have $\lim \theta_n = \theta^*$.
Student-T [laus2019multivariate]
θ = (Σ), for β∗ > −m
θ = (µ, Σ), for β∗ > 0
MGGD [zhang2013multivariate]
θ = (Σ), for β∗ > 0
θ = (µ, Σ), for β∗ > 0

Slide 15

Slide 15 text

CIG with deterministic step-size
Reformulated objective function
$\hat{D}(\theta) = \frac{1}{T} \sum_{t=1}^{T} q(\theta; x^{(t)})$ (13)
CIG with deterministic step-size [zhou2020riemannian]
1. Update µ: calculate $\nabla_\mu \hat{D}(\theta)$ according to $(\mu^{(n)}, \Sigma^{(n)}, \beta^{(n)})$; select $\eta_\mu^{(n)}$ according to the Armijo-Goldstein criteria and update $\mu^{(n+1)}$;
2. Update Σ: calculate $\nabla_\Sigma \hat{D}(\theta)$ according to $(\mu^{(n+1)}, \Sigma^{(n)}, \beta^{(n)})$; select $\eta_\Sigma^{(n)}$ according to the Armijo-Goldstein criteria and update $\Sigma^{(n+1)}$;
3. Update β: calculate $\nabla_\beta \hat{D}(\theta)$ according to $(\mu^{(n+1)}, \Sigma^{(n+1)}, \beta^{(n)})$; select $\eta_\beta^{(n)}$ according to the Armijo-Goldstein criteria and update $\beta^{(n+1)}$;
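A hedged sketch of one such update with a backtracking step size, restricted to a toy case chosen for this example: a zero-mean one-dimensional Gaussian whose variance s plays the role of Σ. In that case the empirical cost (13) is ½ log s + m₂/(2s), with m₂ the mean of the squared samples, and the information metric is uv/(2s²). Only the Armijo sufficient-decrease test is implemented; the constants `c` and `shrink` and all function names are assumptions.

```python
import numpy as np

def cost(s, m2):
    """Empirical cost (13) for a zero-mean 1-D Gaussian with variance s."""
    return 0.5 * np.log(s) + 0.5 * m2 / s

def information_gradient(s, m2):
    """Euclidean derivative rescaled by the inverse metric 2 * s**2."""
    euclid = 0.5 / s - 0.5 * m2 / s ** 2
    return 2.0 * s ** 2 * euclid              # simplifies to s - m2

def armijo_update(s, m2, c=1e-4, shrink=0.5, max_halvings=50):
    grad = information_gradient(s, m2)
    sq_norm = 0.5 * grad ** 2 / s ** 2        # squared norm of grad in the metric
    eta = 1.0
    for _ in range(max_halvings):
        candidate = s * np.exp(-eta * grad / s)   # exponential map on (0, inf)
        if cost(candidate, m2) <= cost(s, m2) - c * eta * sq_norm:
            return candidate                  # sufficient decrease reached
        eta *= shrink                         # otherwise shrink the step
    return s                                  # no acceptable step found

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=500)
m2 = np.mean(x ** 2)                          # minimiser of the empirical cost
s = 10.0                                      # deliberately poor starting point
for _ in range(30):
    s = armijo_update(s, m2)
```

In this toy case the step η = 1 behaves like a Newton step near the optimum, which gives some intuition for the slide's claim that the step size is easy to select.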

Slide 16

Slide 16 text

CIG with deterministic step size
Figure 1: The convergence of CIG with deterministic step-size for ECD. Panels: (a) the case θ = (Σ) with (µ∗, β∗); (b) the case θ = (µ, Σ) with β∗.
The step size is selected according to the Armijo-Goldstein criteria. The error is measured by the component-wise information distance (22).

Slide 17

Slide 17 text

CIG with decreasing step size [zhou2020riemannian]
1. $\eta^{(n+1)} \leftarrow \frac{a}{n+1}$ with $a > 0$;
2. Update µ: calculate $\nabla_\mu q(\theta; X)$ according to $(\mu^{(n)}, \Sigma^{(n)}, \beta^{(n)})$; update $\mu^{(n+1)}$ with $\eta^{(n+1)}$;
3. Update Σ: calculate $\nabla_\Sigma q(\theta; X)$ according to $(\mu^{(n+1)}, \Sigma^{(n)}, \beta^{(n)})$; update $\Sigma^{(n+1)}$ with $\eta^{(n+1)}$;
4. Update β: calculate $\nabla_\beta q(\theta; X)$ according to $(\mu^{(n+1)}, \Sigma^{(n+1)}, \beta^{(n)})$; update $\beta^{(n+1)}$ with $\eta^{(n+1)}$;
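The recursion above can be sketched for the simplest case θ = (Σ), under assumptions made for this example only: zero-mean Gaussian data (the MGGD with β = 1), for which the information metric coefficients are I₁ = ½ and I₂ = 0 and the information gradient of q reduces to Σ − xxᵀ. The update moves along the negative gradient through the affine-invariant exponential map, with η = a/(n+1) and a = 1. The starting point and sample size are arbitrary choices.

```python
import numpy as np

def sym_fun(mat, fun):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fun(vals)) @ vecs.T

def cig_online_sigma(samples, sigma0, a=1.0):
    """One pass over the samples, one covariance update per sample."""
    sigma = sigma0.copy()
    m = sigma.shape[0]
    for n, x in enumerate(samples):
        eta = a / (n + 1)                       # decreasing step size
        root = sym_fun(sigma, np.sqrt)
        inv_root = sym_fun(sigma, lambda w: 1.0 / np.sqrt(w))
        y = inv_root @ x                        # whitened sample
        # Whitened descent direction: Sigma^{-1/2} (x x^T - Sigma) Sigma^{-1/2}
        inner = np.outer(y, y) - np.eye(m)
        sigma = root @ sym_fun(eta * inner, np.exp) @ root
        sigma = 0.5 * (sigma + sigma.T)         # remove rounding asymmetry
    return sigma

rng = np.random.default_rng(0)
true_sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
samples = rng.multivariate_normal(np.zeros(2), true_sigma, size=5000)
estimate = cig_online_sigma(samples, sigma0=10.0 * np.eye(2))
```

Every iterate stays symmetric positive definite by construction, and each sample is processed once, so memory use does not grow with the number of observations.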

Slide 18

Slide 18 text

Theoretical results
For the general case θ = (µ, Σ, β):
1. Convergence.
2. Mean-square rate: $\forall a > \frac{1}{2\lambda}$, $\mathbb{E}\, d^2(\theta_n, \theta^*) = O(n^{-1})$.
3. Asymptotic normality.
For θ = (Σ) and θ = (µ, Σ), CIM = IM:
1. $\forall a > \frac{1}{2}$, the mean-square rate holds.
2. If $a = 1$, the estimates $\theta^{(n)}$ are asymptotically efficient.

Slide 19

Slide 19 text

Convergence rate for decreasing step size
Figure 2: Convergence rate, with $a > \frac{1}{2\lambda}$. Panels: (a) the case θ = (Σ) with (µ∗, β∗); (b) the case θ = (µ, Σ) with β∗; (c) the case θ = (µ, Σ, β).
Recall the mean square rate: $\exists K > 0$, $\exists n_0 > 0$ such that, for all $n \geq n_0$,
$d^2(\theta^{(n)}, \theta^*) \leq \frac{K}{n}$ (14)

Slide 20

Slide 20 text

Asymptotic normality
Figure 3: Asymptotic normality. Panels: (a) the case θ = (Σ) with (µ∗, β∗); (b) the case θ = (µ, Σ) with β∗.
The formal statement is given in (25).

Slide 21

Slide 21 text

Efficiency
Figure 4: Efficiency. Panels: (a) the case θ = (Σ) with (µ∗, β∗); (b) the case θ = (µ, Σ) with β∗; (c) the case θ = (µ, Σ, β); (d) summary.

Slide 22

Slide 22 text

Time consumption
Figure 5: Time consumption. Panels: (a) the case θ = (Σ) with (µ∗, β∗); (b) the case θ = (µ, Σ) with β∗; (c) the case θ = (µ, Σ, β); (d) summary.

Slide 23

Slide 23 text

Color transformation
Figure 6: Full HD image with 5D transformation. Panels: Input, Target, MM, FP, CIG Offline, CIG Online.
Reference: Hristova, Hristina, "Transformation of the multivariate generalized Gaussian distribution for image editing". 2017.

Slide 24

Slide 24 text

Color transformation
Figure 7: Details of transformation. Panels: MM, FP, CIG Offline, CIG Online.

Slide 25

Slide 25 text

Mixture of ECD
Density function
$f(x; \theta) = \sum_{k=1}^{K} w_k\, p(x; \mu_k, \Sigma_k, \beta_k)$ with $\sum_{k=1}^{K} w_k = 1$ (15)
Parameter and parametric space
$\theta = \left( [w_1, \cdots, w_K],\ \begin{pmatrix} \mu_1^{(1)} & \cdots & \mu_K^{(1)} \\ \vdots & \ddots & \vdots \\ \mu_1^{(m)} & \cdots & \mu_K^{(m)} \end{pmatrix},\ \begin{pmatrix} \Sigma_1^{(1)} & \cdots & \Sigma_K^{(1)} \\ \vdots & \ddots & \vdots \\ \Sigma_1^{(d)} & \cdots & \Sigma_K^{(d)} \end{pmatrix},\ [\beta_1, \cdots, \beta_K] \right)$ (16)
$\Theta = (0, 1)^K \times \mathbb{R}^{m \times K} \times \mathcal{P}_m^K \times \mathbb{R}_+^K$ (17)
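A small sketch of evaluating the mixture density (15) on the log scale. Since the slides do not give the normalising constant c(β), the components here are Gaussian (the MGGD special case β = 1), which is an assumption of this example; the log-sum-exp arrangement and the function names are likewise illustrative.

```python
import numpy as np

def gaussian_logpdf(x, mu, sigma):
    """Log-density of a Gaussian component (MGGD with beta = 1)."""
    m = mu.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(sigma, diff)       # (x-mu)^T Sigma^{-1} (x-mu)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (m * np.log(2.0 * np.pi) + logdet + quad)

def mixture_logpdf(x, weights, mus, sigmas):
    """log f(x; theta) for the mixture (15), computed with log-sum-exp."""
    logs = np.array([np.log(w) + gaussian_logpdf(x, mu, s)
                     for w, mu, s in zip(weights, mus, sigmas)])
    peak = logs.max()
    return peak + np.log(np.exp(logs - peak).sum())  # avoids underflow

x = np.array([1.0, 2.0])
mu = np.zeros(2)
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
single = gaussian_logpdf(x, mu, sigma)
# Splitting one component into two identical ones must not change the density.
mix = mixture_logpdf(x, [0.3, 0.7], [mu, mu], [sigma, sigma])
```

The negative of `mixture_logpdf` is the per-sample cost q(θ; x) that the mixture algorithms minimise on average.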

Slide 26

Slide 26 text

Unit sphere
Geometry structures on unit sphere
Cost function
$D(\theta) = \mathbb{E}\, q(\theta; x) = -\mathbb{E} \log f(\theta; x)$ (18)
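One common way to use the unit sphere for mixture weights is the square-root parametrisation w_k = r_k² with r on the sphere, so that the constraint Σ w_k = 1 holds automatically. The slide does not state the mapping explicitly, so this parametrisation, the helper names and the direction `g` are assumptions of the sketch below, which moves r along a geodesic with the standard exponential map of the sphere.

```python
import numpy as np

def project_tangent(r, g):
    """Project an ambient vector g onto the tangent space of the sphere at r."""
    return g - (r @ g) * r

def sphere_exp(r, v):
    """Exponential map of the unit sphere at r along a tangent vector v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return r                                   # zero step, stay at r
    return np.cos(norm_v) * r + np.sin(norm_v) * v / norm_v

weights = np.array([0.2, 0.3, 0.5])
r = np.sqrt(weights)                               # assumed mapping w_k = r_k**2
g = np.array([1.0, -2.0, 0.5])                     # arbitrary ambient direction
v = project_tangent(r, g)
r_new = sphere_exp(r, 0.1 * v)
new_weights = r_new ** 2                           # still sums to one
```

No projection or renormalisation of the weights is needed after the step, because the geodesic never leaves the sphere.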

Slide 27

Slide 27 text

CIM for MECD
$\langle u_\theta, v_\theta \rangle_\theta = \langle u_r, v_r \rangle + \sum_{k=1}^{K} \left( \langle u_{\mu_k}, v_{\mu_k} \rangle_{\mu_k} + \langle u_{\Sigma_k}, v_{\Sigma_k} \rangle_{\Sigma_k} + \langle u_{\beta_k}, v_{\beta_k} \rangle_{\beta_k} \right)$ (19)

Slide 28

Slide 28 text

CIG with decreasing step-size
1. $\eta^{(n+1)} \leftarrow \frac{a}{n+1}$;
2. Update weights: update $r^{(n+1)}$ with $\eta^{(n+1)}$;
3. Update $(\mu_k, \Sigma_k, \beta_k)_k$:
   Update $\mu_k^{(n+1)}$ according to $(r^{(n+1)}, \mu_k^{(n)}, \Sigma_k^{(n)}, \beta_k^{(n)})$
   Update $\Sigma_k^{(n+1)}$ according to $(r^{(n+1)}, \mu_k^{(n+1)}, \Sigma_k^{(n)}, \beta_k^{(n)})$
   Update $\beta_k^{(n+1)}$ according to $(r^{(n+1)}, \mu_k^{(n+1)}, \Sigma_k^{(n+1)}, \beta_k^{(n)})$

Slide 29

Slide 29 text

Variant step size
Convergence rate
For all $a > \frac{1}{2\lambda}$, the mean square rate holds.
Online backtracking step size
$\forall n$, $\eta^{(n)}$ is selected according to a mini-batch.
Adaptive step size
$\forall n$, $\eta^{(n)} = \frac{\tau^{(n)}_{\min}\, \rho}{L\, \tau^{(n)}_{\max}}$

Slide 30

Slide 30 text

Simulation MECD
Percentage of 'correct' estimates
Method    Correct estimates
EM-FP     83%
SG        77%
CIG-DS    90%
CIG-OB    89%
CIG-AS    90%

Slide 31

Slide 31 text

Simulation MECD
Figure 8: Convergence rate. Panels: (a) mixture of MGGD; (b) mixture of Student-T.
Note: DS means Decreasing Stepsize and AS means Adaptive Stepsize.

Slide 32

Slide 32 text

Comparison of efficiency
Figure 9: Log-likelihood vs iterations. Panels: (a) mixture of MGGD; (b) mixture of Student-T.

Slide 33

Slide 33 text

Comparison of efficiency
Figure 10: Log-likelihood vs time consumption. Panels: (a) mixture of MGGD; (b) mixture of Student-T.

Slide 34

Slide 34 text

Texture segmentation
Figure 11: Visual results. Panels: (a) Original Texture; (b) EM-FP; (c) SG; (d) CIG-DS; (e) CIG-OB; (f) CIG-AS.

Slide 35

Slide 35 text

Texture segmentation
Accuracy of segmentation
Method    Correct estimates
EM-FP     93.011%
SG        63.328%
CIG-DS    96.072%
CIG-OB    85.338%
CIG-AS    92.121%

Slide 36

Slide 36 text

Thank you for your attention.

Slide 37

Slide 37 text

Appendix
Generator function
The generator function in (3) is specifically given as
$g\!\left((x - \mu)^\dagger \Sigma^{-1} (x - \mu), \beta\right) = \exp\!\left(-\frac{1}{2} \left((x - \mu)^\dagger \Sigma^{-1} (x - \mu)\right)^{\beta}\right)$ for MGGD (20)
$g\!\left((x - \mu)^\dagger \Sigma^{-1} (x - \mu), \beta\right) = \left(1 + \frac{(x - \mu)^\dagger \Sigma^{-1} (x - \mu)}{\beta}\right)^{-\frac{\beta + m}{2}}$ for Student-T (21)
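The two generator functions (20) and (21) translate directly into code. The sketch below evaluates them on the squared Mahalanobis distance t = (x − µ)†Σ⁻¹(x − µ); the function names and the sample values are illustrative. Two sanity checks follow from the formulas themselves: the MGGD generator with β = 1 is the Gaussian kernel exp(−t/2), and the Student-T generator approaches the same kernel as β grows.

```python
import numpy as np

def mahalanobis_sq(x, mu, sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    return diff @ np.linalg.solve(sigma, diff)

def g_mggd(t, beta):
    """Generator (20) of the multivariate generalized Gaussian distribution."""
    return np.exp(-0.5 * t ** beta)

def g_student(t, beta, m):
    """Generator (21) of the multivariate Student-T distribution in dimension m."""
    return (1.0 + t / beta) ** (-(beta + m) / 2.0)

x = np.array([1.0, 2.0])
mu = np.zeros(2)
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
t = mahalanobis_sq(x, mu, sigma)

gaussian_kernel = np.exp(-0.5 * t)
mggd_at_one = g_mggd(t, 1.0)                 # equals the Gaussian kernel
student_large_beta = g_student(t, 1e6, 2)    # close to the Gaussian kernel
```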

Slide 38

Slide 38 text

Appendix
Component-wise Information Distance
$d^2(\theta_n, \theta^*) = d^2(\mu_n, \mu^*) + d^2(\Sigma_n, \Sigma^*) + d^2(\beta_n, \beta^*)$ (22)
where
$d^2(\mu_n, \mu^*) = I_\mu\, \|\mu_n - \mu^*\|^2$ (23a)
$d^2(\Sigma_n, \Sigma^*) = I_1\, \mathrm{tr}\!\left[\left(\log \Sigma_n^{-1} \Sigma^*\right)^2\right] + I_2\, \mathrm{tr}^2\!\left[\log \Sigma_n^{-1} \Sigma^*\right]$ (23b)
$d^2(\beta_n, \beta^*) = I_\beta \log^2\!\left(\beta_n^{-1} \beta^*\right)$ (23c)
This distance is used to measure the errors in Figures 1, 2 and 4.
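A possible way to evaluate the distance (22) numerically is sketched below. For (23b) it uses the fact that Σₙ⁻¹Σ∗ has the same eigenvalues as the symmetric matrix Σₙ^{-1/2} Σ∗ Σₙ^{-1/2}, so both traces reduce to sums over the logarithms of those eigenvalues. The coefficient values passed in the example are placeholders, not the actual information coefficients of any ECD, and the norm in (23a) is taken here as the Euclidean norm.

```python
import numpy as np

def dist2_mu(mu_n, mu_star, i_mu):
    """Equation (23a), with the Euclidean norm assumed."""
    return i_mu * np.sum((mu_n - mu_star) ** 2)

def dist2_sigma(sigma_n, sigma_star, i1, i2):
    """Equation (23b) through the eigenvalues of Sigma_n^{-1} Sigma_star."""
    vals, vecs = np.linalg.eigh(sigma_n)
    inv_root = (vecs / np.sqrt(vals)) @ vecs.T          # Sigma_n^{-1/2}
    lam = np.linalg.eigvalsh(inv_root @ sigma_star @ inv_root)
    logs = np.log(lam)
    return i1 * np.sum(logs ** 2) + i2 * np.sum(logs) ** 2

def dist2_beta(beta_n, beta_star, i_beta):
    """Equation (23c)."""
    return i_beta * np.log(beta_star / beta_n) ** 2

def dist2_theta(theta_n, theta_star, i_mu, i1, i2, i_beta):
    """Equation (22): sum of the three component-wise squared distances."""
    return (dist2_mu(theta_n[0], theta_star[0], i_mu)
            + dist2_sigma(theta_n[1], theta_star[1], i1, i2)
            + dist2_beta(theta_n[2], theta_star[2], i_beta))

sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
scaled = np.exp(0.3) * sigma                            # Sigma_star = e^c Sigma_n
d_scaled = dist2_sigma(sigma, scaled, i1=0.5, i2=0.1)   # placeholder coefficients
d_zero = dist2_theta((np.zeros(2), sigma, 1.5), (np.zeros(2), sigma, 1.5),
                     i_mu=1.0, i1=0.5, i2=0.1, i_beta=1.0)
```

For a pure rescaling Σ∗ = e^c Σₙ in dimension m, formula (23b) gives I₁ m c² + I₂ m² c², which is a convenient check of the implementation.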

Slide 39

Slide 39 text

Appendix
Definition of $O(n^{-1})$
$\exists K > 0$, $\exists n_0 > 0$ such that, for all $n \geq n_0$,
$d^2(\theta^{(n)}, \theta^*) \leq \frac{K}{n}$ (24)
Chi-2 distribution
If $a = 1$, $\left(n^{1/2}\, \theta^{(i)}\right)_{i \in \{1, \cdots, d\}}$ converges to a normal distribution with an identity covariance $I_d$, and the estimates are asymptotically efficient.
For θ = (µ, Σ) with fixed β∗,
$n\, d^2(\theta^*, \theta_n) \Rightarrow \chi^2\!\left(\frac{m(m+1)}{2} + m\right)$ (25)
This result is numerically verified in Figure 3.