Jialun Zhou

Online estimation of ECD and its Mixture: Component-wise Information Gradient method
ZHOU Jialun, IMS laboratory, University of Bordeaux
May 19, 2021

Estimation problem

$$\hat{\theta} = \arg\min_{\theta} D(\theta) \tag{1}$$

Iterative algorithm

$$\hat{\theta}_{k+1} = F\big(\hat{\theta}_k, \nabla_{\theta} D(\theta)\big) \tag{2}$$

1. Offline version, e.g. Expectation Maximisation [1].
2. Online version, e.g. Stochastic Gradient Descent [2].

Difficulties
- Offline methods → memory and time
- Classic stochastic approximation → non-convexity, instability and step size

[1] dempster1977maximum
[2] saad1998online

Optimization on Riemannian Manifolds
- Absil. "Optimization algorithms on matrix manifolds". 2009.
- Bonnabel Silvere. "Stochastic gradient descent on Riemannian manifolds". 2013.
- Amari Shun-Ichi. "Natural gradient works efficiently in learning". 1998.

Component-wise Information Gradient
- Saves memory and time
- Makes the step size easy to select
- Improves performance

Contents
1. ECD
   - Introduction and problem statement
   - Riemannian Geometry
     - Optimization on Riemannian manifold
     - Component-wise Information Metric and gradient
     - Retraction map
   - Algorithms
     - Global convergence
     - CIG for ECD
2. Mixture of ECD

ECD

Density of an ECD (elliptically contoured distribution)

$$p(x \mid \theta) = c(\beta)\, |\Sigma|^{-1/2}\, g_{\beta}\big((x - \mu)^{\dagger} \Sigma^{-1} (x - \mu)\big) \tag{3}$$

where $g_{\beta}$ is the generator function.

Parametric space

$$\mu \in \mathbb{R}^m, \quad \Sigma \in \mathcal{P}_m, \quad \beta \in \mathbb{R}_+$$

Cost function

$$D(\theta) = \mathbb{E}\,[q(\theta; x)], \quad \text{where } q(\theta; x) = -\log p(\theta; x) \tag{4}$$

General update formula on a Riemannian manifold

$$\theta_{n+1} = \mathrm{Ret}_{\theta_n}\big(\gamma_{n+1}\, u(\theta_n, x_{n+1})\big), \quad n = 0, 1, \cdots \tag{5}$$

Component-wise Information Metric

Difficulty: the information metric does not have a closed form [3].
Solution: the Component-wise Information Metric.

[3] verdoolaege2012geometry

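Information metric for Σ

The analytical expression of $\langle \cdot, \cdot \rangle_{\Sigma}$ [4]:

$$\langle U_{\Sigma}, V_{\Sigma} \rangle_{\Sigma} = I_{\Sigma,1}\, \mathrm{tr}\big(\Sigma^{-1} U_{\Sigma} \Sigma^{-1} V_{\Sigma}\big) + I_{\Sigma,2}\, \mathrm{tr}\big(\Sigma^{-1} U_{\Sigma}\big)\, \mathrm{tr}\big(\Sigma^{-1} V_{\Sigma}\big) \tag{6}$$

[4] berkane1997geodesic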
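As a minimal numerical sketch of (6), the function below evaluates the metric with NumPy. The name `sigma_metric` and the numerical values passed for `I1` and `I2` (standing for the coefficients $I_{\Sigma,1}$ and $I_{\Sigma,2}$) are illustrative assumptions; the slides do not give these values.

```python
import numpy as np

def sigma_metric(Sigma, U, V, I1, I2):
    """Evaluate <U, V>_Sigma as written in equation (6).

    Sigma is symmetric positive definite; U and V are symmetric tangent vectors.
    I1 and I2 are the metric coefficients (placeholder values in this sketch).
    """
    A = np.linalg.solve(Sigma, U)  # Sigma^{-1} U
    B = np.linalg.solve(Sigma, V)  # Sigma^{-1} V
    return I1 * np.trace(A @ B) + I2 * np.trace(A) * np.trace(B)

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
U = np.array([[1.0, 0.2], [0.2, 0.5]])
V = np.array([[0.3, -0.1], [-0.1, 0.8]])
value = sigma_metric(Sigma, U, V, I1=1.0, I2=0.5)  # hypothetical coefficients
```

Solving linear systems instead of forming the inverse of Σ explicitly is a design choice of this sketch, not something stated in the slides.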

Information metric for µ and β

Information metric for µ: the analytical expression of $\langle \cdot, \cdot \rangle_{\mu}$ [5]:

$$\langle U_{\mu}, V_{\mu} \rangle_{\mu} = I_{\mu}\, U_{\mu}^{\dagger} \Sigma^{-1} V_{\mu} \tag{7}$$

Information metric for β:

$$\langle U_{\beta}, V_{\beta} \rangle_{\beta} = I_{\beta}\, U_{\beta} V_{\beta} \tag{8}$$

[5] zhou2020riemannian

Component-wise Information Gradient

The component-wise information gradient is the solution $\nabla_{\theta} q(\theta; x)$ of the equation

$$\langle \nabla_{\theta} q(\theta; x), v \rangle_{\theta} = \mathrm{d}q(\theta; x)[v] \tag{9}$$

Written by components, it reads

$$\nabla_{\theta} q(\theta; x) = \begin{pmatrix} \nabla_{\mu} q(\theta; x) \\ \nabla_{\Sigma} q(\theta; x) \\ \nabla_{\beta} q(\theta; x) \end{pmatrix} \tag{10}$$

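Retraction on parametric space

Retraction map

$$\mathrm{Ret}_{\theta^{(n)}}\big(\eta^{(n)} \nabla_{\theta} q(\theta^{(n)}; X)\big) = \begin{pmatrix} \mathrm{Exp}_{\mu^{(n)}}\big(\eta^{(n)} \nabla_{\mu} q(\theta^{(n)}; X)\big) \\ \mathrm{Exp}_{\Sigma^{(n)}}\big(\eta^{(n)} \nabla_{\Sigma} q(\theta^{(n)}; X)\big) \\ \mathrm{Exp}_{\beta^{(n)}}\big(\eta^{(n)} \nabla_{\beta} q(\theta^{(n)}; X)\big) \end{pmatrix} \tag{11}$$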
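The component-wise structure of (11) can be sketched in Python as below. The slides do not spell out the three exponential maps, so the closed forms used here are assumptions: a flat (additive) map for µ, the affine-invariant exponential map on symmetric positive definite matrices for Σ, and a multiplicative map on the positive half-line for β, chosen because they agree with the logarithmic distances in (23b) and (23c). All function names are hypothetical.

```python
import numpy as np

def _sym_fun(S, fun):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    w, Q = np.linalg.eigh(S)
    return (Q * fun(w)) @ Q.T

def exp_mean(mu, v):
    """Assumed flat exponential map on R^m."""
    return mu + v

def exp_spd(Sigma, V):
    """Assumed affine-invariant exponential map on SPD matrices."""
    root = _sym_fun(Sigma, np.sqrt)
    inv_root = _sym_fun(Sigma, lambda w: 1.0 / np.sqrt(w))
    inner = inv_root @ V @ inv_root
    inner = 0.5 * (inner + inner.T)  # remove round-off asymmetry
    return root @ _sym_fun(inner, np.exp) @ root

def exp_beta(beta, v):
    """Assumed exponential map on R_+; the result stays positive."""
    return beta * np.exp(v / beta)

def retract(theta, step):
    """Apply the three maps component by component, mirroring (11)."""
    mu, Sigma, beta = theta
    v_mu, v_Sigma, v_beta = step
    return exp_mean(mu, v_mu), exp_spd(Sigma, v_Sigma), exp_beta(beta, v_beta)

S = np.array([[2.0, 0.5], [0.5, 1.0]])
theta_next = retract((np.zeros(2), S, 1.5),
                     (np.array([0.1, -0.2]), -0.3 * np.eye(2), -0.4))
```

With these choices every update keeps Σ positive definite and β positive, whatever the step length, which is the practical reason for moving along exponential maps rather than adding the step directly.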

Global convergence

Geodesic α-strong convexity: a function $D : \Theta \to \mathbb{R}$ is said to be geodesically α-strongly convex if, for any $\theta_1, \theta_2 \in \Theta$,

$$D(\theta_2) \geq D(\theta_1) + \big\langle \mathrm{grad}_{\theta_1} D(\theta_1), \mathrm{Exp}^{-1}_{\theta_1}(\theta_2) \big\rangle_{\theta_1} + \frac{\alpha}{2}\, d^2(\theta_1, \theta_2) \tag{12}$$

In the following cases, global convergence is guaranteed, i.e. $\forall \theta_0 \in \Theta$, we always have $\lim \theta_n = \theta^*$.

Student-T [6]
- $\theta = (\Sigma)$, for $\beta^* > -m$
- $\theta = (\mu, \Sigma)$, for $\beta^* > 0$

MGGD [7]
- $\theta = (\Sigma)$, for $\beta^* > 0$
- $\theta = (\mu, \Sigma)$, for $\beta^* > 0$

[6] laus2019multivariate
[7] zhang2013multivariate

CIG with deterministic step-size

Reformulated objective function

$$\hat{D}(\theta) = \frac{1}{T} \sum_{t=1}^{T} q(\theta; x^{(t)}) \tag{13}$$

CIG with deterministic step-size [8]
1. Update µ: calculate $\nabla_{\mu} \hat{D}(\theta)$ at $(\mu^{(n)}, \Sigma^{(n)}, \beta^{(n)})$; select $\eta_{\mu}^{(n)}$ according to the Armijo-Goldstein criteria and update $\mu^{(n+1)}$.
2. Update Σ: calculate $\nabla_{\Sigma} \hat{D}(\theta)$ at $(\mu^{(n+1)}, \Sigma^{(n)}, \beta^{(n)})$; select $\eta_{\Sigma}^{(n)}$ according to the Armijo-Goldstein criteria and update $\Sigma^{(n+1)}$.
3. Update β: calculate $\nabla_{\beta} \hat{D}(\theta)$ at $(\mu^{(n+1)}, \Sigma^{(n+1)}, \beta^{(n)})$; select $\eta_{\beta}^{(n)}$ according to the Armijo-Goldstein criteria and update $\beta^{(n+1)}$.

[8] zhou2020riemannian
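The step-size selection used in each of the three updates can be illustrated with a backtracking search. The sketch below implements only the Armijo sufficient-decrease test, not the full Armijo-Goldstein pair of conditions; the function name `armijo_step`, the constants `c` and `shrink`, and the one-dimensional toy objective are assumptions made for illustration.

```python
def armijo_step(f, theta, grad, retract, grad_norm_sq,
                eta0=1.0, c=1e-4, shrink=0.5, max_halvings=50):
    """Backtracking search for a step size satisfying the Armijo condition.

    f            : objective, e.g. the empirical cost in (13)
    grad         : gradient of f at theta for the chosen metric
    retract      : map (theta, tangent step) -> new point
    grad_norm_sq : squared norm of grad for the same metric
    """
    f0 = f(theta)
    eta = eta0
    for _ in range(max_halvings):
        candidate = retract(theta, -eta * grad)
        if f(candidate) <= f0 - c * eta * grad_norm_sq:
            return eta      # sufficient decrease reached
        eta *= shrink       # otherwise shrink the step and retry
    return eta              # fallback after max_halvings reductions

# Toy check on the real line: f(x) = x^2 / 2, flat retraction x + v.
f = lambda x: 0.5 * x * x
x0 = 1.0
g = x0  # derivative of f at x0
eta = armijo_step(f, x0, g, lambda x, v: x + v, g * g, eta0=4.0)
x1 = x0 - eta * g
```

In the component-wise scheme above, a routine of this kind would be called three times per iteration, once for each of µ, Σ and β, with the other two components held fixed.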

CIG with deterministic step size

Figure 1: The convergence of CIG with deterministic step-size for ECD. The step size is selected according to the Armijo-Goldstein criteria. The error is measured by the component-wise information distance (22).
(a) the case $\theta = (\Sigma)$ with $(\mu^*, \beta^*)$
(b) the case $\theta = (\mu, \Sigma)$ with $\beta^*$

CIG with decreasing step size [9]

1. $\eta^{(n+1)} \leftarrow \frac{a}{n+1}$ with $a > 0$;
2. Update µ: calculate $\nabla_{\mu} q(\theta; X)$ at $(\mu^{(n)}, \Sigma^{(n)}, \beta^{(n)})$; update $\mu^{(n+1)}$ with $\eta^{(n+1)}$;
3. Update Σ: calculate $\nabla_{\Sigma} q(\theta; X)$ at $(\mu^{(n+1)}, \Sigma^{(n)}, \beta^{(n)})$; update $\Sigma^{(n+1)}$ with $\eta^{(n+1)}$;
4. Update β: calculate $\nabla_{\beta} q(\theta; X)$ at $(\mu^{(n+1)}, \Sigma^{(n+1)}, \beta^{(n)})$; update $\beta^{(n+1)}$ with $\eta^{(n+1)}$.

[9] zhou2020riemannian
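To see the decreasing step-size rule at work in the simplest setting, the sketch below restricts the recursion to the µ update of a Gaussian model with known Σ. This special case is an assumption for illustration and is not taken from the slides. There, with $I_{\mu} = 1$ in (7), the information gradient of $q$ with respect to µ is $-(x - \mu)$, so the descent direction is $x - \mu$ and the exponential map is a plain addition. The function name `online_mean` is hypothetical.

```python
import numpy as np

def online_mean(samples, a=1.0, mu0=None):
    """One pass over the data with step size eta = a / (n + 1)."""
    samples = np.asarray(samples, dtype=float)
    mu = np.zeros(samples.shape[1]) if mu0 is None else np.asarray(mu0, dtype=float)
    for n, x in enumerate(samples):
        eta = a / (n + 1)          # step 1: decreasing step size
        direction = x - mu         # minus the information gradient (Gaussian case)
        mu = mu + eta * direction  # flat-space update of the mean
    return mu

rng = np.random.default_rng(0)
data = rng.normal(loc=[1.0, -2.0], scale=1.0, size=(500, 2))
mu_hat = online_mean(data, a=1.0)
```

With `a = 1` the recursion reproduces the running sample mean exactly, whatever the starting point, because the first step has size 1 and replaces the initial value with the first sample. Each sample is used once and then discarded, which is the memory advantage of the online version.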

Theoretical results

For the general case $\theta = (\mu, \Sigma, \beta)$:
1. Convergence.
2. Mean-square rate: $\forall a > \frac{1}{2\lambda}$, $\mathbb{E}\, d^2(\theta_n, \theta^*) = O(n^{-1})$.
3. Asymptotic normality.

For $\theta = (\Sigma)$ and $\theta = (\mu, \Sigma)$, CIM = IM (the component-wise information metric coincides with the information metric):
1. $\forall a > \frac{1}{2}$, the mean-square rate holds.
2. If $a = 1$, the estimates $\theta^{(n)}$ are asymptotically efficient.

Convergence rate for decreasing step size

Figure 2: Convergence rate, with $a > \frac{1}{2\lambda}$
(a) the case $\theta = (\Sigma)$ with $(\mu^*, \beta^*)$
(b) the case $\theta = (\mu, \Sigma)$ with $\beta^*$
(c) the case $\theta = (\mu, \Sigma, \beta)$

Recall the mean-square rate: $\exists K > 0$, $\exists n_0 > 0$ such that, for all $n \geq n_0$,

$$d^2(\theta^{(n)}, \theta^*) \leq \frac{K}{n} \tag{14}$$

Asymptotic normality

Figure 3: Asymptotic normality. The formal statement is given in (25).
(a) the case $\theta = (\Sigma)$ with $(\mu^*, \beta^*)$
(b) the case $\theta = (\mu, \Sigma)$ with $\beta^*$

Efficiency

Figure 4: Efficiency
(a) the case $\theta = (\Sigma)$ with $(\mu^*, \beta^*)$
(b) the case $\theta = (\mu, \Sigma)$ with $\beta^*$
(c) the case $\theta = (\mu, \Sigma, \beta)$
(d) summary

Time consumption

Figure 5: Time consumption
(a) the case $\theta = (\Sigma)$ with $(\mu^*, \beta^*)$
(b) the case $\theta = (\mu, \Sigma)$ with $\beta^*$
(c) the case $\theta = (\mu, \Sigma, \beta)$
(d) summary

Color transformation

Figure 6: Full HD image with 5D transformation [10]. Panels: Input, Target, MM, FP, CIG Offline, CIG Online.

[10] Hristova, Hristina. "Transformation of the multivariate generalized Gaussian distribution for image editing". 2017.

Color transformation

Figure 7: Details of transformation. Panels: MM, FP, CIG Offline, CIG Online.

Mixture of ECD

Density function

$$f(x; \theta) = \sum_{k=1}^{K} w_k\, p(x; \mu_k, \Sigma_k, \beta_k) \quad \text{with} \quad \sum_{k=1}^{K} w_k = 1 \tag{15}$$

Parameter and parametric space

$$\theta = \left( [w_1, \cdots, w_K],\; \begin{pmatrix} \mu_1^{(1)} & \cdots & \mu_K^{(1)} \\ \vdots & \ddots & \vdots \\ \mu_1^{(m)} & \cdots & \mu_K^{(m)} \end{pmatrix},\; \begin{pmatrix} \Sigma_1^{(1)} & \cdots & \Sigma_K^{(1)} \\ \vdots & \ddots & \vdots \\ \Sigma_1^{(d)} & \cdots & \Sigma_K^{(d)} \end{pmatrix},\; [\beta_1, \cdots, \beta_K] \right) \tag{16}$$

$$\Theta = (0, 1)^K \times \mathbb{R}^{m \times K} \times \mathcal{P}_m^K \times \mathbb{R}_+^K \tag{17}$$

Unit sphere

Geometric structures on the unit sphere

Cost function

$$D(\theta) = \mathbb{E}\, q(\theta; x) = -\mathbb{E} \log f(\theta; x) \tag{18}$$

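CIM for MECD

$$\langle u_{\theta}, v_{\theta} \rangle_{\theta} = \langle u_r, v_r \rangle + \sum_{k=1}^{K} \Big( \langle u_{\mu_k}, v_{\mu_k} \rangle_{\mu_k} + \langle u_{\Sigma_k}, v_{\Sigma_k} \rangle_{\Sigma_k} + \langle u_{\beta_k}, v_{\beta_k} \rangle_{\beta_k} \Big) \tag{19}$$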

CIG with decreasing step-size

1. $\eta^{(n+1)} \leftarrow \frac{a}{n+1}$;
2. Update weights: update $r^{(n+1)}$ with $\eta^{(n+1)}$;
3. Update $(\mu_k, \Sigma_k, \beta_k)_k$:
   - update $\mu_k^{(n+1)}$ according to $(r^{(n+1)}, \mu_k^{(n)}, \Sigma_k^{(n)}, \beta_k^{(n)})$
   - update $\Sigma_k^{(n+1)}$ according to $(r^{(n+1)}, \mu_k^{(n+1)}, \Sigma_k^{(n)}, \beta_k^{(n)})$
   - update $\beta_k^{(n+1)}$ according to $(r^{(n+1)}, \mu_k^{(n+1)}, \Sigma_k^{(n+1)}, \beta_k^{(n)})$

Variant step size

Convergence rate: for all $a > \frac{1}{2\lambda}$, the mean-square rate holds.

Online backtracking step size: $\forall n$, $\eta^{(n)}$ is selected according to a mini-batch.

Adaptive step size: $\forall n$,

$$\eta^{(n)} = \frac{\tau^{(n)}_{\min}\, \rho}{L\, \tau^{(n)}_{\max}}$$

Simulation MECD

Percentage of 'correct' estimates

Method    Correct estimates
EM-FP     83%
SG        77%
CIG-DS    90%
CIG-OB    89%
CIG-AS    90%

Simulation MECD

Figure 8: Convergence rate [11]
(a) mixture of MGGD
(b) mixture of Student-T

[11] DS means Decreasing Stepsize and AS means Adaptive Stepsize.

Comparison of efficiency

Figure 9: Log-likelihood vs iterations
(a) mixture of MGGD
(b) mixture of Student-T

Comparison of efficiency

Figure 10: Log-likelihood vs time consumption
(a) mixture of MGGD
(b) mixture of Student-T

Texture segmentation

Figure 11: Visual results
(a) Original Texture
(b) EM-FP
(c) SG
(d) CIG-DS
(e) CIG-OB
(f) CIG-AS

Texture segmentation

Accuracy of segmentation

Method    Correct estimates
EM-FP     93.011%
SG        63.328%
CIG-DS    96.072%
CIG-OB    85.338%
CIG-AS    92.121%

Thank you for your attention.

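Appendix

Generator function

The generator function in (3) is specifically given as

$$g\big((x - \mu)^{\dagger} \Sigma^{-1} (x - \mu), \beta\big) = \exp\left( -\frac{1}{2} \big( (x - \mu)^{\dagger} \Sigma^{-1} (x - \mu) \big)^{\beta} \right) \quad \text{for MGGD} \tag{20}$$

$$g\big((x - \mu)^{\dagger} \Sigma^{-1} (x - \mu), \beta\big) = \left( 1 + \frac{(x - \mu)^{\dagger} \Sigma^{-1} (x - \mu)}{\beta} \right)^{-\frac{\beta + m}{2}} \quad \text{for Student-T} \tag{21}$$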
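A short Python sketch of the two generators in (20) and (21) follows. The function names are hypothetical, real-valued data is assumed (so the dagger reduces to a transpose), and the normalising constant $c(\beta)$ of (3) is deliberately left out because the slides do not state it.

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Quadratic form (x - mu)^T Sigma^{-1} (x - mu) for real vectors."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(d @ np.linalg.solve(Sigma, d))

def g_mggd(t, beta):
    """Generator of equation (20): exp(-t^beta / 2)."""
    return np.exp(-0.5 * t ** beta)

def g_student(t, beta, m):
    """Generator of equation (21): (1 + t / beta)^(-(beta + m) / 2)."""
    return (1.0 + t / beta) ** (-(beta + m) / 2.0)

t = mahalanobis_sq([1.0, 2.0], [0.0, 0.0], np.diag([1.0, 4.0]))
mggd_value = g_mggd(t, beta=1.0)           # beta = 1 gives the Gaussian kernel exp(-t / 2)
student_value = g_student(t, beta=2.0, m=2)
```

Both generators equal 1 at $t = 0$ and decrease as the quadratic form grows; setting `beta = 1` in `g_mggd` recovers the Gaussian kernel.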

Appendix

Component-wise Information Distance

$$d^2(\theta_n, \theta^*) = d^2(\mu_n, \mu^*) + d^2(\Sigma_n, \Sigma^*) + d^2(\beta_n, \beta^*) \tag{22}$$

where

$$d^2(\mu_n, \mu^*) = I_{\mu}\, \| \mu_n - \mu^* \|^2 \tag{23a}$$

$$d^2(\Sigma_n, \Sigma^*) = I_1\, \mathrm{tr}\Big[ \big( \log \Sigma_n^{-1} \Sigma^* \big)^2 \Big] + I_2\, \mathrm{tr}^2\Big[ \log \Sigma_n^{-1} \Sigma^* \Big] \tag{23b}$$

$$d^2(\beta_n, \beta^*) = I_{\beta}\, \log^2\big( \beta_n^{-1} \beta^* \big) \tag{23c}$$

This distance is used for measuring the errors in Figures 1, 2 and 4.
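The distance (22) can be evaluated numerically as in the sketch below. The coefficient values, the function name `cid_squared`, and the reading of the norm in (23a) as the Euclidean norm are assumptions. The matrix logarithm in (23b) is computed through the eigenvalues of $\Sigma_n^{-1} \Sigma^*$, which are real and positive when both matrices are symmetric positive definite.

```python
import numpy as np

def cid_squared(theta_n, theta_star, I_mu=1.0, I_1=1.0, I_2=0.0, I_beta=1.0):
    """Squared component-wise information distance, following (22)-(23c).

    The default coefficients are placeholders, not values from the slides.
    """
    mu_n, Sigma_n, beta_n = theta_n
    mu_s, Sigma_s, beta_s = theta_star

    # (23a), with the norm assumed Euclidean
    d_mu = I_mu * float(np.sum((np.asarray(mu_n) - np.asarray(mu_s)) ** 2))

    # (23b): L^{-1} Sigma* L^{-T} has the same eigenvalues as Sigma_n^{-1} Sigma*
    L = np.linalg.cholesky(Sigma_n)
    W = np.linalg.solve(L, Sigma_s)
    M = np.linalg.solve(L, W.T)
    logs = np.log(np.linalg.eigvalsh(0.5 * (M + M.T)))
    d_sigma = I_1 * float(np.sum(logs ** 2)) + I_2 * float(np.sum(logs)) ** 2

    # (23c)
    d_beta = I_beta * float(np.log(beta_s / beta_n)) ** 2

    return d_mu + d_sigma + d_beta

theta_a = (np.zeros(2), np.eye(2), 1.0)
theta_b = (np.array([3.0, 4.0]), np.diag([np.e, 1.0]), np.e)
dist = cid_squared(theta_a, theta_b, I_mu=1.0, I_1=1.0, I_2=0.5, I_beta=1.0)
```

Going through a Cholesky factor keeps the intermediate matrix symmetric, so a symmetric eigenvalue routine can be used instead of a general matrix logarithm.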

Appendix

Definition of $O(n^{-1})$: $\exists K > 0$, $\exists n_0 > 0$ such that, for all $n \geq n_0$,

$$d^2(\theta^{(n)}, \theta^*) \leq \frac{K}{n} \tag{24}$$

Chi-squared distribution

If $a = 1$, $(n^{1/2}\, \theta^{(i)})_{i \in \{1, \cdots, d\}}$ converges to a normal distribution with identity covariance $I_d$, and the estimates are asymptotically efficient. For $\theta = (\mu, \Sigma)$ with fixed $\beta^*$,

$$n\, d^2(\theta^*, \theta_n) \Rightarrow \chi^2\left( \frac{m(m+1)}{2} + m \right) \tag{25}$$

This result is numerically verified in Figure 3.
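As a worked instance of the degrees of freedom in (25): a symmetric $m \times m$ matrix Σ has $m(m+1)/2$ free entries and µ has $m$, so the count is simply the dimension of the pair $(\mu, \Sigma)$. The values of $m$ below are chosen only for illustration.

```latex
\[
m = 2:\quad \frac{2 \cdot 3}{2} + 2 = 5
\qquad\Longrightarrow\qquad
n\, d^2(\theta^*, \theta_n) \Rightarrow \chi^2(5),
\]
\[
m = 3:\quad \frac{3 \cdot 4}{2} + 3 = 9
\qquad\Longrightarrow\qquad
n\, d^2(\theta^*, \theta_n) \Rightarrow \chi^2(9).
\]
```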
