S³ Seminar
May 21, 2021

# Jialun Zhou

(IMS, Groupe Signal Image — CNRS, Université de Bordeaux)

https://s3-seminar.github.io/seminars/jialun-zhou

Title — Online estimation of elliptical distributions and their mixture: The component-wise information gradient method

Abstract — Elliptically-Contoured Distributions (ECD) and their mixture model (MECD) are highly versatile at modeling general, real-world probability distributions. They have therefore played a valuable role in computer vision, image processing, radar signal processing, and biomedical signal processing. Maximum likelihood estimation (MLE) of ECD leads to a system of non-linear equations, most often addressed using Fixed-Point (FP) methods. MECD is usually estimated within the Expectation-Maximization (EM) framework, combined with an FP method. Unfortunately, these methods can become impractical for large-scale or high-dimensional datasets (due to lack of time, memory, or computational resources). To overcome this difficulty, we introduce a Riemannian optimization method, the Component-wise Information Gradient (CIG). On the one hand, CIG is an online method, so its recursive nature greatly reduces time and memory consumption. On the other hand, it uses information geometry to correctly calibrate gradient descent step-sizes, leading to an improved rate of convergence. We also mathematically formulate this rate of convergence for two variants of CIG, with decreasing and adaptive step-sizes respectively. We also show that the CIG method compares advantageously to the state of the art, both in computer experiments and in practical applications.


## Transcript

1. ### Online estimation of ECD and its Mixture: Component-wise Information Gradient method

ZHOU Jialun, IMS laboratory, University of Bordeaux. May 19, 2021.
2. None
3. ### Estimation problem

$$\hat{\theta} = \arg\min_{\theta} D(\theta) \tag{1}$$

$$\hat{\theta}_{k+1} = F\big(\hat{\theta}_k, \nabla_\theta D(\theta)\big) \tag{2}$$

Iterative algorithm:

- Offline version, e.g. Expectation Maximisation (dempster1977maximum).
- Online version, e.g. Stochastic Gradient Descent (saad1998online).

Difficulties:

- Offline methods → memory and time.
- Classic stochastic approximation → non-convexity, instability and step size.
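The online form of recursion (2) can be illustrated on the simplest possible cost. This is a minimal sketch, assuming a squared-error cost $D(\theta) = \mathbb{E}[(x - \theta)^2]/2$ (not a cost used in the talk) and a hypothetical helper `online_mean`. Each sample is used once and then discarded, which is where the memory saving of online methods comes from.

```python
def online_mean(samples, a=1.0):
    """Stochastic gradient descent on D(theta) = E[(x - theta)^2] / 2.

    Hypothetical illustration: the cost and the step size a / (k + 1)
    are assumptions chosen for simplicity.
    """
    theta = 0.0
    for k, x in enumerate(samples):
        step = a / (k + 1)            # decreasing step size
        grad = theta - x              # gradient of (x - theta)^2 / 2 in theta
        theta = theta - step * grad   # one update per sample, nothing stored
    return theta
```

With `a = 1` the recursion reduces to the running mean of the samples seen so far.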
4. ### Optimization on Riemannian Manifolds

- Absil. "Optimization algorithms on matrix manifolds". 2009.
- Bonnabel Silvere. "Stochastic gradient descent on Riemannian manifolds". 2013.
- Amari Shun-Ichi. "Natural gradient works efficiently in learning". 1998.

Component-wise Information Gradient:

- Save memory and time
- Easy to select step size
- Improve performance
5. ### Contents

- ECD
  - Introduction and problem statement
  - Riemannian Geometry: optimization on Riemannian manifold; Component-wise Information Metric and gradient; retraction map
  - Algorithms: global convergence; CIG for ECD
- Mixture of ECD

ZHOU Jialun Online estimation of ECD and its Mixture 5 / 39
6. ### ECD

Density of ECD:

$$p(x \mid \theta) = c(\beta)\, |\Sigma|^{-1/2}\, g_\beta\big((x - \mu)^\dagger \Sigma^{-1} (x - \mu)\big) \tag{3}$$

where $g_\beta$ is the generator function.
7. ### Parametric space

Parametric space: $\mu \in \mathbb{R}^m$, $\Sigma \in \mathcal{P}_m$, $\beta \in \mathbb{R}_+$.

Cost function:

$$D(\theta) = \mathbb{E}[q(\theta; x)] \quad \text{where} \quad q = -\log p(\theta; x) \tag{4}$$
8. ### General update formula

General update formula on a Riemannian manifold:

$$\theta_{n+1} = \operatorname{Ret}_{\theta_n}\big(\gamma_{n+1}\, u(\theta_n, x_{n+1})\big), \qquad n = 0, 1, \cdots \tag{5}$$
9. ### Component-wise Information metric

Difficulty: the information metric does not have a closed form (verdoolaege2012geometry).

Solution: the Component-wise Information Metric.
10. ### Information metric for Σ

The analytical expression of $\langle \cdot, \cdot \rangle_\Sigma$ (berkane1997geodesic):

$$\langle U_\Sigma, V_\Sigma \rangle_\Sigma = I_{\Sigma,1} \operatorname{tr}\big(\Sigma^{-1} U_\Sigma \Sigma^{-1} V_\Sigma\big) + I_{\Sigma,2} \operatorname{tr}\big(\Sigma^{-1} U_\Sigma\big) \operatorname{tr}\big(\Sigma^{-1} V_\Sigma\big) \tag{6}$$
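As a sanity check of (6), here is a worked special case, assuming $m = 1$ so that $\Sigma = \sigma^2$ and the tangent vectors are scalars $u$ and $v$ (notation introduced here for illustration). Both trace terms then collapse to the same expression:

$$\langle u, v \rangle_{\sigma^2} = I_{\Sigma,1} \frac{u v}{\sigma^4} + I_{\Sigma,2} \frac{u}{\sigma^2} \frac{v}{\sigma^2} = \big(I_{\Sigma,1} + I_{\Sigma,2}\big) \frac{u v}{\sigma^4}$$

In one dimension the metric is therefore a constant multiple of $u v / \sigma^4$, which is scale invariant in $\sigma^2$.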
11. ### Information metric for µ and β

Information metric for $\mu$. The analytical expression of $\langle \cdot, \cdot \rangle_\mu$ (zhou2020riemannian):

$$\langle U_\mu, V_\mu \rangle_\mu = I_\mu\, U_\mu^\dagger \Sigma^{-1} V_\mu \tag{7}$$

Information metric for $\beta$:

$$\langle U_\beta, V_\beta \rangle_\beta = I_\beta\, U_\beta V_\beta \tag{8}$$
12. ### Component-wise Information Gradient

The component-wise information gradient $\nabla_\theta q(\theta; x)$ is the solution of the following equation:

$$\langle \nabla_\theta q(\theta; x), v \rangle_\theta = \mathrm{d}q(\theta; x)[v] \tag{9}$$

Its abstract expression is given as

$$\nabla_\theta q(\theta; x) = \begin{pmatrix} \nabla_\mu q(\theta; x) \\ \nabla_\Sigma q(\theta; x) \\ \nabla_\beta q(\theta; x) \end{pmatrix} \tag{10}$$
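Combining (9) with the metrics (7) and (8) gives two of the components explicitly. The short derivation below writes $\partial_\mu q$ and $\partial_\beta q$ for the ordinary (Euclidean) partial derivatives, a notation assumed here, and uses the symmetry of $\Sigma$:

$$I_\mu\, (\nabla_\mu q)^\dagger \Sigma^{-1} v = (\partial_\mu q)^\dagger v \quad \forall v \quad \Longrightarrow \quad \nabla_\mu q = I_\mu^{-1}\, \Sigma\, \partial_\mu q$$

$$I_\beta\, (\nabla_\beta q)\, v = (\partial_\beta q)\, v \quad \forall v \quad \Longrightarrow \quad \nabla_\beta q = I_\beta^{-1}\, \partial_\beta q$$

Each component is the Euclidean derivative rescaled by the inverse of its own block of the metric, which is what calibrates the step size per component.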
13. ### Retraction on parametric space

Retraction map:

$$\operatorname{Ret}_{\theta^{(n)}}\big(\eta^{(n)} \nabla_\theta q(\theta^{(n)}; X)\big) = \begin{pmatrix} \operatorname{Exp}_{\mu^{(n)}}\big(\eta^{(n)} \nabla_\mu q(\theta^{(n)}; X)\big) \\ \operatorname{Exp}_{\Sigma^{(n)}}\big(\eta^{(n)} \nabla_\Sigma q(\theta^{(n)}; X)\big) \\ \operatorname{Exp}_{\beta^{(n)}}\big(\eta^{(n)} \nabla_\beta q(\theta^{(n)}; X)\big) \end{pmatrix} \tag{11}$$
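One role of the exponential maps in (11) is to keep each parameter inside its own space. The sketch below illustrates this for a positive scalar parameter, assuming the metric $\langle u, v \rangle_s = u v / s^2$ on $(0, \infty)$, for which the exponential map is $\operatorname{Exp}_s(v) = s\, e^{v/s}$. The function names are hypothetical and the scalar setting is a simplification, not the matrix case of the talk.

```python
import math

def exp_positive(s, v):
    """Exponential map on (0, inf) for the metric <u, v>_s = u * v / s**2."""
    return s * math.exp(v / s)

def euclidean_step(s, v):
    """Plain additive update, shown only for comparison."""
    return s + v
```

With `s = 1.0` and `v = -5.0`, the additive update returns a negative value and leaves the parameter space, while `exp_positive` stays strictly positive.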
14. ### Global convergence

Geodesic $\alpha$-strong convexity: a function $D : \Theta \to \mathbb{R}$ is said to be geodesically $\alpha$-strongly convex if, for any $\theta_1, \theta_2 \in \Theta$,

$$D(\theta_2) \ge D(\theta_1) + \big\langle \operatorname{grad}_{\theta_1} D(\theta_1), \operatorname{Exp}^{-1}_{\theta_1}(\theta_2) \big\rangle_{\theta_1} + \frac{\alpha}{2}\, d^2(\theta_1, \theta_2) \tag{12}$$

In the following cases, global convergence is guaranteed, i.e. $\forall \theta_0 \in \Theta$, we always have $\lim \theta_n = \theta^*$:

- Student-T (laus2019multivariate): $\theta = (\Sigma)$, for $\beta^* > -m$; $\theta = (\mu, \Sigma)$, for $\beta^* > 0$.
- MGGD (zhang2013multivariate): $\theta = (\Sigma)$, for $\beta^* > 0$; $\theta = (\mu, \Sigma)$, for $\beta^* > 0$.
15. ### CIG with deterministic step-size

Reformulated objective function:

$$\hat{D}(\theta) = \frac{1}{T} \sum_{t=1}^{T} q(\theta; x^{(t)}) \tag{13}$$

CIG with deterministic step-size (zhou2020riemannian):

- Step 1, update $\mu$: calculate $\nabla_\mu \hat{D}(\theta)$ according to $(\mu^{(n)}, \Sigma^{(n)}, \beta^{(n)})$; select $\eta^{(n)}_\mu$ according to the Armijo-Goldstein criteria and update $\mu^{(n+1)}$.
- Step 2, update $\Sigma$: calculate $\nabla_\Sigma \hat{D}(\theta)$ according to $(\mu^{(n+1)}, \Sigma^{(n)}, \beta^{(n)})$; select $\eta^{(n)}_\Sigma$ according to the Armijo-Goldstein criteria and update $\Sigma^{(n+1)}$.
- Step 3, update $\beta$: calculate $\nabla_\beta \hat{D}(\theta)$ according to $(\mu^{(n+1)}, \Sigma^{(n+1)}, \beta^{(n)})$; select $\eta^{(n)}_\beta$ according to the Armijo-Goldstein criteria and update $\beta^{(n+1)}$.
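The step-size selection used in each of the three updates can be sketched as a backtracking line search. The block below is a minimal Euclidean, one-dimensional sketch of the Armijo sufficient-decrease rule only (the Goldstein lower bound is omitted). The parameter names `eta0`, `shrink` and `c` are hypothetical, and the talk's version searches along the retraction of each component rather than along a straight line.

```python
def armijo_step(f, grad, x, eta0=1.0, shrink=0.5, c=1e-4, max_halvings=50):
    """Backtracking line search along the negative gradient.

    Returns a step eta such that
        f(x - eta * g) <= f(x) - c * eta * g**2,
    the Armijo sufficient-decrease condition, with g = grad(x).
    If no such step is found within max_halvings, the last eta is returned.
    """
    g = grad(x)
    fx = f(x)
    eta = eta0
    for _ in range(max_halvings):
        if f(x - eta * g) <= fx - c * eta * g * g:
            return eta
        eta *= shrink  # step too long: shrink and retry
    return eta
```

For `f(x) = x**2` starting at `x = 1.0`, the full step overshoots to `-1.0` and is rejected, and the halved step lands on the minimiser.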
16. ### CIG with deterministic step size

Figure 1: The convergence of CIG with deterministic step-size for ECD. (a) The case $\theta = (\Sigma)$ with $(\mu^*, \beta^*)$. (b) The case $\theta = (\mu, \Sigma)$ with $\beta^*$. The step size is selected according to the Armijo-Goldstein criteria. The error is measured by the component-wise information distance (22).
17. ### CIG with decreasing step size

CIG with decreasing step size (zhou2020riemannian):

- Step 1: $\eta^{(n+1)} \leftarrow \frac{a}{n+1}$ with $a > 0$.
- Step 2, update $\mu$: calculate $\nabla_\mu q(\theta; X)$ according to $(\mu^{(n)}, \Sigma^{(n)}, \beta^{(n)})$; update $\mu^{(n+1)}$ with $\eta^{(n+1)}$.
- Step 3, update $\Sigma$: calculate $\nabla_\Sigma q(\theta; X)$ according to $(\mu^{(n+1)}, \Sigma^{(n)}, \beta^{(n)})$; update $\Sigma^{(n+1)}$ with $\eta^{(n+1)}$.
- Step 4, update $\beta$: calculate $\nabla_\beta q(\theta; X)$ according to $(\mu^{(n+1)}, \Sigma^{(n+1)}, \beta^{(n)})$; update $\beta^{(n+1)}$ with $\eta^{(n+1)}$.
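The component-wise structure of these steps can be sketched on a deliberately simple special case: a one-dimensional Gaussian with mean `mu` and variance `s`, with no shape parameter estimated. This is an illustration under those assumptions, not the speaker's implementation. The function name and the initial values are hypothetical, and the Fisher information values used are those of the 1-D Gaussian.

```python
import math

def cig_gaussian_1d(samples, a=1.0, mu0=0.0, s0=1.0):
    """Online component-wise updates of (mu, s) for a 1-D Gaussian, s = variance.

    Illustrative special case only (Gaussian, shape parameter not estimated).
    Per-sample cost: q = 0.5 * log(s) + (x - mu)**2 / (2 * s).
    """
    mu, s = mu0, s0
    for n, x in enumerate(samples):
        eta = a / (n + 1)                      # decreasing step size
        # mu: the Fisher information is 1/s, so the information gradient is
        # s * dq/dmu = -(x - mu); descend with the Euclidean exponential map.
        mu = mu + eta * (x - mu)
        # s: uses the already-updated mu. The Fisher information is 1/(2 s^2),
        # so the information gradient is 2 s^2 * dq/ds = s - (x - mu)^2.
        direction = (x - mu) ** 2 - s          # negative information gradient
        s = s * math.exp(eta * direction / s)  # exponential map on (0, inf)
    return mu, s
```

Each sample is visited once, the variance update already sees the new mean, and the exponential map keeps `s` positive for any step.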
18. ### Theoretical results

For the general case $\theta = (\mu, \Sigma, \beta)$:

- Convergence.
- Mean-square rate: $\forall a > \frac{1}{2\lambda}$, $\mathbb{E}\, d^2(\theta_n, \theta^*) = O(n^{-1})$.
- Asymptotic normality.

For $\theta = (\Sigma)$ and $\theta = (\mu, \Sigma)$, CIM = IM:

- $\forall a > \frac{1}{2}$, the mean-square rate holds.
- If $a = 1$, the estimates $\theta^{(n)}$ are asymptotically efficient.
19. ### Convergence rate for decreasing step size

Figure 2: Convergence rate, with $a > \frac{1}{2\lambda}$. (a) The case $\theta = (\Sigma)$ with $(\mu^*, \beta^*)$. (b) The case $\theta = (\mu, \Sigma)$ with $\beta^*$. (c) The case $\theta = (\mu, \Sigma, \beta)$.

Recall the mean-square rate: $\exists K > 0$, $\exists n_0 > 0$, such that, for all $n \ge n_0$,

$$d^2(\theta^{(n)}, \theta^*) \le \frac{K}{n} \tag{14}$$
20. ### Asymptotic normality

Figure 3: Asymptotic normality. (a) The case $\theta = (\Sigma)$ with $(\mu^*, \beta^*)$. (b) The case $\theta = (\mu, \Sigma)$ with $\beta^*$. The formal statement is given in (25).
21. ### Efficiency

Figure 4: Efficiency. (a) The case $\theta = (\Sigma)$ with $(\mu^*, \beta^*)$. (b) The case $\theta = (\mu, \Sigma)$ with $\beta^*$. (c) The case $\theta = (\mu, \Sigma, \beta)$. (d) Summary.
22. ### Time consumption

Figure 5: Time consumption. (a) The case $\theta = (\Sigma)$ with $(\mu^*, \beta^*)$. (b) The case $\theta = (\mu, \Sigma)$ with $\beta^*$. (c) The case $\theta = (\mu, \Sigma, \beta)$. (d) Summary.
23. ### Color transformation

Figure 6: Full HD image with 5D transformation (Hristova, Hristina, "Transformation of the multivariate generalized Gaussian distribution for image editing". 2017). Panels: Input, Target, MM, FP, CIG Offline, CIG Online.
24. ### Color transformation

Figure 7: Details of transformation. Panels: MM, FP, CIG Offline, CIG Online.
25. ### Mixture of ECD

Density function:

$$f(x; \theta) = \sum_{k=1}^{K} w_k\, p(x; \mu_k, \Sigma_k, \beta_k) \quad \text{with} \quad \sum_{k=1}^{K} w_k = 1 \tag{15}$$

Parameter and parametric space:

$$\theta = \left( [w_1, \cdots, w_K],\ \begin{bmatrix} \mu^{(1)}_1 & \cdots & \mu^{(1)}_K \\ \vdots & \ddots & \vdots \\ \mu^{(m)}_1 & \cdots & \mu^{(m)}_K \end{bmatrix},\ \begin{bmatrix} \Sigma^{(1)}_1 & \cdots & \Sigma^{(1)}_K \\ \vdots & \ddots & \vdots \\ \Sigma^{(d)}_1 & \cdots & \Sigma^{(d)}_K \end{bmatrix},\ [\beta_1, \cdots, \beta_K] \right) \tag{16}$$

$$\Theta = (0, 1)^K \times \mathbb{R}^{m \times K} \times \mathcal{P}_m^K \times \mathbb{R}_+^K \tag{17}$$
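The mixture density (15) is a weighted sum of component densities, which can be sketched directly. The block below assumes one-dimensional Gaussian components as a stand-in for a general ECD density $p$, and the function names are hypothetical.

```python
import math

def gaussian_pdf(x, mu, var):
    """1-D Gaussian density, used here as a stand-in component p(x; ...)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(x, weights, components):
    """Evaluate f(x) = sum_k w_k * p_k(x), checking the weight constraints."""
    if any(w <= 0.0 or w >= 1.0 for w in weights):
        raise ValueError("each weight must lie in (0, 1)")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * p(x) for w, p in zip(weights, components))
```

The two checks mirror the constraints in (15) and (17): every weight lies in $(0, 1)$ and the weights sum to one.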
26. ### Unit sphere

Geometry structures on unit sphere.

Cost function:

$$D(\theta) = \mathbb{E}[q(\theta; x)] = -\mathbb{E}[\log f(\theta; x)] \tag{18}$$
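One common way to obtain a unit-sphere geometry for mixture weights is the square-root map $r_k = \sqrt{w_k}$, which turns the constraint $\sum_k w_k = 1$ into $\lVert r \rVert = 1$. That construction is assumed here, since the slide text does not spell it out. A minimal sketch with hypothetical function names:

```python
import math

def weights_to_sphere(weights):
    """Map mixture weights to a point r on the unit sphere, r_k = sqrt(w_k)."""
    return [math.sqrt(w) for w in weights]

def sphere_exp(r, v):
    """Exponential map on the unit sphere at r, applied to a vector v.

    v is first projected onto the tangent space at r (orthogonal to r).
    """
    dot = sum(ri * vi for ri, vi in zip(r, v))
    tangent = [vi - dot * ri for ri, vi in zip(r, v)]
    norm = math.sqrt(sum(t * t for t in tangent))
    if norm == 0.0:
        return list(r)
    return [math.cos(norm) * ri + math.sin(norm) * ti / norm
            for ri, ti in zip(r, tangent)]

def sphere_to_weights(r):
    """Recover weights w_k = r_k**2, which sum to 1 when r has unit norm."""
    return [ri * ri for ri in r]
```

Because the update moves along a great circle, the recovered weights sum to one after every step without any explicit renormalisation.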
27. ### CIM for MECD

$$\langle u_\theta, v_\theta \rangle_\theta = \langle u_r, v_r \rangle + \sum_{k=1}^{K} \Big[ \langle u_{\mu_k}, v_{\mu_k} \rangle_{\mu_k} + \langle u_{\Sigma_k}, v_{\Sigma_k} \rangle_{\Sigma_k} + \langle u_{\beta_k}, v_{\beta_k} \rangle_{\beta_k} \Big] \tag{19}$$
28. ### CIG with decreasing step-size

- Step 1: $\eta^{(n+1)} \leftarrow \frac{a}{n+1}$.
- Step 2, update weights: update $r^{(n+1)}$ with $\eta^{(n+1)}$.
- Step 3, update $(\mu_k, \Sigma_k, \beta_k)_k$:
  - update $\mu_k^{(n+1)}$ according to $(r^{(n+1)}, \mu_k^{(n)}, \Sigma_k^{(n)}, \beta_k^{(n)})$;
  - update $\Sigma_k^{(n+1)}$ according to $(r^{(n+1)}, \mu_k^{(n+1)}, \Sigma_k^{(n)}, \beta_k^{(n)})$;
  - update $\beta_k^{(n+1)}$ according to $(r^{(n+1)}, \mu_k^{(n+1)}, \Sigma_k^{(n+1)}, \beta_k^{(n)})$.
29. ### Variant step size

Convergence rate: for all $a > \frac{1}{2\lambda}$, the mean-square rate holds.

Online backtracking step size: $\forall n$, $\eta^{(n)}$ is selected according to a mini-batch.

Adaptive step size: $\forall n$,

$$\eta^{(n)} = \frac{\tau^{(n)}_{\min}\, \rho}{L\, \tau^{(n)}_{\max}}$$
30. ### Simulation MECD

Percentage of 'correct' estimates:

| Method | Correct estimates |
| --- | --- |
| EM-FP | 83% |
| SG | 77% |
| CIG-DS | 90% |
| CIG-OB | 89% |
| CIG-AS | 90% |
31. ### Simulation MECD

Figure 8: Convergence rate. (a) Mixture of MGGD. (b) Mixture of Student-T. DS means Decreasing Stepsize and AS means Adaptive Stepsize.
32. ### Comparison of efficiency

Figure 9: Log-likelihood vs iterations. (a) Mixture of MGGD. (b) Mixture of Student-T.
33. ### Comparison of efficiency

Figure 10: Log-likelihood vs time consumption. (a) Mixture of MGGD. (b) Mixture of Student-T.
34. ### Texture segmentation

Figure 11: Visual results. (a) Original Texture. (b) EM-FP. (c) SG. (d) CIG-DS. (e) CIG-OB. (f) CIG-AS.
35. ### Texture segmentation

Accuracy of segmentation:

| Method | Correct estimates |
| --- | --- |
| EM-FP | 93.011% |
| SG | 63.328% |
| CIG-DS | 96.072% |
| CIG-OB | 85.338% |
| CIG-AS | 92.121% |
36. ### Thank you for your attention.
37. ### Appendix: Generator function

The generator function in (3) is specifically given as

$$g\big((x - \mu)^\dagger \Sigma^{-1} (x - \mu), \beta\big) = \exp\left( -\frac{1}{2} \big[(x - \mu)^\dagger \Sigma^{-1} (x - \mu)\big]^{\beta} \right) \quad \text{for MGGD} \tag{20}$$

$$g\big((x - \mu)^\dagger \Sigma^{-1} (x - \mu), \beta\big) = \left( 1 + \frac{(x - \mu)^\dagger \Sigma^{-1} (x - \mu)}{\beta} \right)^{-\frac{\beta + m}{2}} \quad \text{for Student-T} \tag{21}$$
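The two generator functions can be written as plain scalar functions of the quadratic form $t = (x - \mu)^\dagger \Sigma^{-1} (x - \mu)$. The sketch below follows (20) and (21) as read from the slide. The function names and the argument name `t` are hypothetical.

```python
import math

def g_mggd(t, beta):
    """MGGD generator, eq. (20): exp(-t**beta / 2), with t >= 0 the quadratic form."""
    return math.exp(-0.5 * t ** beta)

def g_student(t, beta, m):
    """Student-T generator, eq. (21): (1 + t / beta) ** (-(beta + m) / 2)."""
    return (1.0 + t / beta) ** (-(beta + m) / 2.0)
```

Setting `beta = 1` in `g_mggd` gives `exp(-t / 2)`, the Gaussian kernel, so the Gaussian is a special case of the MGGD family.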
38. ### Appendix: Component-wise Information Distance

$$d^2(\theta_n, \theta^*) = d^2(\mu_n, \mu^*) + d^2(\Sigma_n, \Sigma^*) + d^2(\beta_n, \beta^*) \tag{22}$$

where

$$d^2(\mu_n, \mu^*) = I_\mu\, \lVert \mu_n - \mu^* \rVert^2 \tag{23a}$$

$$d^2(\Sigma_n, \Sigma^*) = I_1 \operatorname{tr}\big[\log\big(\Sigma_n^{-1} \Sigma^*\big)\big]^2 + I_2 \operatorname{tr}^2\big[\log\big(\Sigma_n^{-1} \Sigma^*\big)\big] \tag{23b}$$

$$d^2(\beta_n, \beta^*) = I_\beta \log^2\big(\beta_n^{-1} \beta^*\big) \tag{23c}$$

This distance is used for measuring the errors in Figures 1, 2 and 4.
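A worked special case of (23b), assuming $m = 1$ so that $\Sigma_n = \sigma_n^2$ and $\Sigma^* = \sigma_*^2$ are positive scalars (notation introduced here for illustration). Both trace terms reduce to the same squared logarithm:

$$d^2(\sigma_n^2, \sigma_*^2) = I_1 \log^2\!\left( \frac{\sigma_*^2}{\sigma_n^2} \right) + I_2 \log^2\!\left( \frac{\sigma_*^2}{\sigma_n^2} \right) = (I_1 + I_2) \log^2\!\left( \frac{\sigma_*^2}{\sigma_n^2} \right)$$

This has the same form as (23c): the error depends only on the ratio of the estimate to the true value, so over-estimating by a factor of 2 costs as much as under-estimating by a factor of 2.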
39. ### Appendix: Definition of $O(n^{-1})$

$\exists K > 0$, $\exists n_0 > 0$, such that, for all $n \ge n_0$,

$$d^2(\theta^{(n)}, \theta^*) \le \frac{K}{n} \tag{24}$$

Chi-squared distribution: if $a = 1$, $\big(n^{1/2} \theta^{(i)}\big)_{i \in \{1, \cdots, d\}}$ converges to a normal distribution with an identity covariance $I_d$, and the estimates are asymptotically efficient. For $\theta = (\mu, \Sigma)$ with fixed $\beta^*$,

$$n\, d^2(\theta^*, \theta_n) \Rightarrow \chi^2\!\left( \frac{m(m+1)}{2} + m \right) \tag{25}$$

This result is numerically verified in Figure 3.
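The number of degrees of freedom in (25) can be read as a count of free parameters in $\theta = (\mu, \Sigma)$: $m$ for the vector $\mu$ and $m(m+1)/2$ for the symmetric matrix $\Sigma$. A worked count for two small dimensions, chosen here purely for illustration:

$$m = 2: \quad \frac{2 \cdot 3}{2} + 2 = 5, \qquad m = 3: \quad \frac{3 \cdot 4}{2} + 3 = 9$$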