
Jialun Zhou

(IMS, Groupe Signal Image — CNRS, Université de Bordeaux)

https://s3-seminar.github.io/seminars/jialun-zhou

Title — Online estimation of elliptical distributions and their mixture: The component-wise information gradient method

Abstract — Elliptically contoured distributions (ECD) and their mixture models (MECD) are highly versatile for modeling general, real-world probability distributions. They have therefore played a valuable role in computer vision, image processing, radar signal processing, and biomedical signal processing. Maximum likelihood estimation (MLE) of an ECD leads to a system of non-linear equations, most often addressed using fixed-point (FP) methods, while an MECD is usually estimated within the Expectation-Maximization (EM) framework combined with an FP method. Unfortunately, these methods can become impractical for large-scale or high-dimensional datasets, due to a lack of time, memory, or computational resources. To overcome this difficulty, we introduce a Riemannian optimization method, the Component-wise Information Gradient (CIG). On the one hand, CIG is an online method, so its recursive nature greatly reduces time and memory consumption. On the other hand, it uses information geometry to correctly calibrate gradient-descent step sizes, leading to an improved rate of convergence. We also derive this rate of convergence mathematically for two variants of CIG, with decreasing and adaptive step sizes respectively. Finally, we show that CIG compares favorably with the state of the art, both in computer experiments and in practical applications.

S³ Seminar

May 21, 2021

Transcript

  1. Online estimation of ECD and its Mixture
    Component-wise Information Gradient method
    ZHOU Jialun¹
    ¹IMS laboratory, University of Bordeaux
    May 19, 2021


  3. Estimation problem
    $$\hat{\theta} = \arg\min_{\theta} D(\theta) \qquad (1)$$
    $$\hat{\theta}_{k+1} = F\big(\hat{\theta}_k, \nabla_\theta D(\theta)\big) \qquad (2)$$
    Iterative algorithm
    1. Offline version, e.g. Expectation-Maximization¹.
    2. Online version, e.g. Stochastic Gradient Descent².
    Difficulties
    Offline methods → memory and time
    Classic stochastic approximation → non-convexity, instability, and choice of step size
    ¹ dempster1977maximum
    ² saad1998online

  4. Optimization on Riemannian Manifolds
    Absil, "Optimization algorithms on matrix manifolds", 2009.
    Bonnabel, Silvère, "Stochastic gradient descent on Riemannian manifolds", 2013.
    Amari, Shun-Ichi, "Natural gradient works efficiently in learning", 1998.
    Component-wise Information Gradient
    Saves memory and time
    Makes step-size selection easy
    Improves performance

  5. Contents
    1. ECD
    Introduction and problematic
    Riemannian Geometry
    Optimization on Riemannian manifold
    Component-wise Information Metric and gradient
    Retraction map
    Algorithms
    Global convergence
    CIG for ECD
    2. Mixture of ECD
    ZHOU Jialun Online estimation of ECD and its Mixture 5 / 39


  6. ECD
    Mixture of ECD
    Introduction and problematic
    Riemannian Geometry
    Algorithms
    ECD
    Density of ECD
    $$p(x \mid \theta) = c(\beta)\, |\Sigma|^{-1/2}\, g_\beta\big((x-\mu)^\dagger \Sigma^{-1} (x-\mu)\big) \qquad (3)$$
    where $g_\beta$ is the generator function.

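A minimal sketch of evaluating (3) in log form for the two generators listed in the appendix, (20) for MGGD and (21) for Student-T. The slides do not give c(β); the constants below are our own derivation, assuming the generators are exactly (20) and (21): for MGGD, c(β) = βΓ(m/2) / (π^{m/2} Γ(m/(2β)) 2^{m/(2β)}), which reduces to the Gaussian constant (2π)^{-m/2} at β = 1, and for Student-T it is the usual multivariate t constant with β degrees of freedom. The function name `ecd_logpdf` and the `family` strings are hypothetical.

```python
import math

import numpy as np


def ecd_logpdf(x, mu, sigma, beta, family):
    """Log of the ECD density (3) with generator (20) ("mggd") or (21) ("student").

    For "student", beta plays the role of the degrees of freedom.
    The normalizing constants are derived here, not taken from the slides.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    m = mu.shape[0]
    diff = x - mu
    t = float(diff @ np.linalg.solve(sigma, diff))  # (x - mu)^T Sigma^{-1} (x - mu)
    _, logdet = np.linalg.slogdet(sigma)  # log |Sigma|
    if family == "mggd":
        # log c = log(beta Gamma(m/2)) - (m/2) log(pi) - log Gamma(m/(2 beta)) - (m/(2 beta)) log 2
        log_c = (math.log(beta) + math.lgamma(m / 2) - (m / 2) * math.log(math.pi)
                 - math.lgamma(m / (2 * beta)) - (m / (2 * beta)) * math.log(2.0))
        log_g = -0.5 * t ** beta
    elif family == "student":
        # multivariate t constant: Gamma((beta+m)/2) / (Gamma(beta/2) (beta pi)^(m/2))
        log_c = (math.lgamma((beta + m) / 2) - math.lgamma(beta / 2)
                 - (m / 2) * math.log(beta * math.pi))
        log_g = -0.5 * (beta + m) * math.log1p(t / beta)
    else:
        raise ValueError(f"unknown family: {family!r}")
    return log_c - 0.5 * logdet + log_g
```

Working in log form avoids underflow of the density in high dimension; with β = 1, the MGGD branch is the Gaussian log-density and the Student-T branch with m = 1 is the Cauchy log-density.
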
  7. Parametric space
    $$\mu \in \mathbb{R}^m, \quad \Sigma \in \mathcal{P}_m, \quad \beta \in \mathbb{R}_+,$$
    where $\mathcal{P}_m$ denotes the set of m × m symmetric positive-definite matrices.
    Cost function
    $$D(\theta) = \mathbb{E}\big[q(\theta; x)\big] \quad \text{where} \quad q = -\log p(\theta; x) \qquad (4)$$

  8. General update formula
    General update formula on a Riemannian manifold
    $$\theta_{n+1} = \mathrm{Ret}_{\theta_n}\big(\gamma_{n+1}\, u(\theta_n, x_{n+1})\big), \qquad n = 0, 1, \dots \qquad (5)$$

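The update (5) needs only three ingredients: a direction u(θ, x) computed from a single sample, a retraction, and a step-size sequence γ_n. The generic loop below is a sketch under that reading; the function and argument names are hypothetical. With the Euclidean retraction Ret_θ(v) = θ + v, the direction u(θ, x) = x − θ and γ_n = 1/n, it reduces to the running sample mean, which illustrates the memory argument of slide 3: only the current iterate is stored, never the dataset.

```python
from typing import Callable, Iterable, TypeVar

import numpy as np

P = TypeVar("P")  # a point on the manifold (array, tuple of arrays, ...)


def online_riemannian_update(
    theta0: P,
    samples: Iterable[np.ndarray],
    direction: Callable[[P, np.ndarray], np.ndarray],
    retraction: Callable[[P, np.ndarray], P],
    step: Callable[[int], float],
) -> P:
    """theta_{n+1} = Ret_{theta_n}(gamma_{n+1} * u(theta_n, x_{n+1})), n = 0, 1, ...

    Each sample is used once; only the current iterate is kept in memory.
    """
    theta = theta0
    for n, x in enumerate(samples):
        gamma = step(n + 1)  # gamma_{n+1}
        theta = retraction(theta, gamma * direction(theta, x))
    return theta


# Euclidean special case: Ret_theta(v) = theta + v, u(theta, x) = x - theta, gamma_n = 1/n.
rng = np.random.default_rng(0)
xs = rng.normal(size=(1000, 3))
estimate = online_riemannian_update(
    np.zeros(3),
    xs,
    direction=lambda theta, x: x - theta,
    retraction=lambda theta, v: theta + v,
    step=lambda n: 1.0 / n,
)
```

On a curved parameter space, only `retraction` and `direction` change; the loop itself stays the same.
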
  9. Component-wise information metric
    Difficulty
    The information metric does not have a closed form³.
    Solution: the Component-wise Information Metric (CIM)
    ³ verdoolaege2012geometry

  10. Information metric for Σ
    The analytical expression of $\langle\cdot,\cdot\rangle_\Sigma$ is⁴
    $$\langle U_\Sigma, V_\Sigma \rangle_\Sigma = I_{\Sigma,1}\, \mathrm{tr}\big(\Sigma^{-1} U_\Sigma \Sigma^{-1} V_\Sigma\big) + I_{\Sigma,2}\, \mathrm{tr}\big(\Sigma^{-1} U_\Sigma\big)\, \mathrm{tr}\big(\Sigma^{-1} V_\Sigma\big) \qquad (6)$$
    ⁴ berkane1997geodesic

  11. Information metric for µ and β
    Information metric for µ
    The analytical expression of $\langle\cdot,\cdot\rangle_\mu$ is⁵
    $$\langle U_\mu, V_\mu \rangle_\mu = I_\mu\, U_\mu^\dagger \Sigma^{-1} V_\mu \qquad (7)$$
    Information metric for β
    $$\langle U_\beta, V_\beta \rangle_\beta = I_\beta\, \frac{U_\beta V_\beta}{\beta^2} \qquad (8)$$
    ⁵ zhou2020riemannian

  12. Component-wise Information Gradient
    The component-wise information gradient $\nabla_\theta q(\theta; x)$ is the solution of
    $$\big\langle \nabla_\theta q(\theta; x), v \big\rangle_\theta = \mathrm{d}q(\theta; x)[v] \quad \text{for every tangent vector } v, \qquad (9)$$
    and its abstract expression is given as
    $$\nabla_\theta q(\theta; x) = \begin{pmatrix} \nabla_\mu q(\theta; x) \\ \nabla_\Sigma q(\theta; x) \\ \nabla_\beta q(\theta; x) \end{pmatrix} \qquad (10)$$

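Solving (9) component by component turns Euclidean partial derivatives into component-wise gradients. The closed forms below are our own derivation, not taken from the slides. They assume the β metric is I_β u v / β² (the metric whose geodesic distance is (23c)), I_{Σ,1} > 0 and I_{Σ,1} + m I_{Σ,2} > 0, with E the symmetric part of ∂q/∂Σ:

$$\nabla_\mu q = \frac{1}{I_\mu}\,\Sigma\,\frac{\partial q}{\partial \mu}, \qquad \nabla_\beta q = \frac{\beta^2}{I_\beta}\,\frac{\partial q}{\partial \beta}, \qquad \nabla_\Sigma q = \frac{1}{I_{\Sigma,1}}\Big(\Sigma E \Sigma - \frac{I_{\Sigma,2}\,\mathrm{tr}(E\Sigma)}{I_{\Sigma,1} + m\,I_{\Sigma,2}}\,\Sigma\Big).$$

The sketch checks the Σ formula numerically against (6) and (9); the function names are hypothetical and the constants I are treated as given numbers.

```python
import numpy as np


def cig_mu(sigma, egrad_mu, i_mu):
    """Component-wise gradient for mu under metric (7): Sigma @ dq/dmu / I_mu."""
    return sigma @ egrad_mu / i_mu


def cig_sigma(sigma, egrad_sigma, i1, i2):
    """Component-wise gradient for Sigma under metric (6); needs i1 > 0 and i1 + m*i2 > 0."""
    e = 0.5 * (egrad_sigma + egrad_sigma.T)  # symmetric part of dq/dSigma
    m = sigma.shape[0]
    c = i2 / (i1 + m * i2)
    return (sigma @ e @ sigma - c * np.trace(e @ sigma) * sigma) / i1


def cig_beta(beta, egrad_beta, i_beta):
    """Component-wise gradient for beta under the assumed metric I_beta * u * v / beta**2."""
    return beta ** 2 * egrad_beta / i_beta


def metric_sigma(sigma, u, v, i1, i2):
    """Inner product (6) at Sigma, used here to check cig_sigma."""
    su = np.linalg.solve(sigma, u)  # Sigma^{-1} U
    sv = np.linalg.solve(sigma, v)  # Sigma^{-1} V
    return i1 * np.trace(su @ sv) + i2 * np.trace(su) * np.trace(sv)


# Check identity (9) for Sigma: <grad, V>_Sigma = dq[V] = tr(E V) for symmetric V.
rng = np.random.default_rng(1)
a_mat = rng.normal(size=(3, 3))
sigma = a_mat @ a_mat.T + 3.0 * np.eye(3)  # a symmetric positive-definite point
b_mat = rng.normal(size=(3, 3))
e_mat = b_mat + b_mat.T  # a symmetric Euclidean gradient
c_mat = rng.normal(size=(3, 3))
v_mat = c_mat + c_mat.T  # a symmetric tangent direction
lhs = metric_sigma(sigma, cig_sigma(sigma, e_mat, 0.7, 0.2), v_mat, 0.7, 0.2)
rhs = np.trace(e_mat @ v_mat)
```
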
  13. Retraction on the parametric space
    Retraction map
    $$\mathrm{Ret}_{\theta^{(n)}}\big(\eta^{(n)} \nabla_\theta q(\theta^{(n)}; X)\big) = \begin{pmatrix} \mathrm{Exp}_{\mu^{(n)}}\big(\eta^{(n)} \nabla_\mu q(\theta^{(n)}; X)\big) \\ \mathrm{Exp}_{\Sigma^{(n)}}\big(\eta^{(n)} \nabla_\Sigma q(\theta^{(n)}; X)\big) \\ \mathrm{Exp}_{\beta^{(n)}}\big(\eta^{(n)} \nabla_\beta q(\theta^{(n)}; X)\big) \end{pmatrix} \qquad (11)$$

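The three exponential maps in (11) can be written explicitly. This sketch rests on our own reasoning, not on the slides: when I_{Σ,1} > 0 and I_{Σ,1} + m I_{Σ,2} > 0, the metric (6) only rescales the trace part and the trace-free part of Σ^{-1/2} V Σ^{-1/2}, so its geodesics coincide with those of the affine-invariant metric, which is consistent with the distance (23b). For β, the assumed metric I_β u v / β² has geodesics β exp(tv/β), consistent with (23c). The function names are hypothetical; for a descent step, the tangent vector passed in is the step size times the descent direction.

```python
import numpy as np


def _sym_fun(a, fun):
    """Apply a scalar function to a symmetric matrix through its eigendecomposition."""
    w, q = np.linalg.eigh(a)
    return (q * fun(w)) @ q.T


def exp_mu(mu, v):
    """Exponential map for the location: the space is Euclidean, so Exp_mu(v) = mu + v."""
    return mu + v


def exp_sigma(sigma, v):
    """Exp_Sigma(V) = Sigma^{1/2} expm(Sigma^{-1/2} V Sigma^{-1/2}) Sigma^{1/2}."""
    s_half = _sym_fun(sigma, np.sqrt)
    s_ihalf = _sym_fun(sigma, lambda w: 1.0 / np.sqrt(w))
    inner = s_ihalf @ v @ s_ihalf
    inner = 0.5 * (inner + inner.T)  # remove round-off asymmetry before eigh
    return s_half @ _sym_fun(inner, np.exp) @ s_half


def exp_beta(beta, v):
    """Exp_beta(v) = beta * exp(v / beta); stays positive for any real v."""
    return beta * np.exp(v / beta)
```

Because every output of `exp_sigma` is a congruence of a matrix exponential, positive definiteness is preserved for any step size, so no projection or clipping is needed.
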
  14. Global convergence
    Geodesic α-strong convexity
    A function $D : \Theta \to \mathbb{R}$ is said to be geodesically α-strongly convex if, for any $\theta_1, \theta_2 \in \Theta$,
    $$D(\theta_2) \ge D(\theta_1) + \big\langle \mathrm{grad}_{\theta_1} D(\theta_1), \mathrm{Exp}^{-1}_{\theta_1}(\theta_2) \big\rangle_{\theta_1} + \frac{\alpha}{2}\, d^2(\theta_1, \theta_2) \qquad (12)$$
    In the following cases, global convergence is guaranteed, i.e. for all $\theta_0 \in \Theta$ we always have $\lim_{n \to \infty} \theta_n = \theta^*$:
    Student-T⁶
    θ = (Σ), for β* > −m
    θ = (µ, Σ), for β* > 0
    MGGD⁷
    θ = (Σ), for β* > 0
    θ = (µ, Σ), for β* > 0
    ⁶ laus2019multivariate
    ⁷ zhang2013multivariate

  15. CIG with deterministic step size
    Reformulated objective function
    $$\hat{D}(\theta) = \frac{1}{T} \sum_{t=1}^{T} q\big(\theta; x^{(t)}\big) \qquad (13)$$
    CIG with deterministic step size⁸
    1. Update µ: calculate $\nabla_\mu \hat{D}(\theta)$ at $(\mu^{(n)}, \Sigma^{(n)}, \beta^{(n)})$; select $\eta^{(n)}_\mu$ according to the Armijo-Goldstein criteria and update $\mu^{(n+1)}$;
    2. Update Σ: calculate $\nabla_\Sigma \hat{D}(\theta)$ at $(\mu^{(n+1)}, \Sigma^{(n)}, \beta^{(n)})$; select $\eta^{(n)}_\Sigma$ according to the Armijo-Goldstein criteria and update $\Sigma^{(n+1)}$;
    3. Update β: calculate $\nabla_\beta \hat{D}(\theta)$ at $(\mu^{(n+1)}, \Sigma^{(n+1)}, \beta^{(n)})$; select $\eta^{(n)}_\beta$ according to the Armijo-Goldstein criteria and update $\beta^{(n+1)}$.
    ⁸ zhou2020riemannian

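Each component update on slide 15 picks its step by a line search on D̂ (13). A minimal sketch of one such search along the retraction curve η ↦ Ret_x(−η ∇), assuming the caller supplies the squared Riemannian norm of the gradient. Note that it implements only the Armijo (sufficient-decrease) half of the Armijo-Goldstein conditions; the Goldstein lower bound is omitted. The name `armijo_step` and its default constants are hypothetical.

```python
from typing import Callable, Tuple

import numpy as np


def armijo_step(
    f: Callable[[np.ndarray], float],
    x: np.ndarray,
    grad: np.ndarray,
    retract: Callable[[np.ndarray, np.ndarray], np.ndarray],
    sq_norm: float,
    eta0: float = 1.0,
    shrink: float = 0.5,
    c: float = 1e-4,
    max_iter: int = 50,
) -> Tuple[np.ndarray, float]:
    """Backtracking line search: return the first eta = eta0 * shrink**k such that
        f(Ret_x(-eta * grad)) <= f(x) - c * eta * ||grad||_x^2,
    where sq_norm = ||grad||_x^2 is measured in the Riemannian metric at x.
    """
    fx = f(x)
    eta = eta0
    for _ in range(max_iter):
        candidate = retract(x, -eta * grad)
        if f(candidate) <= fx - c * eta * sq_norm:
            return candidate, eta
        eta *= shrink
    return x, 0.0  # no acceptable step found: keep the current point
```

For the µ-update one would pass D̂ as a function of µ alone, `retract = lambda p, v: p + v`, and the squared norm from (7); for Σ and β, the exponential maps of (11) play the role of `retract`.
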
  16. CIG with deterministic step size
    Figure 1: The convergence of CIG with deterministic step size for ECD; (a) the case θ = (Σ) with (µ*, β*), (b) the case θ = (µ, Σ) with β*. The step size is selected according to the Armijo-Goldstein criteria, and the error is measured by the component-wise information distance (22).

  17. CIG with decreasing step size
    CIG with decreasing step size⁹
    1. $\eta^{(n+1)} \leftarrow \frac{a}{n+1}$, with $a > 0$;
    2. Update µ: calculate $\nabla_\mu q(\theta; X)$ at $(\mu^{(n)}, \Sigma^{(n)}, \beta^{(n)})$; update $\mu^{(n+1)}$ with $\eta^{(n+1)}$;
    3. Update Σ: calculate $\nabla_\Sigma q(\theta; X)$ at $(\mu^{(n+1)}, \Sigma^{(n)}, \beta^{(n)})$; update $\Sigma^{(n+1)}$ with $\eta^{(n+1)}$;
    4. Update β: calculate $\nabla_\beta q(\theta; X)$ at $(\mu^{(n+1)}, \Sigma^{(n+1)}, \beta^{(n)})$; update $\beta^{(n+1)}$ with $\eta^{(n+1)}$.
    ⁹ zhou2020riemannian

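A sketch of the decreasing-step algorithm of slide 17 in the simplest special case: a Gaussian (MGGD with β fixed to 1), so only µ and Σ are updated. The constants I_µ = 1, I_{Σ,1} = 1/2, I_{Σ,2} = 0 are our assumption, taken from the Gaussian Fisher metric; with them, ∇_µ q = −(x − µ) and ∇_Σ q = Σ − (x − µ)(x − µ)ᵀ. The Σ step uses the affine-invariant exponential map. Function names are hypothetical.

```python
import numpy as np


def _sym_fun(a, fun):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, q = np.linalg.eigh(a)
    return (q * fun(w)) @ q.T


def _exp_sigma(sigma, v):
    """Exponential map Sigma^{1/2} expm(Sigma^{-1/2} V Sigma^{-1/2}) Sigma^{1/2}."""
    s_half = _sym_fun(sigma, np.sqrt)
    s_ihalf = _sym_fun(sigma, lambda w: 1.0 / np.sqrt(w))
    inner = s_ihalf @ v @ s_ihalf
    return s_half @ _sym_fun(0.5 * (inner + inner.T), np.exp) @ s_half


def cig_gaussian_online(samples, mu0, sigma0, a=1.0):
    """Online CIG with step a/(n+1) for a Gaussian, assuming I_mu = 1, I_S1 = 1/2, I_S2 = 0."""
    mu = np.array(mu0, dtype=float)
    sigma = np.array(sigma0, dtype=float)
    for n, x in enumerate(samples):
        eta = a / (n + 1)
        # Step 2: mu <- Exp_mu(-eta * grad_mu q), gradient taken at (mu^(n), Sigma^(n)).
        mu = mu + eta * (x - mu)
        # Step 3: Sigma <- Exp_Sigma(-eta * grad_Sigma q), gradient taken at (mu^(n+1), Sigma^(n)).
        s = x - mu
        sigma = _exp_sigma(sigma, -eta * (sigma - np.outer(s, s)))
    return mu, sigma


rng = np.random.default_rng(0)
true_cov = np.array([[2.0, 0.6], [0.6, 1.0]])
xs = rng.multivariate_normal([1.0, -1.0], true_cov, size=5000)
mu_hat, sigma_hat = cig_gaussian_online(xs, np.zeros(2), np.eye(2), a=1.0)
```

With a = 1 the µ-iterate is exactly the running sample mean. The first iterations take large steps (η = 1, 1/2, ...), so the Σ-iterate can overshoot transiently before the decreasing step size settles it.
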
  18. Theoretical results
    For the general case θ = (µ, Σ, β):
    1. Convergence.
    2. Mean-square rate: for all a > 1/(…), $\mathbb{E}\big[d^2(\theta_n, \theta^*)\big] = O(n^{-1})$.
    3. Asymptotic normality.
    For θ = (Σ) and θ = (µ, Σ), the CIM coincides with the information metric (CIM = IM):
    1. For all a > 1/2, the mean-square rate holds.
    2. If a = 1, the estimates θ(n) are asymptotically efficient.

  19. Convergence rate for decreasing step size
    Figure 2: Convergence rate, with a > 1/(…); (a) the case θ = (Σ) with (µ*, β*), (b) the case θ = (µ, Σ) with β*, (c) the case θ = (µ, Σ, β).
    Recall the mean-square rate:
    $$\exists K > 0,\ \exists n_0 > 0 \ \text{such that}\ \forall n \ge n_0,\quad \mathbb{E}\big[d^2(\theta^{(n)}, \theta^*)\big] \le \frac{K}{n} \qquad (14)$$

  20. Asymptotic normality
    Figure 3: Asymptotic normality; (a) the case θ = (Σ) with (µ*, β*), (b) the case θ = (µ, Σ) with β*.
    The formal statement is given in (25).

  21. Efficiency
    Figure 4: Efficiency; (a) the case θ = (Σ) with (µ*, β*), (b) the case θ = (µ, Σ) with β*, (c) the case θ = (µ, Σ, β), (d) summary.

  22. Time consumption
    Figure 5: Time consumption; (a) the case θ = (Σ) with (µ*, β*), (b) the case θ = (µ, Σ) with β*, (c) the case θ = (µ, Σ, β), (d) summary.

  23. Color transformation
    Figure 6: Full HD image with 5D transformation¹⁰. Panels: Input, Target, MM, FP, CIG Offline, CIG Online.
    ¹⁰ Hristova, Hristina, "Transformation of the multivariate generalized Gaussian distribution for image editing", 2017.

  24. Color transformation
    Figure 7: Details of the transformation. Panels: MM, FP, CIG Offline, CIG Online.

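  25. Mixture of ECD
    Density function
    $$f(x; \theta) = \sum_{k=1}^{K} w_k\, p(x; \mu_k, \Sigma_k, \beta_k) \quad \text{with} \quad \sum_{k=1}^{K} w_k = 1 \qquad (15)$$
    Parameter and parametric space
    $$\theta = \left( [w_1, \cdots, w_K],\ \begin{bmatrix} \mu^{(1)}_1 & \cdots & \mu^{(1)}_K \\ \vdots & \ddots & \vdots \\ \mu^{(m)}_1 & \cdots & \mu^{(m)}_K \end{bmatrix},\ \begin{bmatrix} \Sigma^{(1)}_1 & \cdots & \Sigma^{(1)}_K \\ \vdots & \ddots & \vdots \\ \Sigma^{(d)}_1 & \cdots & \Sigma^{(d)}_K \end{bmatrix},\ [\beta_1, \cdots, \beta_K] \right) \qquad (16)$$
    $$\Theta = (0, 1)^K \times \mathbb{R}^{m \times K} \times \mathcal{P}_m^K \times \mathbb{R}_+^K \qquad (17)$$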

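Evaluating log f from (15) naively, as the log of a sum of densities, underflows once the component densities are tiny (large m or far-away samples). A minimal sketch of the standard log-sum-exp evaluation, assuming the per-component log-densities are already computed by some ECD log-density routine; the function names are hypothetical. The average of −`mixture_loglik` over a batch is an empirical estimate of the cost (18).

```python
import numpy as np


def mixture_loglik(weights, component_logpdfs):
    """log f(x_i; theta) = log sum_k w_k p_k(x_i) for each sample i, eq. (15).

    weights: shape (K,), positive and summing to 1.
    component_logpdfs: shape (N, K); entry [i, k] is log p(x_i; mu_k, Sigma_k, beta_k).
    Returns an array of shape (N,).
    """
    w = np.asarray(weights, dtype=float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("mixture weights must sum to 1")
    a = np.log(w) + np.asarray(component_logpdfs, dtype=float)  # log(w_k p_k), row-wise
    a_max = a.max(axis=1, keepdims=True)
    # log-sum-exp: factor out the row maximum so exp() cannot underflow for every k at once
    return (a_max + np.log(np.exp(a - a_max).sum(axis=1, keepdims=True))).ravel()


def responsibilities(weights, component_logpdfs):
    """Posterior probabilities w_k p_k(x_i) / f(x_i; theta), shape (N, K)."""
    a = np.log(np.asarray(weights, dtype=float)) + np.asarray(component_logpdfs, dtype=float)
    return np.exp(a - mixture_loglik(weights, component_logpdfs)[:, None])
```

The responsibilities are the same quantities that EM computes in its E-step; they also appear when differentiating q = −log f with respect to each component's parameters.
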
  26. Unit sphere
    Geometric structure on the unit sphere
    Cost function
    $$D(\theta) = \mathbb{E}\big[q(\theta; x)\big] = -\mathbb{E}\big[\log f(\theta; x)\big] \qquad (18)$$

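The slides place the weights on the unit sphere and update a variable r in (19) and on slide 28, but they do not spell out the map from r to the weights. A common choice, and our assumption here, is w_k = r_k² with ‖r‖ = 1, which makes Σ_k w_k = 1 automatic. The sketch below also assumes the sphere carries the metric inherited from R^K and uses renormalization as the retraction; all names are hypothetical.

```python
import numpy as np


def weights_from_sphere(r):
    """Assumed parametrization: w_k = r_k**2 with ||r|| = 1, so the weights sum to 1."""
    return r ** 2


def sphere_tangent(r, g):
    """Project an ambient vector g onto the tangent space {v : r . v = 0} at r."""
    return g - (r @ g) * r


def sphere_retract(r, v):
    """Retraction on the unit sphere: move in the ambient space, then renormalize."""
    y = r + v
    return y / np.linalg.norm(y)


def r_step(r, p, eta):
    """One descent step on q = -log(sum_k r_k**2 p_k) for a single sample.

    p: component densities p_k(x) at the current sample, shape (K,).
    """
    f = np.sum(r ** 2 * p)
    egrad = -2.0 * r * p / f  # dq/dr_k
    rgrad = sphere_tangent(r, egrad)  # Riemannian gradient on the sphere
    return sphere_retract(r, -eta * rgrad)
```

A sample that is more likely under component k pushes r, and hence w_k, toward that component, while the iterate never leaves the sphere.
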
  27. CIM for MECD
    $$\langle u_\theta, v_\theta \rangle_\theta = \langle u_r, v_r \rangle + \sum_{k=1}^{K} \Big( \langle u_{\mu_k}, v_{\mu_k} \rangle_{\mu_k} + \langle u_{\Sigma_k}, v_{\Sigma_k} \rangle_{\Sigma_k} + \langle u_{\beta_k}, v_{\beta_k} \rangle_{\beta_k} \Big) \qquad (19)$$

  28. CIG with decreasing step size
    1. $\eta^{(n+1)} \leftarrow \frac{a}{n+1}$;
    2. Update the weights: update $r^{(n+1)}$ with $\eta^{(n+1)}$;
    3. Update $(\mu_k, \Sigma_k, \beta_k)_k$:
       update $\mu^{(n+1)}_k$ according to $(r^{(n+1)}, \mu^{(n)}_k, \Sigma^{(n)}_k, \beta^{(n)}_k)$;
       update $\Sigma^{(n+1)}_k$ according to $(r^{(n+1)}, \mu^{(n+1)}_k, \Sigma^{(n)}_k, \beta^{(n)}_k)$;
       update $\beta^{(n+1)}_k$ according to $(r^{(n+1)}, \mu^{(n+1)}_k, \Sigma^{(n+1)}_k, \beta^{(n)}_k)$.

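A sketch of one online step of slide 28, restricted to the weights and the means so that it stays short. The assumptions are ours, not the slides': Gaussian components with fixed covariances (so Σ_k and β_k are not updated), weights w_k = r_k² with ‖r‖ = 1, the metric inherited from R^K on the sphere, and the metric (7) with a constant I_µ for each µ_k. Because q = −log f, each mean's Euclidean gradient is weighted by its responsibility γ_k, and the Σ_k^{-1} in it cancels against the Σ_k of the metric. Function names are hypothetical.

```python
import numpy as np


def gaussian_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma) at x."""
    d = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    m = mu.shape[0]
    return -0.5 * (m * np.log(2.0 * np.pi) + logdet + d @ np.linalg.solve(sigma, d))


def mixture_cig_step(x, r, mus, sigmas, eta, i_mu=1.0):
    """One online step following the update order of slide 28: weights first, then means."""
    logp = np.array([gaussian_logpdf(x, mu, s) for mu, s in zip(mus, sigmas)])
    p = np.exp(logp - logp.max())  # a common rescaling cancels in every ratio below
    # Step 2: weights. Riemannian gradient of q = -log(sum_k r_k^2 p_k) on the sphere.
    egrad_r = -2.0 * r * p / np.sum(r ** 2 * p)
    rgrad_r = egrad_r - (r @ egrad_r) * r  # tangent projection at r
    y = r - eta * rgrad_r
    r_new = y / np.linalg.norm(y)  # retraction: renormalize
    # Step 3: means, evaluated with the new weights r^(n+1).
    w = r_new ** 2
    resp = w * p / np.sum(w * p)  # responsibilities gamma_k
    # dq/dmu_k = -gamma_k Sigma_k^{-1}(x - mu_k); metric (7) turns this into -gamma_k (x - mu_k) / i_mu
    mus_new = [mu + eta * resp[k] * (x - mu) / i_mu for k, mu in enumerate(mus)]
    return r_new, mus_new
```

With a single component the weight step is a no-op and the mean step reduces to µ + η(x − µ)/I_µ, the same form as the single-ECD update.
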
  29. Step-size variants
    Convergence rate
    For all a > 1/(…), the mean-square rate holds.
    Online backtracking step size
    For all n, η(n) is selected according to a mini-batch.
    Adaptive step size
    $$\forall n, \quad \eta^{(n)} = \frac{\tau^{(n)}_{\min}}{\rho\, L\, \tau^{(n)}_{\max}}$$

  30. Simulation MECD
    Percentage of "correct" estimates
    Method    Correct estimates
    EM-FP     83%
    SG        77%
    CIG-DS    90%
    CIG-OB    89%
    CIG-AS    90%

  31. Simulation MECD
    Figure 8: Convergence rate¹¹; (a) mixture of MGGD, (b) mixture of Student-T.
    ¹¹ DS means Decreasing Stepsize and AS means Adaptive Stepsize.

  32. Comparison of efficiency
    Figure 9: Log-likelihood vs. iterations; (a) mixture of MGGD, (b) mixture of Student-T.

  33. Comparison of efficiency
    Figure 10: Log-likelihood vs. time consumption; (a) mixture of MGGD, (b) mixture of Student-T.

  34. Texture segmentation
    Figure 11: Visual results; (a) original texture, (b) EM-FP, (c) SG, (d) CIG-DS, (e) CIG-OB, (f) CIG-AS.

  35. Texture segmentation
    Accuracy of segmentation
    Method    Accuracy
    EM-FP     93.011%
    SG        63.328%
    CIG-DS    96.072%
    CIG-OB    85.338%
    CIG-AS    92.121%

  36. Thank you for your attention.

  37. Appendix
    Generator function
    The generator function in (3) is specifically given as
    $$g\big((x-\mu)^\dagger \Sigma^{-1} (x-\mu), \beta\big) = \exp\Big(-\frac{1}{2}\big[(x-\mu)^\dagger \Sigma^{-1} (x-\mu)\big]^{\beta}\Big) \quad \text{for MGGD} \qquad (20)$$
    $$g\big((x-\mu)^\dagger \Sigma^{-1} (x-\mu), \beta\big) = \Big(1 + \frac{(x-\mu)^\dagger \Sigma^{-1} (x-\mu)}{\beta}\Big)^{-\frac{\beta+m}{2}} \quad \text{for Student-T} \qquad (21)$$

  38. Appendix
    Component-wise information distance
    $$d^2(\theta_n, \theta^*) = d^2(\mu_n, \mu^*) + d^2(\Sigma_n, \Sigma^*) + d^2(\beta_n, \beta^*) \qquad (22)$$
    where
    $$d^2(\mu_n, \mu^*) = I_\mu\, \|\mu_n - \mu^*\|^2 \qquad (23a)$$
    $$d^2(\Sigma_n, \Sigma^*) = I_1\, \mathrm{tr}\big[\log^2(\Sigma_n^{-1} \Sigma^*)\big] + I_2\, \mathrm{tr}^2\big[\log(\Sigma_n^{-1} \Sigma^*)\big] \qquad (23b)$$
    $$d^2(\beta_n, \beta^*) = I_\beta\, \log^2(\beta_n^{-1} \beta^*) \qquad (23c)$$
    This distance is used to measure the errors in Figures 1, 2, and 4.

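A sketch of the error measure (22) to (23c), as used for the convergence plots. The traces in (23b) reduce to sums over the logarithms of the eigenvalues of Σ_n^{-1}Σ*, which are real and positive for symmetric positive-definite inputs. (23a) is implemented with the plain Euclidean norm, as printed; if the intended norm is the Σ^{-1}-weighted one suggested by (7), `diff @ diff` would change accordingly. The function name is hypothetical and the constants I are passed in as given numbers.

```python
import numpy as np


def component_wise_distance_sq(mu_n, sigma_n, beta_n, mu_s, sigma_s, beta_s,
                               i_mu, i1, i2, i_beta):
    """Squared component-wise information distance d^2(theta_n, theta_*), eqs. (22)-(23c)."""
    diff = np.asarray(mu_n, dtype=float) - np.asarray(mu_s, dtype=float)
    d_mu = i_mu * float(diff @ diff)  # (23a), Euclidean norm as printed
    # (23b): eigenvalues of Sigma_n^{-1} Sigma_* are real and positive for SPD inputs,
    # so tr(log^2(.)) = sum(log(lam)^2) and tr(log(.)) = sum(log(lam)).
    lam = np.linalg.eigvals(np.linalg.solve(sigma_n, sigma_s)).real
    log_lam = np.log(lam)
    d_sigma = i1 * float(np.sum(log_lam ** 2)) + i2 * float(np.sum(log_lam)) ** 2
    d_beta = i_beta * float(np.log(beta_s / beta_n)) ** 2  # (23c): log(beta_n^{-1} beta_*)
    return d_mu + d_sigma + d_beta
```

For example, with Σ_n = I and Σ* = eI in dimension 2, both logarithms equal 1, so the Σ term is 2I_1 + 4I_2.
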
  39. Appendix
    Definition of $O(n^{-1})$
    $$\exists K > 0,\ \exists n_0 > 0 \ \text{such that}\ \forall n \ge n_0,\quad \mathbb{E}\big[d^2(\theta^{(n)}, \theta^*)\big] \le \frac{K}{n} \qquad (24)$$
    Chi-square distribution
    If a = 1, $(n^{1/2} \theta^{(i)})_{i \in 1, \cdots, d}$ converges to a normal distribution with identity covariance $I_d$, and the estimates are asymptotically efficient. For θ = (µ, Σ) with fixed β*,
    $$n\, d^2(\theta^*, \theta_n) \Rightarrow \chi^2_{\frac{m(m+1)}{2} + m} \qquad (25)$$
    This result is numerically verified in Figure 3.

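The limit (25) can be sanity-checked by Monte Carlo. This sketch makes several assumptions of its own: a Gaussian (MGGD with β* = 1 fixed), m = 2, θ* = (0, I), the Gaussian Fisher constants I_µ = 1, I_1 = 1/2, I_2 = 0 in (23a) and (23b), and the batch maximum-likelihood estimate standing in for CIG with a = 1 (slide 18 states that the a = 1 estimates are asymptotically efficient, so they should share the same limit). The degrees of freedom are m(m+1)/2 + m = 5, so n d² should have mean close to 5 and variance close to 10.

```python
import numpy as np


def scaled_distance_sq(xs):
    """n * d^2(theta*, theta_hat) for theta* = (0, I), assuming I_mu = 1, I1 = 1/2, I2 = 0."""
    n = xs.shape[0]
    mu_hat = xs.mean(axis=0)
    centered = xs - mu_hat
    sigma_hat = centered.T @ centered / n  # maximum-likelihood covariance
    # With Sigma* = I, the eigenvalues of Sigma_hat^{-1} Sigma* are 1/lambda_i,
    # and log(1/lambda)^2 = log(lambda)^2, so eigvalsh of Sigma_hat suffices.
    log_lam = np.log(np.linalg.eigvalsh(sigma_hat))
    return n * (mu_hat @ mu_hat + 0.5 * np.sum(log_lam ** 2))


rng = np.random.default_rng(0)
m, n, reps = 2, 500, 2000
stats = np.array([scaled_distance_sq(rng.standard_normal((n, m))) for _ in range(reps)])
dof = m * (m + 1) // 2 + m  # = 5 for m = 2
print(stats.mean(), stats.var(), dof, 2 * dof)
```

The µ part contributes a χ²_m term and the Σ part a χ²_{m(m+1)/2} term, which is where the two summands of the degrees of freedom in (25) come from.
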