
Convex Color Image Segmentation with Optimal Transport Distances


This work concerns the histogram-based segmentation of a color image into two regions. In the considered framework, fixed exemplar histograms define a prior on the statistical features of the two regions in competition. We investigate the use of regularized transport-based cost functions as discrepancy measures between color histograms and consider a spatial regularization of the segmentation map with total variation. We finally rely on a primal-dual algorithm to solve the resulting convex optimization problem. Experiments illustrate the robustness of the proposed method for the segmentation of natural color images.

npapadakis

July 02, 2015



Transcript

  1. Convex Color Image Segmentation with Optimal Transport Distances
     Nicolas Papadakis¹ & Julien Rabin²
     ¹CNRS, Institut de Mathématiques de Bordeaux, France
     ²Université de Caen, GREYC, France
     Transport metrics in image and shape processing, IFIP TC7 Conference 2015 on System Modelling and Optimization, 2nd July 2015
  2. Two-phase weakly supervised segmentation
     Image dataset; (partial) labeling by the user: scribbles indicating foreground and background.
  3. Two-phase weakly supervised segmentation
     Image dataset; (partial) labeling by the user; segmentation into regions having similar feature distributions (color, texture, ...).
  4. Problem statement
     General assumptions (and limitations):
     - each picture has two regions: foreground and background;
     - regions of interest are globally statistically consistent with the corresponding scribble regions (i.e. same feature distribution);
     - regions are composed of a very small set of connected components.
  5. Problem statement
     General assumptions (and limitations):
     - each picture has two regions: foreground and background;
     - regions of interest are globally statistically consistent with the corresponding scribble regions (i.e. same feature distribution);
     - regions are composed of a very small set of connected components.
     Advantages of considering global statistics:
     - robustness to outliers;
     - labeling may be incorrect;
     - appearance may change dramatically when using a prior coming from another image.
  6. Setting & Notations
     Simplified setting: we consider a single image I with two binary labels u1 and u0.
     Color image I : Ω ⊂ Z² → R³ with N = |Ω| pixels; segmentation features: RGB colors.
     Problem statement: input scribbles define the binary maps u1 (foreground) and u0 (background).
  7. Setting & Notations
     Simplified setting: we consider a single image I with two binary labels u1 and u0.
     Color image I : Ω ⊂ Z² → R³ with N = |Ω| pixels; segmentation features: RGB colors.
     Problem statement: a binary segmentation of I is defined by u ∈ {0, 1}^N.
  8. Setting & Notations
     Simplified setting: we consider a single image I with two binary labels u1 and u0.
     Color image I : Ω ⊂ Z² → R³ with N = |Ω| pixels; segmentation features: RGB colors.
     Problem statement: the color probability distribution π(u) is defined as a sum of Dirac masses:
     $$\pi(u) := \frac{1}{\sum_{x\in\Omega} u(x)} \sum_{x\in\Omega} u(x)\,\delta_{I(x)}$$
  9. Setting & Notations
     Simplified setting: we consider a single image I with two binary labels u1 and u0.
     Color image I : Ω ⊂ Z² → R³ with N = |Ω| pixels; segmentation features: RGB colors.
     Problem statement: the color probability distributions π(u) are defined as sums of Dirac masses; the empirical distributions are compared with a given metric S (e.g. the symmetric KL divergence).
  10. Variational Model
      The unknown is u : x ∈ Ω → u(x) ∈ {0, 1};
      π(u) and π(1 − u) are the foreground/background probability distributions;
      π(u1) and π(u0) are fixed reference color distributions.
  11. Variational Model
      The unknown is u : x ∈ Ω → u(x) ∈ {0, 1}; π(u) and π(1 − u) are the foreground/background probability distributions; π(u1) and π(u0) are fixed reference color distributions.
      Problem formulation, first attempt:
      $$\operatorname*{argmin}_{u\in\{0,1\}^N} J(u) := S(\pi(u), \pi(u_1)) + S(\pi(1-u), \pi(u_0))$$
      (−) NP-hard optimization problem (non-convex constraints and energy);
      (−) no spatial regularity.
  12. Variational Model
      First attempt (as above): NP-hard, no spatial regularity.
      Second attempt:
      $$\operatorname*{argmin}_{u\in[0,1]^N} J(u) + R(u)$$
      (+) relaxed constraint u(x) ∈ [0, 1];
      (+) spatial regularity promoted by R (e.g. Potts model);
      (−) still a non-convex optimization problem, since the normalization makes
      $$\pi(u) := \frac{1}{\sum_{x\in\Omega} u(x)} \sum_{x\in\Omega} u(x)\,\delta_{I(x)}$$
      non-linear in u.
  13. Previous works
      A large number of methods have been proposed in the literature to address this problem. We focus on approaches considering metrics S based on optimal transport:
      - [Peyré, Fadili and Rabin '12]: unsupervised, non-convex active contour based on an optimal transport metric;
      - [Yildizoglu, Aujol and P. '13]: supervised, convex and fast model based on the Wasserstein-1 metric for grayscale images;
      - [Swoboda & Schnörr '13]: unsupervised/supervised convex multi-phase model based on an optimal transport metric.
  14. Building histograms of features
      Example with two color histograms: H(u) for the foreground and H(1 − u) for the background.
  15. Histograms construction
      N colored pixels ⇒ define a color space C composed of M color bins. The color I(x) at pixel x belongs to a single color bin C_i. The matrix H of size M × N has only N non-null values:
      $$H = \begin{pmatrix} 1 & 0 & 0 & 1 & \cdots & 0 \\ 0 & 1 & 0 & 0 & \cdots & 1 \\ & & & \vdots & & \\ 0 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \end{pmatrix}$$
      H is a linear assignment operator: H(i, x) = 1 ⇒ I(x) ∈ C_i.
  16. Histogram-based data term
      Use H to build the histogram of I on a specific area. Full image: I × 1; corresponding histogram:
      $$h_{\mathbf 1} = H\,\mathbf 1 = (n_1, n_2, \ldots, n_{M-1}, n_M)^T$$
      where n_i is the number of pixels activated by 1 and belonging to C_i.
  17. Histogram-based data term
      Use H to build the histogram of I on a specific area. Object: I × u; corresponding histogram:
      $$h_u = H\,u = (n_1, n_2, \ldots, n_{M-1}, n_M)^T$$
      where n_i is the number of pixels activated by u and belonging to C_i.
  18. Histogram-based data term
      Use H to build the histogram of I on a specific area. Background: I × (1 − u); corresponding histogram:
      $$h_{1-u} = H\,(\mathbf 1 - u) = (n_1, n_2, \ldots, n_{M-1}, n_M)^T$$
      where n_i is the number of pixels activated by 1 − u and belonging to C_i.
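To make the role of H concrete, here is a minimal numpy/scipy sketch of slides 15-18 (not the authors' code; the function names and the uniform RGB quantization are illustrative assumptions):

```python
import numpy as np
from scipy.sparse import csr_matrix

def assignment_matrix(bin_index, M):
    """Build the M x N assignment operator H of slide 15.

    `bin_index` is a length-N array giving, for each pixel x, the index i
    of the color bin C_i containing I(x); H(i, x) = 1 iff I(x) is in C_i,
    so H has exactly one nonzero entry per column (N nonzeros in total).
    """
    N = bin_index.size
    return csr_matrix((np.ones(N), (bin_index, np.arange(N))), shape=(M, N))

# Illustrative quantization: map RGB colors in [0,1]^3 to M = q^3 bins.
q = 8
I = np.random.rand(100, 3)                       # N = 100 pixels, RGB rows
idx = (np.minimum((I * q).astype(int), q - 1) * np.array([q * q, q, 1])).sum(axis=1)
H = assignment_matrix(idx, q ** 3)

u = np.random.rand(100)                          # relaxed segmentation map
h_full = H @ np.ones(100)                        # slide 16: histogram of I
h_fg   = H @ u                                   # slide 17: foreground
h_bg   = H @ (1.0 - u)                           # slide 18: background
```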
  19. Convex Variational Model
      Non-convex model:
      $$\operatorname*{argmin}_{u\in[0,1]^N} S(\pi(u), \pi(u_1)) + S(\pi(1-u), \pi(u_0)) + R(u)$$
      We have
      $$\pi(u) = \frac{1}{\sum_{x\in\Omega} u(x)} \sum_{x\in\Omega} u(x)\,\delta_{I(x)} = \frac{Hu}{\langle u, \mathbf 1\rangle}.$$
  20. Convex Variational Model
      Non-convex model (as above), with π(u) = Hu / ⟨u, 1⟩.
      Convex model [Yildizoglu et al. '13, Swoboda & Schnörr '13]:
      $$\operatorname*{argmin}_{u\in[0,1]^N} S(Hu,\ \langle u, \mathbf 1\rangle\,\pi(u_1)) + S(H(\mathbf 1-u),\ \langle \mathbf 1-u, \mathbf 1\rangle\,\pi(u_0)) + R(u)$$
  21. Convex Variational Model
      Convex model:
      $$\operatorname*{argmin}_{u\in\Delta} S(Hu,\ \langle u, \mathbf 1\rangle\,\pi(u_1)) + S(H(\mathbf 1-u),\ \langle \mathbf 1-u, \mathbf 1\rangle\,\pi(u_0)) + \rho\,TV(u)$$
      - ∆ := [0, 1]^N;
      - Hu and H(1 − u) are unnormalized histograms;
      - π(u0) and π(u1) are quantized reference distributions;
      - ⟨u, 1⟩ = Σ_{x∈Ω} u(x) is a normalization for the reference histogram π(u1);
      - TV(u) = ||Du||_{1,2} is the discrete, isotropic total variation.
      This is a "Chan and Vese"-like model with color histograms instead of mean colors.
  22. Regularity using Total Variation
      The discrete formulation of total variation is the ℓ_{1,2} norm of the discrete gradient:
      $$TV(u) = \|Du\|_{1,2} = \sum_{x\in\Omega} \|Du(x)\|_2, \qquad Du(i,j) = \begin{pmatrix} u(i,j) - u(i-1,j) \\ u(i,j) - u(i,j-1) \end{pmatrix}$$
      Property: related to the length of the perimeter of the level-sets.
      (Figure: example segmentations with increasing TV parameter values.)
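A minimal numpy sketch of this discrete isotropic TV (an illustrative implementation, assuming a 2-D array u and a zero gradient on the first row/column as boundary handling):

```python
import numpy as np

def total_variation(u):
    """Discrete isotropic TV of slide 22: sum over pixels of ||Du(x)||_2."""
    dx = np.zeros_like(u)
    dy = np.zeros_like(u)
    dx[1:, :] = u[1:, :] - u[:-1, :]    # u(i,j) - u(i-1,j)
    dy[:, 1:] = u[:, 1:] - u[:, :-1]    # u(i,j) - u(i,j-1)
    return np.sqrt(dx ** 2 + dy ** 2).sum()
```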
  23. Histograms comparison with metric S
      [Rother et al. '06, Vicente et al. '10, Yildizoglu et al. '13]: S is a bin-to-bin distance (ℓ₂, ℓ₁, Bhattacharyya, Mahalanobis, Kullback-Leibler, ...).
      (+) Fast and easy to optimize (graph methods can be considered);
      (+) works with unnormalized histograms.
  24. Histograms comparison with metric S
      [Rother et al. '06, Vicente et al. '10, Yildizoglu et al. '13]: S is a bin-to-bin distance (ℓ₂, ℓ₁, Bhattacharyya, Mahalanobis, Kullback-Leibler, ...).
      (+) Fast and easy to optimize (graph methods can be considered);
      (+) works with unnormalized histograms;
      (−) not robust to perturbations (quantization, shifts, etc.): with bin-to-bin distances S, one gets S(A, B) = S(C, B) for the histogram example shown on the slide.
  25. Histograms comparison with metric S
      Bin-to-bin distances (as above): fast and easy to optimize, work with unnormalized histograms, but not robust to perturbations.
      [Peyré et al. '12, Swoboda & Schnörr '13]: optimal transport cost.
      (+) Measures the cost of transporting one histogram onto another;
      (+) robust to large perturbations;
      (−) much more difficult to optimize (problems of gigantic size with graphs);
      (−) does not work with unnormalized histograms.
  26. Optimal transport
      Transport cost between normalized histograms µ and ν:
      $$\mu = \sum_{i=1}^M m_i\,\delta_{X_i}, \qquad \nu = \sum_{j=1}^P n_j\,\delta_{Y_j},$$
      where m_i, n_j ≥ 0 are the masses at locations X_i, Y_j and Σ_i m_i = Σ_j n_j = 1.
      Monge-Kantorovich optimal transport distance:
      $$MK(\mu, \nu) = \min_{P\in\mathcal P_{\mu,\nu}} \langle P, C\rangle = \sum_{i,j} P_{i,j}\,\|X_i - Y_j\|^2 = \min_{L\cdot p = b,\ p\ge 0} p^T c$$
      - C is the fixed assignment cost matrix between histogram bins: C_{i,j} = ||X_i − Y_j||² (quadratic Wasserstein distance);
      - P_{µ,ν} is the set of non-negative M × P matrices P with marginals µ and ν, i.e.
        $$\mathcal P_{\mu,\nu} = \Big\{P \ge 0,\ \textstyle\sum_{i,j} P_{i,j} = 1,\ \sum_j P_{i,j} = m_i,\ \sum_i P_{i,j} = n_j\Big\}$$
      - P_{i,j} is the proportion of the mass m_i at X_i that is sent to Y_j.
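Since MK is a linear program, it can be solved directly for small histograms. Below is a hedged sketch using scipy's linprog; the vectorization of the plan and the constraint assembly are illustrative choices, not the formulation used in the talk:

```python
import numpy as np
from scipy.optimize import linprog

def mk_distance(mu, nu, C):
    """Monge-Kantorovich cost of slide 26 as a dense LP (small M, P only)."""
    M, P = C.shape
    A_eq = np.zeros((M + P, M * P))
    for i in range(M):                   # row marginals: sum_j P_ij = m_i
        A_eq[i, i * P:(i + 1) * P] = 1.0
    for j in range(P):                   # column marginals: sum_i P_ij = n_j
        A_eq[M + j, j::P] = 1.0
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([mu, nu]),
                  bounds=(0, None), method="highs")
    return res.fun
```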
  27.–28. Optimal transport solution illustration in 1D
      (Figures: histograms µ and ν on a uniform grid, and the optimal flow P between them.)
  29. Optimization problem
      Primal formulation:
      $$u \in \operatorname*{argmin}_{u\in\Delta} MK(Hu,\ H_1 u) + MK(H(\mathbf 1-u),\ H_0(\mathbf 1-u)) + \rho\,\|Du\|_{1,2}$$
      where H_0, H_1 are linear operators (M × N matrices of rank 1) such that
      $$H_1 u = \pi(u_1)\,\mathbf 1^T u = \langle u, \mathbf 1\rangle\,\pi(u_1).$$
  30. Optimization problem
      Primal formulation: as above.
      Generic formulation and resolution with proximal splitting algorithms such as GFBS [Raguet et al. '11], as done in [Swoboda and Schnörr '13]:
      $$\operatorname*{argmin}_x \sum_i G_i(A_i x)$$
      (+) Simple iterative algorithm using Prox_{G_i ∘ A_i};
      (−) requires solving a quadratic programming problem at each iteration:
      $$\mathrm{Prox}_{MK(\tilde H\,\cdot)}(x) = \operatorname*{argmin}_{y\in\Delta} \|x - y\|^2 + MK(\tilde H y)$$
  31. Optimization problem
      Primal formulation: as above.
      Generic formulation and resolution with proximal splitting algorithms such as Primal-Dual [Chambolle and Pock '11], using the Fenchel transform:
      $$\operatorname*{argmin}_x \sum_i G_i(A_i x) = \operatorname*{argmin}_x\ \operatorname*{argmax}_{z_i} \sum_i \langle A_i x, z_i\rangle - G_i^*(z_i)$$
      (+) Simple iterative algorithm using Prox_{G*_i};
      (−) requires solving a quadratic programming problem at each iteration:
      $$\mathrm{Prox}_{MK^*(\cdot)}(x) = \operatorname*{argmin}_{y\in\Delta} \|x - y\|^2 + MK^*(y)$$
  32. Sinkhorn distance
      Entropic regularization of optimal transport [Cuturi '13]:
      $$MK_\lambda(\mu, \nu) := \min_{P\in\mathcal P_{\mu,\nu}} \langle P, C\rangle + \frac{1}{\lambda}\langle P, \log P\rangle = \min_{P\in\mathcal P_{\mu,\nu}} \frac{1}{\lambda} KL(P\,|\,e^{-\lambda C})$$
      where ⟨P, log P⟩ is the negative entropy ⇒ strongly convex problem.
      Properties:
      (+) Regularization of optimal transport: MK_λ → MK as λ → ∞;
      (+) simple and fast algorithm to solve MK_λ [Sinkhorn '60s];
      (+) simple dual formulation [Cuturi et al. '15];
      (−) less sparse solutions [Solomon et al. '15] [P. & Rabin '15].
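The Sinkhorn fixed-point iterations fit in a few lines. A sketch assuming normalized histograms mu, nu and a dense cost matrix C (no log-domain stabilization, which a robust implementation would need for large λ):

```python
import numpy as np

def sinkhorn_mk(mu, nu, C, lam, n_iter=200):
    """Entropic OT cost MK_lambda of slide 32 via Sinkhorn iterations."""
    K = np.exp(-lam * C)                  # Gibbs kernel e^{-lam C}
    b = np.ones_like(nu)
    for _ in range(n_iter):
        a = mu / (K @ b)                  # rescale rows to match marginal mu
        b = nu / (K.T @ a)                # rescale columns to match marginal nu
    P = a[:, None] * K * b[None, :]       # regularized transport plan
    return (P * C).sum() + (P * np.log(P + 1e-300)).sum() / lam
```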
  33. Adaptation to the convex segmentation problem
      Normalization: due to the relaxation, the histograms h_k are not normalized.
      $$MK_{\lambda,m}(h_1, h_2) := \min_{P\in\mathcal P_{h_1,h_2}} \langle P, C\rangle + \frac{1}{\lambda}\Big\langle P, \log\frac{P}{m}\Big\rangle + \iota_{\mathcal H_m}(h_1, h_2),$$
      $$\mathcal H_m := \big\{(h_1, h_2) \in \mathbb R_+^M \times \mathbb R_+^M,\ \langle h_1, \mathbf 1_M\rangle = \langle h_2, \mathbf 1_M\rangle \le m\big\}.$$
      Remark: for segmentation, we have m = N, the number of pixels.
  34. Adaptation to the convex segmentation problem
      Normalization: MK_{λ,m} as above.
      Dual formulation: the Legendre-Fenchel transform of a convex function F defined on X is
      $$F^*(y) = \sup_{x\in X}\ \langle x, y\rangle - F(x), \qquad \text{and } F^{**} = F.$$
  35. Adaptation to the convex segmentation problem
      Dual formulation [Rabin & P. '15], for γ = (α, β):
      $$MK^*_{\lambda,m}(\gamma) = \frac{m}{\lambda}\begin{cases} \langle Q_\lambda(\gamma), \mathbf 1\rangle & \text{if } \langle Q_\lambda(\gamma), \mathbf 1\rangle \le 1 \\ \log\langle Q_\lambda(\gamma), \mathbf 1\rangle + 1 & \text{if } \langle Q_\lambda(\gamma), \mathbf 1\rangle \ge 1 \end{cases}, \qquad Q_\lambda(\gamma)_{i,j} := e^{\lambda(\alpha_i + \beta_j - C_{i,j}) - 1}.$$
      Remark: we recover the special case m = 1 of [Cuturi et al. '15].
  36. Adaptation to the convex segmentation problem
      Gradient of the dual [Rabin & P. '15]:
      $$\nabla MK^*_{\lambda,m}(\gamma) = \begin{cases} m\,\big(Q\mathbf 1,\ Q^T\mathbf 1\big) & \text{if } \langle Q, \mathbf 1\rangle \le 1 \\ \dfrac{m}{\langle Q, \mathbf 1\rangle}\big(Q\mathbf 1,\ Q^T\mathbf 1\big) & \text{if } \langle Q, \mathbf 1\rangle \ge 1 \end{cases}, \qquad Q_{i,j} = e^{\lambda(\alpha_i + \beta_j - C_{i,j}) - 1},$$
      which is Lipschitz continuous, with constant upper-bounded by 2λm (worst case scenario).
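This closed-form dual is cheap to evaluate. A sketch following the two-case formulas of slides 35-36 (the function and argument names are illustrative):

```python
import numpy as np

def mk_dual(alpha, beta, C, lam, m):
    """Value and gradient of MK*_{lam,m} with respect to (alpha, beta)."""
    # Q_{i,j} = exp(lam * (alpha_i + beta_j - C_{i,j}) - 1)
    Q = np.exp(lam * (alpha[:, None] + beta[None, :] - C) - 1.0)
    s = Q.sum()                           # <Q, 1>
    if s <= 1.0:
        value, scale = (m / lam) * s, m
    else:
        value, scale = (m / lam) * (np.log(s) + 1.0), m / s
    # Gradient blocks: row sums for alpha, column sums for beta.
    return value, scale * Q.sum(axis=1), scale * Q.sum(axis=0)
```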
  37. Proposed segmentation optimization problem
      Primal formulation:
      $$u \in \operatorname*{argmin}_{u\in\Delta} MK_{\lambda,N}(Hu,\ H_1 u) + MK_{\lambda,N}(H(\mathbf 1-u),\ H_0(\mathbf 1-u)) + \rho\,\|Du\|_{1,2}$$
  38. Proposed segmentation optimization problem
      Primal formulation: as above.
      Equivalent primal-dual formulation after splitting for the TV term:
      $$\operatorname*{argmin}_u\ \max_v\ \langle Ku, v\rangle + \iota_\Delta(u) - F^*(v) - G^*(v)$$
      where
      - v = (γ0, γ1, w) ∈ R^{4M+2N} is the dual variable;
      - Kᵀ = [Dᵀ, −Hᵀ, H0ᵀ, Hᵀ, −H1ᵀ] ∈ R^{N×(2N+4M)}, i.e. K stacks the operators D, −H, H0, H, −H1;
      - ι_∆ is the indicator of ∆ = [0, 1]^N;
      - G*(v) = ι_{||·||_{∞,2} ≤ ρ}(w) (dual formulation of the ℓ_{1,2} norm) is simple;
      - F*(v) = MK*_{λ,N}(γ1) + MK*_{λ,N}(γ0) − ⟨H𝟙, α0⟩ + ⟨H0𝟙, β0⟩ is differentiable;
      - ι_{H_N} is absorbed in the dual formulation of MK_{λ,N}!
  39. Algorithm
      Use of the inertial preconditioned primal-dual algorithm of [Lorenz & Pock '15] (here shown with only one parameter r > 0, the others being set to 1):
      $$\begin{cases} u^{(t+1)} = \mathrm{Proj}_\Delta\big(u^{(t)} - T K^T v^{(t)}\big) \\ \tilde u^{(t+1)} = 2u^{(t+1)} - u^{(t)} \\ v^{(t+1)} = \mathrm{Prox}_{G^*}\big(v^{(t)} + \Sigma(K\tilde u^{(t+1)} - \nabla F^*(v^{(t)}))\big) \end{cases}$$
      where T = τ Id_N and Σ = diag(σ_w, σ_h) are two diagonal preconditioning matrices with
      $$\frac{1}{\tau} = 8r, \qquad \frac{1}{\sigma_w} = 2r\cdot\mathbf 1_{2N}, \qquad \frac{1}{\sigma_h} = 2\lambda N\cdot\mathbf 1_{4M} + \frac{1}{r}\begin{pmatrix} H \\ H_0 \\ H \\ H_1 \end{pmatrix}\mathbf 1_N.$$
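Stripped of the preconditioning details, the iteration has the following shape. This is a generic sketch, not the authors' implementation: K, grad_F_star, prox_G_star, tau and sigma are placeholders for the quantities defined above.

```python
import numpy as np

def primal_dual(K, grad_F_star, prox_G_star, tau, sigma, n_iter=500):
    """Sketch of the preconditioned primal-dual iteration of slide 39."""
    u = np.full(K.shape[1], 0.5)          # relaxed segmentation in [0,1]^N
    v = np.zeros(K.shape[0])              # dual variable (gamma_0, gamma_1, w)
    for _ in range(n_iter):
        u_new = np.clip(u - tau * (K.T @ v), 0.0, 1.0)   # Proj onto Delta
        u_bar = 2.0 * u_new - u                           # extrapolation step
        v = prox_G_star(v + sigma * (K @ u_bar - grad_F_star(v)))
        u = u_new
    return u > 0.5                        # threshold the relaxed solution
```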
  40. Limitations and Alternatives
      Advantages & limitations:
      (+) The regularized measure MK_λ is robust to outliers;
      (+) exploits the smoothness of the regularization;
      (+) no need for an optimal transport computation;
      (+) memory storage: primal variable (size N), dual variables (size M);
      (−) not interesting when λ → ∞, as the Lipschitz constant is ≤ 2λN.
  41. Limitations and Alternatives
      Advantages & limitations: as above; in particular, not interesting when λ → ∞, as the Lipschitz constant is ≤ 2λN.
      In this case, we can rely on bi-conjugation:
      (+) the proximal operators can be computed simply (either for MK or MK_λ);
      (−) the dual variables are of size M²: limited to coarse color clustering.
  42. Experiments: comparison of various metrics
      Accurate description of the background prior.
      (Figure: I, u1, u0; ℓ₁; Wasserstein; regularized Wasserstein. Results with the MK distance, i.e. 1/λ = 0.)
  43. Experiments: comparison of various metrics
      Same parameters, other background prior.
      (Figure: I, u1, u0; ℓ₁; Wasserstein; regularized Wasserstein. Results with the MK distance, i.e. 1/λ = 0.)
  44. Other Wasserstein segmentation methods
      (Figure: I, u1, u0; local [Ni et al. '09]; global.)
      Wasserstein active contours [Peyré et al. '12]: initialization and result.
  45. Influence of the regularization
      $$MK_\lambda(h_1, h_2) = \min_{P\in\mathcal P(h_1,h_2)} \langle C, P\rangle + \frac{1}{\lambda}\langle P, \log(P/N)\rangle$$
      (Figure: I, u1, u0; Wasserstein; regularized Wasserstein with λ = 100.)
  46. Influence of the regularization
      $$MK_\lambda(h_1, h_2) = \min_{P\in\mathcal P(h_1,h_2)} \langle C, P\rangle + \frac{1}{\lambda}\langle P, \log(P/N)\rangle$$
      (Figure: I, u1, u0; Wasserstein; regularized Wasserstein with λ = 10.)
  47. Extension to Co-segmentation
      Find the same object in two different images (no prior on the object):
      $$J(u) := MK_\lambda(H_1 u_1,\ H_2 u_2) + \rho\sum_{k=1}^2 \|Du_k\|_{1,2} - \alpha\|u_k\|_1$$
      The balloon force −α||u_k||₁ prevents the estimation of empty regions.
      Remaining problem: dealing with changes of object scale across the images.
  48. Conclusion and future work
      Contributions:
      - a regularized optimal transport based metric for image segmentation;
      - a fast and simple iterative algorithm enabling unnormalized histograms.
      Future work:
      - natural extension to multi-phase segmentation (see [Swoboda & Schnörr '13]);
      - co-segmentation;
      - unbalanced histograms using mass conservation relaxation ([Ferradans et al. '13, Benamou et al. '15]).