
Convex Color Image Segmentation with Optimal Transport Distances


This work concerns the histogram-based segmentation of a color image into two regions. In the considered framework, fixed exemplar histograms define a prior on the statistical features of the two regions in competition. We investigate the use of regularized transport-based cost functions as discrepancy measures between color histograms and consider a spatial regularization of the segmentation map with total variation. We finally rely on a primal-dual algorithm to solve the resulting convex optimization problem. Experiments illustrate the robustness of the proposed method for the segmentation of natural color images.

npapadakis

July 02, 2015



Transcript

  1. Convex Color Image Segmentation with Optimal Transport Distances
     Nicolas Papadakis¹ & Julien Rabin²
     ¹CNRS, Institut de Mathématiques de Bordeaux, France
     ²Université de Caen, GREYC, France
     Transport metrics in image and shape processing, IFIP TC7 Conference 2015 on System Modelling and Optimization, 2nd July 2015
  2. Two-phase weakly supervised segmentation
     Image dataset; (partial) labeling by the user: scribbles indicating foreground and background.
  3. Two-phase weakly supervised segmentation
     Image dataset; (partial) labeling by the user; segmentation into regions having similar feature distributions (color, texture, ...).
  4. Problem statement
     General assumptions (and limitations):
     - each picture has two regions: foreground and background;
     - regions of interest are globally statistically consistent with the corresponding scribble regions (i.e. same feature distribution);
     - regions are composed of a very small set of connected components.
  5. Problem statement
     General assumptions (and limitations):
     - each picture has two regions: foreground and background;
     - regions of interest are globally statistically consistent with the corresponding scribble regions (i.e. same feature distribution);
     - regions are composed of a very small set of connected components.
     Advantages of considering global statistics:
     - robustness to outliers;
     - labeling may be incorrect;
     - appearance may change dramatically when using a prior coming from another image.
  6. Setting & Notations
     Simplified setting: we consider a single image I with two binary labels u1 and u0.
     Color image I : Ω ⊂ Z² → R³ with N = |Ω| pixels; segmentation features: RGB colors.
     Problem statement: input scribbles define the binary maps u1 (foreground) and u0 (background).
  7. Setting & Notations
     Simplified setting: we consider a single image I with two binary labels u1 and u0.
     Color image I : Ω ⊂ Z² → R³ with N = |Ω| pixels; segmentation features: RGB colors.
     Problem statement: a binary segmentation of I is defined by u ∈ {0, 1}^N.
  8. Setting & Notations
     Simplified setting: we consider a single image I with two binary labels u1 and u0.
     Color image I : Ω ⊂ Z² → R³ with N = |Ω| pixels; segmentation features: RGB colors.
     Problem statement: the color probability distribution π(u) is defined as a sum of Dirac masses:
     $$\pi(u) := \frac{1}{\sum_{x\in\Omega} u(x)} \sum_{x\in\Omega} u(x)\,\delta_{I(x)}$$
  9. Setting & Notations
     Simplified setting: we consider a single image I with two binary labels u1 and u0.
     Color image I : Ω ⊂ Z² → R³ with N = |Ω| pixels; segmentation features: RGB colors.
     Problem statement: the color probability distributions π(u) are defined as sums of Dirac masses; the empirical distributions are compared with a given metric S (e.g. the symmetric KL divergence).
  10. Variational Model
      The unknown is u : x ∈ Ω → u(x) ∈ {0, 1};
      π(u) and π(1 − u) are the foreground/background probability distributions;
      π(u1) and π(u0) are fixed reference color distributions.
  11. Variational Model
      The unknown is u : x ∈ Ω → u(x) ∈ {0, 1}; π(u) and π(1 − u) are the foreground/background probability distributions; π(u1) and π(u0) are fixed reference color distributions.
      Problem formulation, first attempt:
      $$\operatorname*{argmin}_{u\in\{0,1\}^N} J(u) := S(\pi(u), \pi(u_1)) + S(\pi(1-u), \pi(u_0))$$
      (−) NP-hard optimization problem (non-convex constraints and energy);
      (−) no spatial regularity.
  12. Variational Model
      First attempt (as above): NP-hard, no spatial regularity.
      Second attempt:
      $$\operatorname*{argmin}_{u\in[0,1]^N} J(u) + R(u)$$
      (+) relaxed constraint u(x) ∈ [0, 1];
      (+) spatial regularity promoted by R (e.g. Potts model);
      (−) still a non-convex optimization problem, since the normalization makes
      $$\pi(u) := \frac{1}{\sum_{x\in\Omega} u(x)} \sum_{x\in\Omega} u(x)\,\delta_{I(x)}$$
      non-linear in u.
  13. Previous works
      A large number of methods have been proposed in the literature to address this problem. We focus on approaches considering metrics S based on optimal transport:
      - [Peyré, Fadili and Rabin '12]: unsupervised, non-convex active contour based on an optimal transport metric;
      - [Yildizoglu, Aujol and P. '13]: supervised, convex and fast model based on the Wasserstein-1 metric for grayscale images;
      - [Swoboda & Schnörr '13]: unsupervised/supervised convex multi-phase model based on an optimal transport metric.
  14. Building histograms of features
      Example with two color histograms: H(u) for the foreground and H(1 − u) for the background.
  15. Histograms construction
      N colored pixels ⇒ define a color space C composed of M color bins. The color I(x) at pixel x belongs to a single color bin C_i. The matrix H of size M × N has only N non-null values:
      $$H = \begin{pmatrix} 1 & 0 & 0 & 1 & \cdots & 0 \\ 0 & 1 & 0 & 0 & \cdots & 1 \\ & & & \vdots & & \\ 0 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \end{pmatrix}$$
      H is a linear assignment operator: H(i, x) = 1 ⇒ I(x) ∈ C_i.
  16. Histogram-based data term
      Use H to build the histogram of I on a specific area. Full image: I × 1; corresponding histogram:
      $$h_{\mathbf 1} = H\,\mathbf 1 = (n_1, n_2, \ldots, n_{M-1}, n_M)^T$$
      where n_i is the number of pixels activated by 1 and belonging to C_i.
  17. Histogram-based data term
      Use H to build the histogram of I on a specific area. Object: I × u; corresponding histogram:
      $$h_u = H\,u = (n_1, n_2, \ldots, n_{M-1}, n_M)^T$$
      where n_i is the number of pixels activated by u and belonging to C_i.
  18. Histogram-based data term
      Use H to build the histogram of I on a specific area. Background: I × (1 − u); corresponding histogram:
      $$h_{1-u} = H\,(\mathbf 1 - u) = (n_1, n_2, \ldots, n_{M-1}, n_M)^T$$
      where n_i is the number of pixels activated by 1 − u and belonging to C_i.
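To make the role of H concrete, here is a minimal numpy/scipy sketch of slides 15-18 (not the authors' code; the function names and the uniform RGB quantization are illustrative assumptions):

```python
import numpy as np
from scipy.sparse import csr_matrix

def assignment_matrix(bin_index, M):
    """Build the M x N assignment operator H of slide 15.

    `bin_index` is a length-N array giving, for each pixel x, the index i
    of the color bin C_i containing I(x); H(i, x) = 1 iff I(x) is in C_i,
    so H has exactly one nonzero entry per column (N nonzeros in total).
    """
    N = bin_index.size
    return csr_matrix((np.ones(N), (bin_index, np.arange(N))), shape=(M, N))

# Illustrative quantization: map RGB colors in [0,1]^3 to M = q^3 bins.
q = 8
I = np.random.rand(100, 3)                       # N = 100 pixels, RGB rows
idx = (np.minimum((I * q).astype(int), q - 1) * np.array([q * q, q, 1])).sum(axis=1)
H = assignment_matrix(idx, q ** 3)

u = np.random.rand(100)                          # relaxed segmentation map
h_full = H @ np.ones(100)                        # slide 16: histogram of I
h_fg   = H @ u                                   # slide 17: foreground
h_bg   = H @ (1.0 - u)                           # slide 18: background
```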
  19. Convex Variational Model
      Non-convex model:
      $$\operatorname*{argmin}_{u\in[0,1]^N} S(\pi(u), \pi(u_1)) + S(\pi(1-u), \pi(u_0)) + R(u)$$
      We have
      $$\pi(u) = \frac{1}{\sum_{x\in\Omega} u(x)} \sum_{x\in\Omega} u(x)\,\delta_{I(x)} = \frac{Hu}{\langle u, \mathbf 1\rangle}.$$
  20. Convex Variational Model
      Non-convex model (as above), with π(u) = Hu / ⟨u, 1⟩.
      Convex model [Yildizoglu et al. '13, Swoboda & Schnörr '13]:
      $$\operatorname*{argmin}_{u\in[0,1]^N} S(Hu,\ \langle u, \mathbf 1\rangle\,\pi(u_1)) + S(H(\mathbf 1-u),\ \langle \mathbf 1-u, \mathbf 1\rangle\,\pi(u_0)) + R(u)$$
  21. Convex Variational Model
      Convex model:
      $$\operatorname*{argmin}_{u\in\Delta} S(Hu,\ \langle u, \mathbf 1\rangle\,\pi(u_1)) + S(H(\mathbf 1-u),\ \langle \mathbf 1-u, \mathbf 1\rangle\,\pi(u_0)) + \rho\,TV(u)$$
      - ∆ := [0, 1]^N;
      - Hu and H(1 − u) are unnormalized histograms;
      - π(u0) and π(u1) are quantized reference distributions;
      - ⟨u, 1⟩ = Σ_{x∈Ω} u(x) is a normalization for the reference histogram π(u1);
      - TV(u) = ||Du||_{1,2} is the discrete, isotropic total variation.
      This is a "Chan and Vese"-like model with color histograms instead of mean colors.
  22. Regularity using Total Variation
      The discrete formulation of total variation is the ℓ_{1,2} norm of the discrete gradient:
      $$TV(u) = \|Du\|_{1,2} = \sum_{x\in\Omega} \|Du(x)\|_2, \qquad Du(i,j) = \begin{pmatrix} u(i,j) - u(i-1,j) \\ u(i,j) - u(i,j-1) \end{pmatrix}$$
      Property: related to the length of the perimeter of the level-sets.
      (Figure: example segmentations with increasing TV parameter values.)
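A minimal numpy sketch of this discrete isotropic TV (an illustrative implementation, assuming a 2-D array u and a zero gradient on the first row/column as boundary handling):

```python
import numpy as np

def total_variation(u):
    """Discrete isotropic TV of slide 22: sum over pixels of ||Du(x)||_2."""
    dx = np.zeros_like(u)
    dy = np.zeros_like(u)
    dx[1:, :] = u[1:, :] - u[:-1, :]    # u(i,j) - u(i-1,j)
    dy[:, 1:] = u[:, 1:] - u[:, :-1]    # u(i,j) - u(i,j-1)
    return np.sqrt(dx ** 2 + dy ** 2).sum()
```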
  23. Histograms comparison with metric S
      [Rother et al. '06, Vicente et al. '10, Yildizoglu et al. '13]: S is a bin-to-bin distance (ℓ₂, ℓ₁, Bhattacharyya, Mahalanobis, Kullback-Leibler, ...).
      (+) Fast and easy to optimize (graph methods can be considered);
      (+) works with unnormalized histograms.
  24. Histograms comparison with metric S
      [Rother et al. '06, Vicente et al. '10, Yildizoglu et al. '13]: S is a bin-to-bin distance (ℓ₂, ℓ₁, Bhattacharyya, Mahalanobis, Kullback-Leibler, ...).
      (+) Fast and easy to optimize (graph methods can be considered);
      (+) works with unnormalized histograms;
      (−) not robust to perturbations (quantization, shifts, etc.): with bin-to-bin distances S, one gets S(A, B) = S(C, B) for the histogram example shown on the slide.
  25. Histograms comparison with metric S
      Bin-to-bin distances (as above): fast and easy to optimize, work with unnormalized histograms, but not robust to perturbations.
      [Peyré et al. '12, Swoboda & Schnörr '13]: optimal transport cost.
      (+) Measures the cost of transporting one histogram onto another;
      (+) robust to large perturbations;
      (−) much more difficult to optimize (problems of gigantic size with graphs);
      (−) does not work with unnormalized histograms.
  26. Optimal transport
      Transport cost between normalized histograms µ and ν:
      $$\mu = \sum_{i=1}^M m_i\,\delta_{X_i}, \qquad \nu = \sum_{j=1}^P n_j\,\delta_{Y_j},$$
      where m_i, n_j ≥ 0 are the masses at locations X_i, Y_j and Σ_i m_i = Σ_j n_j = 1.
      Monge-Kantorovich optimal transport distance:
      $$MK(\mu, \nu) = \min_{P\in\mathcal P_{\mu,\nu}} \langle P, C\rangle = \sum_{i,j} P_{i,j}\,\|X_i - Y_j\|^2 = \min_{L\cdot p = b,\ p\ge 0} p^T c$$
      - C is the fixed assignment cost matrix between histogram bins: C_{i,j} = ||X_i − Y_j||² (quadratic Wasserstein distance);
      - P_{µ,ν} is the set of non-negative M × P matrices P with marginals µ and ν, i.e.
        $$\mathcal P_{\mu,\nu} = \Big\{P \ge 0,\ \textstyle\sum_{i,j} P_{i,j} = 1,\ \sum_j P_{i,j} = m_i,\ \sum_i P_{i,j} = n_j\Big\}$$
      - P_{i,j} is the proportion of the mass m_i at X_i that is sent to Y_j.
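Since MK is a linear program, it can be solved directly for small histograms. Below is a hedged sketch using scipy's linprog; the vectorization of the plan and the constraint assembly are illustrative choices, not the formulation used in the talk:

```python
import numpy as np
from scipy.optimize import linprog

def mk_distance(mu, nu, C):
    """Monge-Kantorovich cost of slide 26 as a dense LP (small M, P only)."""
    M, P = C.shape
    A_eq = np.zeros((M + P, M * P))
    for i in range(M):                   # row marginals: sum_j P_ij = m_i
        A_eq[i, i * P:(i + 1) * P] = 1.0
    for j in range(P):                   # column marginals: sum_i P_ij = n_j
        A_eq[M + j, j::P] = 1.0
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([mu, nu]),
                  bounds=(0, None), method="highs")
    return res.fun
```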
  27.–28. Optimal transport solution illustration in 1D
      (Figures: histograms µ and ν on a uniform grid, and the optimal flow P between them.)
  29. Optimization problem
      Primal formulation:
      $$u \in \operatorname*{argmin}_{u\in\Delta} MK(Hu,\ H_1 u) + MK(H(\mathbf 1-u),\ H_0(\mathbf 1-u)) + \rho\,\|Du\|_{1,2}$$
      where H_0, H_1 are linear operators (M × N matrices of rank 1) such that
      $$H_1 u = \pi(u_1)\,\mathbf 1^T u = \langle u, \mathbf 1\rangle\,\pi(u_1).$$
  30. Optimization problem
      Primal formulation: as above.
      Generic formulation and resolution with proximal splitting algorithms such as GFBS [Raguet et al. '11], as done in [Swoboda and Schnörr '13]:
      $$\operatorname*{argmin}_x \sum_i G_i(A_i x)$$
      (+) Simple iterative algorithm using Prox_{G_i ∘ A_i};
      (−) requires solving a quadratic programming problem at each iteration:
      $$\mathrm{Prox}_{MK(\tilde H\,\cdot)}(x) = \operatorname*{argmin}_{y\in\Delta} \|x - y\|^2 + MK(\tilde H y)$$
  31. Optimization problem
      Primal formulation: as above.
      Generic formulation and resolution with proximal splitting algorithms such as Primal-Dual [Chambolle and Pock '11], using the Fenchel transform:
      $$\operatorname*{argmin}_x \sum_i G_i(A_i x) = \operatorname*{argmin}_x\ \operatorname*{argmax}_{z_i} \sum_i \langle A_i x, z_i\rangle - G_i^*(z_i)$$
      (+) Simple iterative algorithm using Prox_{G*_i};
      (−) requires solving a quadratic programming problem at each iteration:
      $$\mathrm{Prox}_{MK^*(\cdot)}(x) = \operatorname*{argmin}_{y\in\Delta} \|x - y\|^2 + MK^*(y)$$
  32. Sinkhorn distance
      Entropic regularization of optimal transport [Cuturi '13]:
      $$MK_\lambda(\mu, \nu) := \min_{P\in\mathcal P_{\mu,\nu}} \langle P, C\rangle + \frac{1}{\lambda}\langle P, \log P\rangle = \min_{P\in\mathcal P_{\mu,\nu}} \frac{1}{\lambda} KL(P\,|\,e^{-\lambda C})$$
      where ⟨P, log P⟩ is the negative entropy ⇒ strongly convex problem.
      Properties:
      (+) Regularization of optimal transport: MK_λ → MK as λ → ∞;
      (+) simple and fast algorithm to solve MK_λ [Sinkhorn '60s];
      (+) simple dual formulation [Cuturi et al. '15];
      (−) less sparse solutions [Solomon et al. '15] [P. & Rabin '15].
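The Sinkhorn fixed-point iterations fit in a few lines. A sketch assuming normalized histograms mu, nu and a dense cost matrix C (no log-domain stabilization, which a robust implementation would need for large λ):

```python
import numpy as np

def sinkhorn_mk(mu, nu, C, lam, n_iter=200):
    """Entropic OT cost MK_lambda of slide 32 via Sinkhorn iterations."""
    K = np.exp(-lam * C)                  # Gibbs kernel e^{-lam C}
    b = np.ones_like(nu)
    for _ in range(n_iter):
        a = mu / (K @ b)                  # rescale rows to match marginal mu
        b = nu / (K.T @ a)                # rescale columns to match marginal nu
    P = a[:, None] * K * b[None, :]       # regularized transport plan
    return (P * C).sum() + (P * np.log(P + 1e-300)).sum() / lam
```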
  33. Adaptation to the convex segmentation problem
      Normalization: due to the relaxation, the histograms h_k are not normalized.
      $$MK_{\lambda,m}(h_1, h_2) := \min_{P\in\mathcal P_{h_1,h_2}} \langle P, C\rangle + \frac{1}{\lambda}\Big\langle P, \log\frac{P}{m}\Big\rangle + \iota_{\mathcal H_m}(h_1, h_2),$$
      $$\mathcal H_m := \big\{(h_1, h_2) \in \mathbb R_+^M \times \mathbb R_+^M,\ \langle h_1, \mathbf 1_M\rangle = \langle h_2, \mathbf 1_M\rangle \le m\big\}.$$
      Remark: for segmentation, we have m = N, the number of pixels.
  34. Adaptation to the convex segmentation problem
      Normalization: MK_{λ,m} as above.
      Dual formulation: the Legendre-Fenchel transform of a convex function F defined on X is
      $$F^*(y) = \sup_{x\in X}\ \langle x, y\rangle - F(x), \qquad \text{and } F^{**} = F.$$
  35. Adaptation to the convex segmentation problem
      Dual formulation [Rabin & P. '15], for γ = (α, β):
      $$MK^*_{\lambda,m}(\gamma) = \frac{m}{\lambda}\begin{cases} \langle Q_\lambda(\gamma), \mathbf 1\rangle & \text{if } \langle Q_\lambda(\gamma), \mathbf 1\rangle \le 1 \\ \log\langle Q_\lambda(\gamma), \mathbf 1\rangle + 1 & \text{if } \langle Q_\lambda(\gamma), \mathbf 1\rangle \ge 1 \end{cases}, \qquad Q_\lambda(\gamma)_{i,j} := e^{\lambda(\alpha_i + \beta_j - C_{i,j}) - 1}.$$
      Remark: we recover the special case m = 1 of [Cuturi et al. '15].
  36. Adaptation to the convex segmentation problem
      Gradient of the dual [Rabin & P. '15]:
      $$\nabla MK^*_{\lambda,m}(\gamma) = \begin{cases} m\,\big(Q\mathbf 1,\ Q^T\mathbf 1\big) & \text{if } \langle Q, \mathbf 1\rangle \le 1 \\ \dfrac{m}{\langle Q, \mathbf 1\rangle}\big(Q\mathbf 1,\ Q^T\mathbf 1\big) & \text{if } \langle Q, \mathbf 1\rangle \ge 1 \end{cases}, \qquad Q_{i,j} = e^{\lambda(\alpha_i + \beta_j - C_{i,j}) - 1},$$
      which is Lipschitz continuous, with constant upper-bounded by 2λm (worst case scenario).
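This closed-form dual is cheap to evaluate. A sketch following the two-case formulas of slides 35-36 (the function and argument names are illustrative):

```python
import numpy as np

def mk_dual(alpha, beta, C, lam, m):
    """Value and gradient of MK*_{lam,m} with respect to (alpha, beta)."""
    # Q_{i,j} = exp(lam * (alpha_i + beta_j - C_{i,j}) - 1)
    Q = np.exp(lam * (alpha[:, None] + beta[None, :] - C) - 1.0)
    s = Q.sum()                           # <Q, 1>
    if s <= 1.0:
        value, scale = (m / lam) * s, m
    else:
        value, scale = (m / lam) * (np.log(s) + 1.0), m / s
    # Gradient blocks: row sums for alpha, column sums for beta.
    return value, scale * Q.sum(axis=1), scale * Q.sum(axis=0)
```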
  37. Proposed segmentation optimization problem
      Primal formulation:
      $$u \in \operatorname*{argmin}_{u\in\Delta} MK_{\lambda,N}(Hu,\ H_1 u) + MK_{\lambda,N}(H(\mathbf 1-u),\ H_0(\mathbf 1-u)) + \rho\,\|Du\|_{1,2}$$
  38. Proposed segmentation optimization problem
      Primal formulation: as above.
      Equivalent primal-dual formulation after splitting for the TV term:
      $$\operatorname*{argmin}_u\ \max_v\ \langle Ku, v\rangle + \iota_\Delta(u) - F^*(v) - G^*(v)$$
      where
      - v = (γ0, γ1, w) ∈ R^{4M+2N} is the dual variable;
      - Kᵀ = [Dᵀ, −Hᵀ, H0ᵀ, Hᵀ, −H1ᵀ] ∈ R^{N×(2N+4M)}, i.e. K stacks the operators D, −H, H0, H, −H1;
      - ι_∆ is the indicator of ∆ = [0, 1]^N;
      - G*(v) = ι_{||·||_{∞,2} ≤ ρ}(w) (dual formulation of the ℓ_{1,2} norm) is simple;
      - F*(v) = MK*_{λ,N}(γ1) + MK*_{λ,N}(γ0) − ⟨H𝟙, α0⟩ + ⟨H0𝟙, β0⟩ is differentiable;
      - ι_{H_N} is absorbed in the dual formulation of MK_{λ,N}!
  39. Algorithm
      Use of the inertial preconditioned primal-dual algorithm of [Lorenz & Pock '15] (here shown with only one parameter r > 0, the others being set to 1):
      $$\begin{cases} u^{(t+1)} = \mathrm{Proj}_\Delta\big(u^{(t)} - T K^T v^{(t)}\big) \\ \tilde u^{(t+1)} = 2u^{(t+1)} - u^{(t)} \\ v^{(t+1)} = \mathrm{Prox}_{G^*}\big(v^{(t)} + \Sigma(K\tilde u^{(t+1)} - \nabla F^*(v^{(t)}))\big) \end{cases}$$
      where T = τ Id_N and Σ = diag(σ_w, σ_h) are two diagonal preconditioning matrices with
      $$\frac{1}{\tau} = 8r, \qquad \frac{1}{\sigma_w} = 2r\cdot\mathbf 1_{2N}, \qquad \frac{1}{\sigma_h} = 2\lambda N\cdot\mathbf 1_{4M} + \frac{1}{r}\begin{pmatrix} H \\ H_0 \\ H \\ H_1 \end{pmatrix}\mathbf 1_N.$$
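Stripped of the preconditioning details, the iteration has the following shape. This is a generic sketch, not the authors' implementation: K, grad_F_star, prox_G_star, tau and sigma are placeholders for the quantities defined above.

```python
import numpy as np

def primal_dual(K, grad_F_star, prox_G_star, tau, sigma, n_iter=500):
    """Sketch of the preconditioned primal-dual iteration of slide 39."""
    u = np.full(K.shape[1], 0.5)          # relaxed segmentation in [0,1]^N
    v = np.zeros(K.shape[0])              # dual variable (gamma_0, gamma_1, w)
    for _ in range(n_iter):
        u_new = np.clip(u - tau * (K.T @ v), 0.0, 1.0)   # Proj onto Delta
        u_bar = 2.0 * u_new - u                           # extrapolation step
        v = prox_G_star(v + sigma * (K @ u_bar - grad_F_star(v)))
        u = u_new
    return u > 0.5                        # threshold the relaxed solution
```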
  40. Limitations and Alternatives
      Advantages & limitations:
      (+) The regularized measure MK_λ is robust to outliers;
      (+) exploits the smoothness of the regularization;
      (+) no need for an optimal transport computation;
      (+) memory storage: primal variable (size N), dual variables (size M);
      (−) not interesting when λ → ∞, as the Lipschitz constant is ≤ 2λN.
  41. Limitations and Alternatives
      Advantages & limitations: as above; in particular, not interesting when λ → ∞, as the Lipschitz constant is ≤ 2λN.
      In this case, we can rely on bi-conjugation:
      (+) the proximal operators can be computed simply (either for MK or MK_λ);
      (−) the dual variables are of size M²: limited to coarse color clustering.
  42. Experiments: comparison of various metrics
      Accurate description of the background prior.
      (Figure: I, u1, u0; ℓ₁; Wasserstein; regularized Wasserstein. Results with the MK distance, i.e. 1/λ = 0.)
  43. Experiments: comparison of various metrics
      Same parameters, other background prior.
      (Figure: I, u1, u0; ℓ₁; Wasserstein; regularized Wasserstein. Results with the MK distance, i.e. 1/λ = 0.)
  44. Other Wasserstein segmentation methods
      (Figure: I, u1, u0; local [Ni et al. '09]; global.)
      Wasserstein active contours [Peyré et al. '12]: initialization and result.
  45. Influence of the regularization
      $$MK_\lambda(h_1, h_2) = \min_{P\in\mathcal P(h_1,h_2)} \langle C, P\rangle + \frac{1}{\lambda}\langle P, \log(P/N)\rangle$$
      (Figure: I, u1, u0; Wasserstein; regularized Wasserstein with λ = 100.)
  46. Influence of the regularization
      $$MK_\lambda(h_1, h_2) = \min_{P\in\mathcal P(h_1,h_2)} \langle C, P\rangle + \frac{1}{\lambda}\langle P, \log(P/N)\rangle$$
      (Figure: I, u1, u0; Wasserstein; regularized Wasserstein with λ = 10.)
  47. Extension to Co-segmentation
      Find the same object in two different images (no prior on the object):
      $$J(u) := MK_\lambda(H_1 u_1,\ H_2 u_2) + \rho\sum_{k=1}^2 \|Du_k\|_{1,2} - \alpha\|u_k\|_1$$
      The balloon force −α||u_k||₁ prevents the estimation of empty regions.
      Remaining problem: dealing with changes of object scale across the images.
  48. Conclusion and future work
      Contributions:
      - a regularized optimal transport based metric for image segmentation;
      - a fast and simple iterative algorithm enabling unnormalized histograms.
      Future work:
      - natural extension to multi-phase segmentation (see [Swoboda & Schnörr '13]);
      - co-segmentation;
      - unbalanced histograms using mass conservation relaxation ([Ferradans et al. '13, Benamou et al. '15]).