Slide 1

Slide 1 text

Convex Color Image Segmentation with Optimal Transport Distances
Nicolas Papadakis (1) & Julien Rabin (2)
(1) CNRS, Institut de Mathématiques de Bordeaux, France
(2) Université de Caen, GREYC, France
Transport metrics in image and shape processing
IFIP TC7 Conference 2015 on System Modelling and Optimization, 2nd July 2015

Slide 2

Slide 2 text

Two-phase weakly supervised segmentation
Image dataset.

Slide 3

Slide 3 text

Two-phase weakly supervised segmentation
(Partial) labeling by the user: scribbles indicating foreground and background.

Slide 4

Slide 4 text

Two-phase weakly supervised segmentation
Segmentation into regions having similar feature distributions (color, texture, ...).

Slide 5

Slide 5 text

Problem statement
General assumptions (and limitations):
- each picture has two regions: foreground and background;
- regions of interest are globally statistically consistent with the corresponding scribble regions (i.e. same feature distribution);
- regions are composed of a very small set of connected components.

Slide 6

Slide 6 text

Problem statement
General assumptions (and limitations):
- each picture has two regions: foreground and background;
- regions of interest are globally statistically consistent with the corresponding scribble regions (i.e. same feature distribution);
- regions are composed of a very small set of connected components.
Advantages of considering global statistics:
- robustness to outliers;
- labeling may be incorrect;
- appearance may change dramatically when using a prior coming from another image.

Slide 7

Slide 7 text

Setting & notations
Simplified setting:
- we consider only one image I with two binary labels u1 and u0;
- color image I : Ω ⊂ Z² → R³ with N = |Ω| pixels;
- segmentation features: RGB colors.
Problem statement: the input scribbles define the binary maps u1 (foreground) and u0 (background).

Slide 8

Slide 8 text

Setting & notations
Simplified setting:
- we consider only one image I with two binary labels u1 and u0;
- color image I : Ω ⊂ Z² → R³ with N = |Ω| pixels;
- segmentation features: RGB colors.
Problem statement: the binary segmentation of I is defined by u ∈ {0, 1}^N.

Slide 9

Slide 9 text

Setting & notations
Simplified setting:
- we consider only one image I with two binary labels u1 and u0;
- color image I : Ω ⊂ Z² → R³ with N = |Ω| pixels;
- segmentation features: RGB colors.
Problem statement: the color probability distribution π(u) is defined as a sum of Dirac masses,
$$\pi(u) := \frac{1}{\sum_{x\in\Omega} u(x)} \sum_{x\in\Omega} u(x)\,\delta_{I(x)}.$$

Slide 10

Slide 10 text

Setting & notations
Simplified setting:
- we consider only one image I with two binary labels u1 and u0;
- color image I : Ω ⊂ Z² → R³ with N = |Ω| pixels;
- segmentation features: RGB colors.
Problem statement: the color probability distributions π(u) are defined as sums of Dirac masses; the empirical distributions are compared with a given metric S (e.g. the symmetric KL divergence).

Slide 11

Slide 11 text

Variational model
- The unknown is u : x ∈ Ω → u(x) ∈ {0, 1};
- π(u) and π(1 − u) are the foreground/background probability distributions;
- π(u1) and π(u0) are fixed reference color distributions.

Slide 12

Slide 12 text

Variational model
- The unknown is u : x ∈ Ω → u(x) ∈ {0, 1};
- π(u) and π(1 − u) are the foreground/background probability distributions;
- π(u1) and π(u0) are fixed reference color distributions.
Problem formulation, first attempt:
$$\operatorname*{argmin}_{u \in \{0,1\}^N} J(u) := S(\pi(u), \pi(u_1)) + S(\pi(1-u), \pi(u_0))$$
- NP-hard optimization problem (non-convex constraints and energy);
- no spatial regularity.

Slide 13

Slide 13 text

Variational model
- The unknown is u : x ∈ Ω → u(x) ∈ {0, 1};
- π(u) and π(1 − u) are the foreground/background probability distributions;
- π(u1) and π(u0) are fixed reference color distributions.
Problem formulation, first attempt:
$$\operatorname*{argmin}_{u \in \{0,1\}^N} J(u) := S(\pi(u), \pi(u_1)) + S(\pi(1-u), \pi(u_0))$$
- NP-hard optimization problem (non-convex constraints and energy);
- no spatial regularity.
Second attempt:
$$\operatorname*{argmin}_{u \in [0,1]^N} J(u) + R(u)$$
- relaxed constraint u(x) ∈ [0, 1];
- spatial regularity promoted by R (e.g. Potts model);
- still a non-convex optimization problem, since $\pi(u) := \frac{1}{\sum_{x\in\Omega} u(x)} \sum_{x\in\Omega} u(x)\,\delta_{I(x)}$ is non-linear in u.

Slide 14

Slide 14 text

Previous works
A large number of methods have been proposed in the literature to address this problem. Focus on approaches considering metrics S based on optimal transport:
- [Peyré, Fadili and Rabin '12]: unsupervised, non-convex active contour based on an optimal transport metric;
- [Yildizoglu, Aujol and P. '13]: supervised, convex and fast model based on the Wasserstein-1 metric for grayscale images;
- [Swoboda & Schnörr '13]: unsupervised/supervised convex multi-phase model based on an optimal transport metric.

Slide 15

Slide 15 text

Building histograms of features
Example with two color histograms: H(u) for the foreground and H(1 − u) for the background.
[Figure: image I, labeling u, foreground histogram H(u), background histogram H(1 − u).]

Slide 16

Slide 16 text

Histogram construction
N colored pixels ⇒ define a color space C composed of M color bins. The color I(x) at pixel x belongs to a single color bin C_i, so the matrix H of size M × N has only N nonzero entries:
$$H = \begin{pmatrix} 1 & 0 & 0 & 1 & \cdots & 0 \\ 0 & 1 & 0 & 0 & \cdots & 1 \\ & & & \vdots & & \\ 0 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \end{pmatrix}$$
H is a linear assignment operator: H(i, x) = 1 ⇒ I(x) ∈ C_i.
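To make the construction concrete, here is a minimal NumPy/SciPy sketch that builds H for a uniform RGB quantization. The function name, the bin layout and the toy image are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: build the M x N assignment matrix H from a color image.
# Assumes RGB values in [0, 1] and a uniform quantization into n_bins^3 = M bins.
import numpy as np
from scipy import sparse

def assignment_matrix(image, n_bins=8):
    """Return sparse H of shape (M, N) with H[i, x] = 1 iff I(x) lies in bin C_i."""
    N = image.shape[0] * image.shape[1]
    colors = image.reshape(N, 3)
    # Quantize each channel into n_bins levels, then flatten to a single bin index.
    levels = np.minimum((colors * n_bins).astype(int), n_bins - 1)
    bin_idx = levels[:, 0] * n_bins**2 + levels[:, 1] * n_bins + levels[:, 2]
    M = n_bins**3
    # One nonzero entry per pixel (column), as described above.
    return sparse.csr_matrix((np.ones(N), (bin_idx, np.arange(N))), shape=(M, N))

image = np.random.rand(64, 64, 3)   # toy image
H = assignment_matrix(image)
u = np.ones(64 * 64)                # select all pixels
h = H @ u                           # unnormalized histogram, sums to N
assert h.sum() == 64 * 64
```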

Slide 17

Slide 17 text

Histogram-based data term
Use H to build the histogram of I on a specific area. Full image:
$$H\,\mathbf{1} = (n_1, n_2, \ldots, n_{M-1}, n_M)^\top = h_{\mathbf{1}}$$
where n_i is the number of pixels activated by 1 and belonging to C_i.

Slide 18

Slide 18 text

Histogram-based data term
Use H to build the histogram of I on a specific area. Object:
$$H\,u = (n_1, n_2, \ldots, n_{M-1}, n_M)^\top = h_u$$
where n_i is the number of pixels activated by u and belonging to C_i.

Slide 19

Slide 19 text

Histogram-based data term
Use H to build the histogram of I on a specific area. Background:
$$H\,(\mathbf{1}-u) = (n_1, n_2, \ldots, n_{M-1}, n_M)^\top = h_{\mathbf{1}-u}$$
where n_i is the number of pixels activated by 1 − u and belonging to C_i.

Slide 20

Slide 20 text

Convex variational model
Non-convex model:
$$\operatorname*{argmin}_{u \in [0,1]^N}\ S(\pi(u), \pi(u_1)) + S(\pi(1-u), \pi(u_0)) + R(u)$$
We have
$$\pi(u) = \frac{1}{\sum_{x\in\Omega} u(x)} \sum_{x\in\Omega} u(x)\,\delta_{I(x)} = \frac{Hu}{\langle u, \mathbf{1}\rangle}.$$

Slide 21

Slide 21 text

Convex variational model
Non-convex model:
$$\operatorname*{argmin}_{u \in [0,1]^N}\ S(\pi(u), \pi(u_1)) + S(\pi(1-u), \pi(u_0)) + R(u)$$
We have
$$\pi(u) = \frac{1}{\sum_{x\in\Omega} u(x)} \sum_{x\in\Omega} u(x)\,\delta_{I(x)} = \frac{Hu}{\langle u, \mathbf{1}\rangle}.$$
Convex model [Yildizoglu et al. '13, Swoboda & Schnörr '13]:
$$\operatorname*{argmin}_{u \in [0,1]^N}\ S(Hu,\ \langle u, \mathbf{1}\rangle\, \pi(u_1)) + S(H(\mathbf{1}-u),\ \langle \mathbf{1}-u, \mathbf{1}\rangle\, \pi(u_0)) + R(u)$$

Slide 22

Slide 22 text

Convex variational model
Convex model:
$$\operatorname*{argmin}_{u \in \Delta}\ S(Hu,\ \langle u, \mathbf{1}\rangle\, \pi(u_1)) + S(H(\mathbf{1}-u),\ \langle \mathbf{1}-u, \mathbf{1}\rangle\, \pi(u_0)) + \rho\, TV(u)$$
- Δ := [0, 1]^N;
- Hu and H(1 − u) are unnormalized histograms;
- π(u0) and π(u1) are quantized reference distributions;
- ⟨u, 1⟩ = Σ_{x∈Ω} u(x) is a normalization for the reference histogram π(u1);
- TV(u) = ||Du||_{1,2} is the discrete, isotropic total variation;
- a "Chan and Vese"-like model, with color histograms instead of mean colors.

Slide 23

Slide 23 text

Regularity using total variation
Discrete formulation: the total variation is the ℓ_{1,2} norm of the discrete gradient,
$$TV(u) = \|Du\|_{1,2} = \sum_{x\in\Omega} \|Du(x)\|_2, \qquad Du(i,j) = \begin{pmatrix} u(i,j) - u(i-1,j) \\ u(i,j) - u(i,j-1) \end{pmatrix}.$$
Property: TV is related to the length of the perimeters of the level sets.
[Figure: segmentation results for increasing values of the TV parameter.]
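As a concrete reference, a minimal sketch of this discrete isotropic TV with forward differences; the zero-gradient convention at the border is an assumption, since the slide does not specify the boundary handling.

```python
# Minimal sketch of the discrete isotropic total variation above.
import numpy as np

def total_variation(u):
    """TV(u) = sum_x ||Du(x)||_2 for a 2D array u."""
    dx = np.zeros_like(u)
    dy = np.zeros_like(u)
    dx[1:, :] = u[1:, :] - u[:-1, :]   # u(i,j) - u(i-1,j)
    dy[:, 1:] = u[:, 1:] - u[:, :-1]   # u(i,j) - u(i,j-1)
    return np.sqrt(dx**2 + dy**2).sum()

u = np.zeros((32, 32))
u[8:24, 8:24] = 1.0                    # binary square indicator
print(total_variation(u))              # ~64: the square's perimeter, as expected
```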

Slide 24

Slide 24 text

Histogram comparison with the metric S
[Rother et al. '06, Vicente et al. '10, Yildizoglu et al. '13]: S is a bin-to-bin distance (ℓ2, ℓ1, Bhattacharyya, Mahalanobis, Kullback-Leibler, ...):
- fast and easy to optimize (graph methods can be considered);
- works with unnormalized histograms.

Slide 25

Slide 25 text

Histogram comparison with the metric S
[Rother et al. '06, Vicente et al. '10, Yildizoglu et al. '13]: S is a bin-to-bin distance (ℓ2, ℓ1, Bhattacharyya, Mahalanobis, Kullback-Leibler, ...):
- fast and easy to optimize (graph methods can be considered);
- works with unnormalized histograms;
- but not robust to perturbations (quantization, shifts, etc.).
[Figure: histograms A, B, C such that S(A, B) = S(C, B) for any bin-to-bin distance S.]

Slide 26

Slide 26 text

Histogram comparison with the metric S
[Rother et al. '06, Vicente et al. '10, Yildizoglu et al. '13]: S is a bin-to-bin distance (ℓ2, ℓ1, Bhattacharyya, Mahalanobis, Kullback-Leibler, ...):
- fast and easy to optimize (graph methods can be considered);
- works with unnormalized histograms;
- but not robust to perturbations (quantization, shifts, etc.).
[Figure: histograms A, B, C such that S(A, B) = S(C, B) for any bin-to-bin distance S.]
[Peyré et al. '12, Swoboda & Schnörr '13]: S is an optimal transport cost:
- measures the cost of transporting one histogram onto another;
- robust to large perturbations;
- much more difficult to optimize (problems of gigantic size with graphs);
- does not work with unnormalized histograms.
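The robustness argument can be checked numerically. The sketch below is purely illustrative (not from the paper) and uses SciPy's 1D Wasserstein distance: a one-bin shift saturates the ℓ1 bin-to-bin distance but barely changes the transport cost.

```python
# Illustrative comparison: bin-to-bin L1 distance vs. 1D Wasserstein distance.
import numpy as np
from scipy.stats import wasserstein_distance

bins = np.arange(10)
A = np.zeros(10); A[2] = 1.0          # unit mass at bin 2
C = np.zeros(10); C[3] = 1.0          # same mass, shifted by one bin
B = np.zeros(10); B[8] = 1.0          # unit mass far away

print(np.abs(A - B).sum(), np.abs(C - B).sum())   # 2.0, 2.0: L1 cannot tell them apart
print(wasserstein_distance(bins, bins, A, B),     # 6.0
      wasserstein_distance(bins, bins, C, B))     # 5.0: transport sees the shift
```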

Slide 27

Slide 27 text

Optimal transport
Transport cost between normalized histograms μ and ν:
$$\mu = \sum_{i=1}^{M} m_i\,\delta_{X_i}, \qquad \nu = \sum_{j=1}^{P} n_j\,\delta_{Y_j},$$
where m_i, n_j ≥ 0 are the masses at locations X_i, Y_j, and Σ_i m_i = Σ_j n_j = 1.
Monge-Kantorovich optimal transport distance:
$$MK(\mu, \nu) = \min_{P \in \mathcal{P}_{\mu,\nu}} \langle P, C\rangle = \min_{P \in \mathcal{P}_{\mu,\nu}} \sum_{i,j} P_{i,j}\, \|X_i - Y_j\|^2 = \min_{L\cdot p = b,\ p \ge 0} p^\top c$$
- C is the fixed assignment cost matrix between histogram bins: C_{i,j} = ||X_i − Y_j||² (quadratic Wasserstein distance);
- P_{μ,ν} is the set of non-negative matrices P with marginals μ and ν, i.e.
$$\mathcal{P}_{\mu,\nu} = \Big\{ P \in \mathbb{R}_+^{M\times P},\ \sum_{i,j} P_{i,j} = 1,\ \sum_j P_{i,j} = m_i,\ \sum_i P_{i,j} = n_j \Big\};$$
- P_{i,j} is the proportion of the mass m_i at X_i that is sent to Y_j.
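For small histograms this linear program can be solved directly; below is a hedged sketch with scipy.optimize.linprog (names and the toy example are illustrative). It is impractical at image scale, which is precisely the difficulty raised on the following slides.

```python
# Sketch: Monge-Kantorovich distance as a linear program (small M, P only).
import numpy as np
from scipy.optimize import linprog

def mk_distance(m, n, X, Y):
    """MK(mu, nu) with quadratic cost C_ij = ||X_i - Y_j||^2; m, n are bin masses."""
    M, P = len(m), len(n)
    C = ((X[:, None, :] - Y[None, :, :])**2).sum(-1)
    # Marginal constraints: rows of the plan sum to m, columns to n.
    A_eq = np.zeros((M + P, M * P))
    for i in range(M):
        A_eq[i, i*P:(i+1)*P] = 1.0
    for j in range(P):
        A_eq[M + j, j::P] = 1.0
    b_eq = np.concatenate([m, n])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

m = np.array([0.5, 0.5]); X = np.array([[0.0], [1.0]])
n = np.array([0.5, 0.5]); Y = np.array([[0.5], [1.5]])
print(mk_distance(m, n, X, Y))   # 0.25: each half-mass moves by 0.5, cost 0.5^2
```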

Slide 28

Slide 28 text

Optimal transport solution, illustration in 1D
[Figure: histograms μ and ν on a uniform grid, and the optimal flow P.]

Slide 29

Slide 29 text

Optimal transport solution, illustration in 1D
[Figure: second example of histograms μ and ν on a uniform grid, and the optimal flow P.]

Slide 30

Slide 30 text

Optimization problem
Primal formulation:
$$u^\star \in \operatorname*{argmin}_{u\in\Delta}\ MK(Hu,\, H_1 u) + MK(H(\mathbf{1}-u),\, H_0(\mathbf{1}-u)) + \rho\,\|Du\|_{1,2}$$
where H_0, H_1 are linear operators (M × N matrices of rank 1) such that
$$H_1 u = \pi(u_1)\,\mathbf{1}^\top u = \langle u, \mathbf{1}\rangle\, \pi(u_1).$$
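A quick numerical check of this rank-1 structure; sizes and names below are toy illustrations.

```python
# H1 = pi(u1) 1^T is rank 1, so that H1 u = <u, 1> pi(u1).
import numpy as np

M, N = 5, 12
pi_u1 = np.random.rand(M); pi_u1 /= pi_u1.sum()   # reference distribution
H1 = np.outer(pi_u1, np.ones(N))                  # M x N matrix of rank 1
u = np.random.rand(N)
assert np.allclose(H1 @ u, u.sum() * pi_u1)       # <u, 1> pi(u1)
```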

Slide 31

Slide 31 text

Optimization problem
Primal formulation:
$$u^\star \in \operatorname*{argmin}_{u\in\Delta}\ MK(Hu,\, H_1 u) + MK(H(\mathbf{1}-u),\, H_0(\mathbf{1}-u)) + \rho\,\|Du\|_{1,2}$$
where H_0, H_1 are linear operators (M × N matrices of rank 1) such that H_1 u = π(u_1) 1^⊤ u = ⟨u, 1⟩ π(u_1).
Generic formulation and resolution with proximal splitting algorithms such as GFBS [Raguet et al. '11], as done in [Swoboda & Schnörr '13]:
$$\operatorname*{argmin}_x \sum_i G_i(A_i x)$$
- simple iterative algorithm using Prox_{G_i ∘ A_i};
- requires solving a quadratic programming problem at each iteration:
$$\operatorname{Prox}_{MK(\tilde H\,\cdot)}(x) = \operatorname*{argmin}_{y\in\Delta}\ \|x - y\|^2 + MK(\tilde H y).$$

Slide 32

Slide 32 text

Optimization problem
Primal formulation:
$$u^\star \in \operatorname*{argmin}_{u\in\Delta}\ MK(Hu,\, H_1 u) + MK(H(\mathbf{1}-u),\, H_0(\mathbf{1}-u)) + \rho\,\|Du\|_{1,2}$$
where H_0, H_1 are linear operators (M × N matrices of rank 1) such that H_1 u = π(u_1) 1^⊤ u = ⟨u, 1⟩ π(u_1).
Generic formulation and resolution with proximal splitting algorithms such as the primal-dual algorithm of [Chambolle and Pock '11], using the Fenchel transform:
$$\operatorname*{argmin}_x \sum_i G_i(A_i x) = \operatorname*{argmin}_x\ \operatorname*{argmax}_{z_i}\ \sum_i \langle A_i x, z_i\rangle - G_i^*(z_i)$$
- simple iterative algorithm using Prox_{G_i^*};
- still requires solving a quadratic programming problem at each iteration:
$$\operatorname{Prox}_{MK^*(\cdot)}(x) = \operatorname*{argmin}_y\ \|x - y\|^2 + MK^*(y).$$

Slide 33

Slide 33 text

Sinkhorn distance
Entropic regularization of optimal transport [Cuturi '13]:
$$MK_\lambda(\mu,\nu) := \min_{P\in\mathcal{P}_{\mu,\nu}}\ \langle P, C\rangle + \tfrac{1}{\lambda}\langle P, \log P\rangle = \min_{P\in\mathcal{P}_{\mu,\nu}}\ \tfrac{1}{\lambda}\, KL\big(P \,\big|\, e^{-\lambda C}\big)$$
where ⟨P, log P⟩ is the negative entropy ⇒ strongly convex problem.
Properties:
- regularization of optimal transport: MK_λ → MK as λ → ∞;
- simple and fast algorithm to solve MK_λ [Sinkhorn '60s];
- simple dual formulation [Cuturi et al. '15];
- less sparse solutions [Solomon et al. '15], [P. & Rabin '15].
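A minimal sketch of the Sinkhorn iterations for the entropic problem above: alternating diagonal scalings of the kernel exp(−λC) to match the two marginals. No log-domain stabilization, so very large λ can underflow; the parameter values and the toy histograms are illustrative.

```python
# Sketch: Sinkhorn iterations for entropically regularized optimal transport.
import numpy as np

def sinkhorn(mu, nu, C, lam, n_iter=500):
    K = np.exp(-lam * C)          # Gibbs kernel
    b = np.ones_like(nu)
    for _ in range(n_iter):
        a = mu / (K @ b)          # match row marginals
        b = nu / (K.T @ a)        # match column marginals
    P = a[:, None] * K * b[None, :]
    return (P * C).sum()          # transport cost of the regularized plan

x = np.linspace(0, 1, 50)
C = (x[:, None] - x[None, :])**2
mu = np.exp(-(x - 0.3)**2 / 0.01); mu /= mu.sum()
nu = np.exp(-(x - 0.7)**2 / 0.01); nu /= nu.sum()
print(sinkhorn(mu, nu, C, lam=100.0))   # close to (0.7 - 0.3)^2 = 0.16
```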

Slide 34

Slide 34 text

Adaptation to the convex segmentation problem
Normalization: due to the relaxation, the histograms h_k are not normalized.
$$MK_{\lambda,m}(h_1, h_2) := \min_{P\in\mathcal{P}_{h_1,h_2}}\ \langle P, C\rangle + \tfrac{1}{\lambda}\Big\langle P, \log \tfrac{P}{m} \Big\rangle + \iota_{\mathcal{H}_m}(h_1, h_2),$$
$$\mathcal{H}_m := \Big\{ (h_1, h_2) \in \mathbb{R}_+^M \times \mathbb{R}_+^M,\ \langle h_1, \mathbf{1}_M\rangle = \langle h_2, \mathbf{1}_M\rangle \le m \Big\}.$$
Remark: for segmentation we take m = N, the number of pixels.

Slide 35

Slide 35 text

Adaptation to the convex segmentation problem
Normalization: due to the relaxation, the histograms h_k are not normalized.
$$MK_{\lambda,m}(h_1, h_2) := \min_{P\in\mathcal{P}_{h_1,h_2}}\ \langle P, C\rangle + \tfrac{1}{\lambda}\Big\langle P, \log \tfrac{P}{m} \Big\rangle + \iota_{\mathcal{H}_m}(h_1, h_2),$$
$$\mathcal{H}_m := \Big\{ (h_1, h_2) \in \mathbb{R}_+^M \times \mathbb{R}_+^M,\ \langle h_1, \mathbf{1}_M\rangle = \langle h_2, \mathbf{1}_M\rangle \le m \Big\}.$$
Dual formulation: the Legendre-Fenchel transform of a convex function F defined on X is
$$F^*(y) = \sup_{x\in X}\ \langle x, y\rangle - F(x), \qquad \text{and } F^{**} = F.$$

Slide 36

Slide 36 text

Adaptation to the convex segmentation problem
Normalization: due to the relaxation, the histograms h_k are not normalized.
$$MK_{\lambda,m}(h_1, h_2) := \min_{P\in\mathcal{P}_{h_1,h_2}}\ \langle P, C\rangle + \tfrac{1}{\lambda}\Big\langle P, \log \tfrac{P}{m} \Big\rangle + \iota_{\mathcal{H}_m}(h_1, h_2),$$
$$\mathcal{H}_m := \Big\{ (h_1, h_2) \in \mathbb{R}_+^M \times \mathbb{R}_+^M,\ \langle h_1, \mathbf{1}_M\rangle = \langle h_2, \mathbf{1}_M\rangle \le m \Big\}.$$
Dual formulation [Rabin & P. '15]: for γ = (α, β),
$$MK^*_{\lambda,m}(\gamma) = \frac{m}{\lambda} \begin{cases} \langle Q_\lambda(\gamma), \mathbf{1}\rangle & \text{if } \langle Q_\lambda(\gamma), \mathbf{1}\rangle \le 1 \\ \log \langle Q_\lambda(\gamma), \mathbf{1}\rangle + 1 & \text{if } \langle Q_\lambda(\gamma), \mathbf{1}\rangle > 1 \end{cases} \qquad Q_\lambda(\gamma)_{i,j} := e^{\lambda(\alpha_i + \beta_j - C_{i,j}) - 1}.$$
Remark: we recover the special case m = 1 of [Cuturi et al. '15].

Slide 37

Slide 37 text

Adaptation to the convex segmentation problem
Normalization: due to the relaxation, the histograms h_k are not normalized.
$$MK_{\lambda,m}(h_1, h_2) := \min_{P\in\mathcal{P}_{h_1,h_2}}\ \langle P, C\rangle + \tfrac{1}{\lambda}\Big\langle P, \log \tfrac{P}{m} \Big\rangle + \iota_{\mathcal{H}_m}(h_1, h_2),$$
$$\mathcal{H}_m := \Big\{ (h_1, h_2) \in \mathbb{R}_+^M \times \mathbb{R}_+^M,\ \langle h_1, \mathbf{1}_M\rangle = \langle h_2, \mathbf{1}_M\rangle \le m \Big\}.$$
Dual gradient [Rabin & P. '15]: for γ = (α, β),
$$\nabla MK^*_{\lambda,m}(\gamma) = \begin{cases} m\,(Q\mathbf{1},\ Q^\top\mathbf{1}) & \text{if } \langle Q, \mathbf{1}\rangle \le 1 \\ \dfrac{m}{\langle Q, \mathbf{1}\rangle}\,(Q\mathbf{1},\ Q^\top\mathbf{1}) & \text{if } \langle Q, \mathbf{1}\rangle > 1 \end{cases} \qquad Q_{i,j} = e^{\lambda(\alpha_i + \beta_j - C_{i,j}) - 1},$$
which is Lipschitz continuous, with constant upper-bounded by 2λm (worst-case scenario).
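A hedged sketch of this dual gradient, transcribing the two-case formula above directly; since the formula is reconstructed from the slide, treat the exact constants as assumptions.

```python
# Sketch: evaluate Q from (alpha, beta) and branch on <Q, 1>, as in the
# dual gradient of MK*_{lambda,m} above; all values are illustrative.
import numpy as np

def grad_mk_dual(alpha, beta, C, lam, m):
    Q = np.exp(lam * (alpha[:, None] + beta[None, :] - C) - 1)
    s = Q.sum()                                      # <Q, 1>
    g_alpha, g_beta = Q.sum(axis=1), Q.sum(axis=0)   # Q 1 and Q^T 1
    scale = m if s <= 1 else m / s
    return scale * g_alpha, scale * g_beta

M = 8
C = np.abs(np.arange(M)[:, None] - np.arange(M)[None, :]) / M
g_alpha, g_beta = grad_mk_dual(np.zeros(M), np.zeros(M), C, lam=5.0, m=100.0)
```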

Slide 38

Slide 38 text

Proposed segmentation optimization problem
Primal formulation:
$$u^\star \in \operatorname*{argmin}_{u\in\Delta}\ MK_{\lambda,N}(Hu,\, H_1 u) + MK_{\lambda,N}(H(\mathbf{1}-u),\, H_0(\mathbf{1}-u)) + \rho\,\|Du\|_{1,2}$$

Slide 39

Slide 39 text

Proposed segmentation optimization problem
Primal formulation:
$$u^\star \in \operatorname*{argmin}_{u\in\Delta}\ MK_{\lambda,N}(Hu,\, H_1 u) + MK_{\lambda,N}(H(\mathbf{1}-u),\, H_0(\mathbf{1}-u)) + \rho\,\|Du\|_{1,2}$$
Equivalent primal-dual formulation after splitting for the TV term:
$$\operatorname*{argmin}_u\ \max_v\ \langle Ku, v\rangle + \iota_\Delta(u) - F^*(v) - G^*(v)$$
- v = (γ_0, γ_1, w) ∈ R^{4M+2N} is the dual variable;
- K = [D; −H; H_0; H; −H_1] ∈ R^{(2N+4M)×N};
- ι_Δ is the indicator of Δ = [0, 1]^N;
- G*(v) = ι_{||·||_{∞,2} ≤ ρ}(w) (dual formulation of the ℓ_{1,2} norm) is simple;
- F*(v) = MK*_{λ,N}(γ_1) + MK*_{λ,N}(γ_0) − ⟨H1, α_0⟩ + ⟨H_0 1, β_0⟩ is differentiable;
- ι_{H_N} is absorbed in the dual formulation of MK_{λ,N}!

Slide 40

Slide 40 text

Algorithm
Use of the inertial preconditioned primal-dual algorithm of [Lorenz & Pock '15] (shown here with only one parameter r > 0, the others being set to 1):
$$\begin{cases} u^{(t+1)} = \operatorname{Proj}_\Delta\!\big(u^{(t)} - T K^\top v^{(t)}\big) \\ \tilde u^{(t+1)} = 2u^{(t+1)} - u^{(t)} \\ v^{(t+1)} = \operatorname{Prox}_{G^*}\!\big(v^{(t)} + \Sigma\,(K\tilde u^{(t+1)} - \nabla F^*(v^{(t)}))\big) \end{cases}$$
where T = τ Id_N and Σ = diag(σ_w, σ_h) are two diagonal preconditioning matrices, with
$$\tfrac{1}{\tau} = 8r, \qquad \tfrac{1}{\sigma_w} = \tfrac{2}{r}\,\mathbf{1}_{2N}, \qquad \text{and} \quad \tfrac{1}{\sigma_h} = 2\lambda N\,\mathbf{1}_{4M} + \tfrac{1}{r}\,[H;\ H_0;\ H;\ H_1]\,\mathbf{1}_N.$$
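The iteration skeleton can be illustrated on a much simpler problem. The sketch below applies the same structure (projection onto Δ, over-relaxation, a prox step on the TV dual, an explicit gradient step on a smooth conjugate mirroring F*) to TV denoising with constant step sizes. It is a toy stand-in, not the paper's preconditioned scheme.

```python
# Toy primal-dual loop for  min_{u in [0,1]^N}  0.5*||u - f||^2 + rho*TV(u).
import numpy as np

def grad(u):                       # forward differences
    gx = np.zeros_like(u); gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy = np.zeros_like(u); gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def grad_T(gx, gy):                # adjoint of grad (negative divergence)
    d = np.zeros_like(gx)
    d[1:, :] += gx[:-1, :]; d[:-1, :] -= gx[:-1, :]
    d[:, 1:] += gy[:, :-1]; d[:, :-1] -= gy[:, :-1]
    return d

rng = np.random.default_rng(0)
f = (np.indices((64, 64))[0] > 32) + 0.3 * rng.standard_normal((64, 64))
rho, tau, sigma = 0.2, 0.2, 0.4    # crude constant steps, no preconditioner
u = np.clip(f, 0.0, 1.0)
wx = np.zeros_like(u); wy = np.zeros_like(u)   # TV dual variable w
y = np.zeros_like(u)                           # data-term dual variable
for _ in range(300):
    u_new = np.clip(u - tau * (grad_T(wx, wy) + y), 0.0, 1.0)  # Proj_Delta
    u_bar = 2.0 * u_new - u                                    # over-relaxation
    u = u_new
    gx, gy = grad(u_bar)
    wx, wy = wx + sigma * gx, wy + sigma * gy                  # dual ascent
    scale = np.maximum(1.0, np.sqrt(wx**2 + wy**2) / rho)      # Prox_G*:
    wx, wy = wx / scale, wy / scale                            # ball projection
    y = y + sigma * (u_bar - y - f)    # gradient step on the smooth conjugate
```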

Slide 41

Slide 41 text

Limitations and alternatives
Advantages & limitations:
- the regularized measure MK_λ is robust to outliers;
- exploits the smoothness of the regularization: no need for optimal transport computation;
- memory storage: primal variable (size N), dual variables (size M);
- not interesting when λ → ∞, as the Lipschitz constant is ≤ 2λN.

Slide 42

Slide 42 text

Limitations and alternatives
Advantages & limitations:
- the regularized measure MK_λ is robust to outliers;
- exploits the smoothness of the regularization: no need for optimal transport computation;
- memory storage: primal variable (size N), dual variables (size M);
- not interesting when λ → ∞, as the Lipschitz constant is ≤ 2λN.
In this case, we can rely on bi-conjugation:
- the proximal operators can be computed simply (either for MK or MK_λ);
- but the dual variables are of size M², which limits us to coarse color clustering.

Slide 43

Slide 43 text

Experiments: comparison of various metrics
Accurate description of the background prior.
[Figure: input I with scribbles u1, u0; results with the ℓ1, Wasserstein and regularized Wasserstein metrics. Results with the MK distance (i.e. 1/λ = 0).]

Slide 44

Slide 44 text

Experiments: comparison of various metrics
Same parameters, other background prior.
[Figure: input I with scribbles u1, u0; results with the ℓ1, Wasserstein and regularized Wasserstein metrics. Results with the MK distance (i.e. 1/λ = 0).]

Slide 45

Slide 45 text

Other Wasserstein segmentation methods
[Figure: input I with scribbles u1, u0; local method of [Ni et al. '09] vs. global method. Wasserstein active contours [Peyré et al. '12]: initialization and result.]

Slide 46

Slide 46 text

Influence of the regularization
$$MK_\lambda(h_1, h_2) = \min_{P\in\mathcal{P}(h_1,h_2)}\ \langle C, P\rangle + \tfrac{1}{\lambda}\langle P, \log P/N\rangle$$
[Figure: input I with scribbles u1, u0; Wasserstein vs. regularized Wasserstein, λ = 100.]

Slide 47

Slide 47 text

Influence of the regularization
$$MK_\lambda(h_1, h_2) = \min_{P\in\mathcal{P}(h_1,h_2)}\ \langle C, P\rangle + \tfrac{1}{\lambda}\langle P, \log P/N\rangle$$
[Figure: input I with scribbles u1, u0; Wasserstein vs. regularized Wasserstein, λ = 10.]

Slide 48

Slide 48 text

Influence of the final binarization
[Figure: input I with scribbles u1, u0; relaxed solution u* and its thresholding u* > 0.9.]

Slide 49

Slide 49 text

Influence of the final binarization
[Figure: input I with scribbles u1, u0; relaxed solution u* and its thresholding u* > 0.5.]

Slide 50

Slide 50 text

Influence of the final binarization
[Figure: input I with scribbles u1, u0; relaxed solution u* and its thresholding u* > 0.2.]

Slide 51

Slide 51 text

Influence of the final binarization
[Figure: input I with scribbles u1, u0; relaxed solution u* and its thresholding u* > 0.1.]

Slide 52

Slide 52 text

Experiments: partially labeled image
Use the background prior for similar images.
[Figure: input I with scribbles u1, u0, and segmentation results.]

Slide 53

Slide 53 text

Experiments: texture segmentation
Texture segmentation using histograms of gradient norms.
[Figure: input and result.]

Slide 54

Slide 54 text

Extension to co-segmentation
Find the same object in two different images (no prior on the object):
$$J(u) := MK_\lambda\big(H_1 u_1,\ H_2 u_2\big) + \sum_{k=1}^{2} \Big( \rho\,\|Du_k\|_{1,2} - \alpha\,\|u_k\|_1 \Big)$$
The balloon force −α||u_k||_1 prevents the estimation of empty regions.
[Figure: co-segmentation results u1 and u2.]
Open problem: dealing with changes of object scale between the images.

Slide 55

Slide 55 text

Conclusion and future work
Contributions:
- a regularized optimal transport based metric for image segmentation;
- a fast and simple iterative algorithm handling unnormalized histograms.
Future work:
- natural extension to multi-phase segmentation (see [Swoboda & Schnörr '13]);
- co-segmentation;
- unbalanced histograms using a relaxation of mass conservation ([Ferradans et al. '13, Benamou et al. '15]).

Slide 56

Slide 56 text

Question time
Thank you for your attention!