Methodology
Proposed Spatially & Temporally Dynamic Graph Cut
Proposed Error Correction
Experiments on Interactive Image Segmentation
Interactive Video Segmentation
  Review of the Literature
  Proposed Filtering Based Formulation
  Proposed Linear Dynamic Graph-Cut
  Proposed Automatic Video Object Segmentation Extension
  Experiments on Interactive Image Segmentation
  Experiments on Automatic Video Segmentation
Interactive Mobile Segmentation 3/41
Figure: taxonomy of interactive image segmentation methods.
User Interaction: Scribbles, Approximate Boundary, Bounding Box.
Model Formulation: Gaussian Mixture Model, Kernel Density Estimation, Boundary Path Cost.
Optimization: Min Cut / Max Flow (s-t graph over nodes i, j with edge weights $w_{is}$, $w_{it}$, $w_{js}$, $w_{jt}$, $w_{ij}$, $w_{ji}$), Dynamic Programming.
Methods shown: Jolly 2001], Grabcut [Rother et al. 2004], Geodesic Image Matting [Bai, Sapiro 2008], Lazy Snapping [Li et al. 2004], Intelligent Scissors [Mortensen, Barrett 95].
the graph algorithms depends on the number of nodes and edges.
The most intuitive way to increase computational efficiency is over-segmentation.
All algorithms are developed on generic graphs, and all experiments are conducted on the over-segmented graph obtained by the SLIC algorithm [Achanta et al., 2010].
Proposed Method 8/41
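To illustrate why over-segmentation shrinks the graph, the sketch below builds a region adjacency graph from a superpixel label map. It is a minimal illustration, not the implementation used in the experiments: the function name `region_adjacency`, the 4-connectivity rule, and the toy label map (standing in for real SLIC output) are all assumptions.

```python
def region_adjacency(labels):
    """Return the undirected edges between 4-connected superpixels."""
    edges = set()
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            # Look right and down only, so each pixel pair is visited once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    b = labels[rr][cc]
                    if a != b:
                        edges.add((min(a, b), max(a, b)))
    return edges


# Hypothetical 4x4 label map standing in for SLIC output.
toy_labels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
toy_edges = region_adjacency(toy_labels)
toy_nodes = {label for row in toy_labels for label in row}
```

In this toy case the 16-pixel grid collapses to 4 nodes and 4 edges, which is the kind of reduction that makes graph algorithms cheaper on the over-segmented graph.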
Figure: graphical model with label nodes $x_i$, observed colors $z_i$, and a shared parameter $\theta$.
Image/Video is represented as a graph of pixels/regions.
User interaction is formulated as a parametric model.
The resultant dependency network is a Markov Random Field (MRF).
$x_i$: label of each pixel/region
$z_i$: color of each pixel/region
$\theta$: parametric model of FG/BG (GMM learned by interaction)
Graph Theoretical Segmentation 9/41
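theorem [Clifford, 1990]:
$$p(x \mid z, \theta) = \frac{1}{Z} \exp\big(-E(x, z, \theta)\big)$$
The MAP (maximum a posteriori) solution is:
$$\arg\min_x E(x, z, \theta) = \arg\min_x \underbrace{\sum_{v_i \in V} E_i(x_i, z_i)}_{\text{GMM likelihood}} + \underbrace{\sum_{e_{ij} \in E} E_{ij}(x_i, x_j)}_{\text{smoothness penalty}}$$
This is equivalent to a min-cut on a 2-terminal graph.
Figure: s-t graph over nodes $x_0, \dots, x_8$ with an s-t cut, and a two-node example ($v_1$, $v_2$) with terminal edge weights $E_1(1)$, $E_1(0)$, $E_2(1)$, $E_2(0)$ and neighbor edge weight $E_{12}(1,0) = E_{12}(0,1)$; the cut cost is $E_1(x_1) + E_2(x_2) + E_{12}(x_1, x_2)$.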
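The energy above can be evaluated directly on a tiny graph. The sketch below computes $E(x)$ and finds the MAP labeling by brute force, which is only feasible for a handful of nodes but makes the objective concrete. It assumes a Potts-style pairwise term (a penalty only when neighboring labels differ); the names `energy`, `map_labeling` and the toy numbers are illustrative, not taken from the slides.

```python
from itertools import product


def energy(x, unary, pairwise):
    """E(x) = sum_i E_i(x_i) + sum_(i,j) E_ij(x_i, x_j).

    unary[i] is the pair (E_i(0), E_i(1)); pairwise[(i, j)] is the penalty
    paid when x_i != x_j (assumed Potts-style smoothness term).
    """
    total = sum(unary[i][xi] for i, xi in enumerate(x))
    total += sum(w for (i, j), w in pairwise.items() if x[i] != x[j])
    return total


def map_labeling(unary, pairwise):
    """Brute-force arg min over all binary labelings (toy sizes only)."""
    n = len(unary)
    return min(product((0, 1), repeat=n),
               key=lambda x: energy(x, unary, pairwise))


# Hypothetical three-node chain.
toy_unary = [(1, 10), (8, 2), (9, 1)]
toy_pairwise = {(0, 1): 3, (1, 2): 5}
best = map_labeling(toy_unary, toy_pairwise)
best_energy = energy(best, toy_unary, toy_pairwise)
```

A min-cut solver reaches the same labeling without enumerating all $2^n$ candidates, which is the reason for the 2-terminal graph construction.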
maximum flow from s to t [Ford, Fulkerson 1962].
Pushing any flow from s to t does not change the solution.
Maximum flow is found via augmenting paths [Ford, Fulkerson 1962]:
  Repeat
    Find and push a valid flow from s to t.
    Update the graph: $r_e = w_e - f(e) \quad \forall e \in E$
  Until no valid flow exists.
$r_e$: residual weight of edge $e$
$w_e$: weight of edge $e$
$f(e)$: flow pushed through edge $e$
Figure: two-node s-t graph ($v_1$, $v_2$) with terminal edge weights $E_1(1)$, $E_1(0)$, $E_2(1)$, $E_2(0)$, neighbor edge weights $E_{12}(0,1)$, $E_{12}(1,0)$, and two pushed flows labeled $\alpha$ and $\beta$.
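The augmenting-path loop can be sketched as follows. This is a generic breadth-first (Edmonds-Karp style) variant chosen for brevity, not necessarily the solver used in the presented work; the function names and the toy edge weights are assumptions.

```python
from collections import deque


def max_flow(weights, s, t):
    """Augmenting-path max-flow. Returns (flow value, residual graph)."""
    residual = {}
    for (u, v), w in weights.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + w
        residual[v].setdefault(u, 0)  # reverse edge starts empty
    total = 0
    while True:
        # Breadth-first search for a path with positive residual weight.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total, residual  # no valid flow is left
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push  # r_e = w_e - f(e)
            residual[v][u] += push
        total += push


def source_side(residual, s):
    """Nodes still reachable from s in the residual graph (source side of the cut)."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v, r in residual[u].items():
            if r > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


# Hypothetical two-node example in the spirit of the slide figure.
toy_weights = {("s", "v1"): 4, ("s", "v2"): 1, ("v1", "t"): 1,
               ("v2", "t"): 5, ("v1", "v2"): 2, ("v2", "v1"): 2}
flow_value, toy_residual = max_flow(toy_weights, "s", "t")
foreground = source_side(toy_residual, "s")
```

The residual graph returned here is the quantity that a dynamic formulation would keep and reuse between interactions.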
of the algorithm:
Graph structure and binary edge weights do not change.
Unary edge weights change slightly.
Previous flows can be re-used with an update:
$$r^{t}_{e_i} = r^{t-1}_{e_i} + w^{t}_{e_i} - w^{t-1}_{e_i}$$
The resultant residual graph will be sparse, so the augmenting path algorithm will converge in fewer iterations.
Proposed Dynamic Graph-Cut 12/41
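A minimal sketch of this residual update for the terminal (unary) edges is given below. The data layout (one `(source, sink)` pair per node) and the function name are assumptions. The handling of negative residuals, adding the same constant to both terminal edges of a node, is a remedy borrowed from the dynamic graph-cut literature and is not stated on the slide.

```python
def update_terminal_residuals(residual, old_w, new_w):
    """Apply r^t = r^(t-1) + w^t - w^(t-1) to each node's terminal edges.

    residual, old_w and new_w map node -> (source_edge, sink_edge).
    """
    updated = {}
    for node, (r_s, r_t) in residual.items():
        r_s = r_s + new_w[node][0] - old_w[node][0]
        r_t = r_t + new_w[node][1] - old_w[node][1]
        # A negative residual is not a valid capacity. Adding the same
        # constant to both terminal edges of a node shifts the cost of every
        # cut by that constant, so the minimizing labeling is unchanged.
        shift = max(0, -r_s, -r_t)
        updated[node] = (r_s + shift, r_t + shift)
    return updated


# Hypothetical residuals and unary weights for two nodes.
prev_residual = {"a": (0, 3), "b": (2, 0)}
old_weights = {"a": (4, 7), "b": (5, 1)}
new_weights = {"a": (5, 7), "b": (2, 1)}
next_residual = update_terminal_residuals(prev_residual, old_weights, new_weights)
```

After such an update, only the few edges whose residuals became positive again can carry new flow, which is why the follow-up augmenting-path search tends to be short.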
property of locality
Can we extend the dynamic graph-cut idea to the spatial dimensions?
Is it possible to find a sub-graph around the interaction that gives approximately the same result as the global solution?
R, This solution can be extended to a global one; the labels of the nodes in R can only be flipped via flows coming from outside of R.
The following condition is sufficient for robustness (proof is omitted).
If R is foreground (connected to source):
$$\sum_{i \in R} (w_{iS} - w_{iT}) > \sum_{\substack{i \in R,\ j \in N \\ \exists\, \mathrm{Path}(i,j)}} \min_{e \in E \cap \mathrm{Path}(i,j)} w_e$$
If R is background (connected to sink):
$$\sum_{i \in R} (w_{iT} - w_{iS}) > \sum_{\substack{i \in R,\ j \in N \\ \exists\, \mathrm{Path}(j,i)}} \min_{e \in E \cap \mathrm{Path}(j,i)} w_e$$
the robustness of the clusters obtained via GMM.
Instead of cluster boundaries, use the boundary of the rectangle R.
The weaker condition is:
If R is foreground (connected to source):
$$\sum_{i \in R} (w_{iS} - w_{iT}) > \sum_{i \in R,\ j \notin R} w_{ij}$$
If R is background (connected to sink):
$$\sum_{i \in R} (w_{iT} - w_{iS}) > \sum_{j \notin R,\ i \in R} w_{ji}$$
The proposed algorithm starts with the bounding box of the user interaction and enlarges the solution until the proposed condition is satisfied.
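The rectangle-based condition and the enlargement loop can be sketched as below. This is an illustration under stated assumptions: nodes are grid cells `(row, col)`, `w_pair` is keyed by directed edge `(from, to)`, and the rectangle grows by one cell per side on each step. The growth schedule and all function names are hypothetical; the slides only say that the solution is enlarged until the condition holds.

```python
def is_robust(region, w_src, w_snk, w_pair, foreground=True):
    """Check the rectangle condition for a set of nodes `region`."""
    if foreground:
        margin = sum(w_src[i] - w_snk[i] for i in region)
        # Edges leaving the region: i in R, j not in R.
        boundary = sum(w for (i, j), w in w_pair.items()
                       if i in region and j not in region)
    else:
        margin = sum(w_snk[i] - w_src[i] for i in region)
        # Edges entering the region: j not in R, i in R.
        boundary = sum(w for (j, i), w in w_pair.items()
                       if i in region and j not in region)
    return margin > boundary


def grow_box(box, shape, w_src, w_snk, w_pair, foreground=True):
    """Enlarge (r0, c0, r1, c1) until the condition holds or the image is covered."""
    rows, cols = shape
    r0, c0, r1, c1 = box
    while True:
        region = {(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)}
        covers_image = (r0, c0, r1, c1) == (0, 0, rows - 1, cols - 1)
        if covers_image or is_robust(region, w_src, w_snk, w_pair, foreground):
            return (r0, c0, r1, c1)
        # Assumed growth step: one cell in every direction, clamped to the image.
        r0, c0 = max(r0 - 1, 0), max(c0 - 1, 0)
        r1, c1 = min(r1 + 1, rows - 1), min(c1 + 1, cols - 1)
```

If the box ends up covering the whole image, the procedure simply falls back to the global graph-cut, so the local solution is never worse than the global one in terms of correctness.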
is the bounding box of the current interaction; the red rectangle is the computed bounding box.
b: Result of graph-cut for the blue rectangle.
c: Result of graph-cut for the red rectangle.
for the color profile of the interaction.
Start to discard interactions that are not consistent with the color model until the user comes back to the initial region or moves to another color profile.
Replace the discarded interaction with the path minimizing
$$\mathrm{Cost}(\mathrm{path}) = \sum_{u,v \in \mathrm{path}} \big( |x_u - x_v| + \lambda\, |I_u - I_v| \big)$$
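The path replacement can be sketched with Dijkstra's algorithm over the region graph, using the per-step cost from the formula above. The interpretation of $x$ as a 2-D position and $I$ as a scalar intensity is an assumption, as are the function name, the adjacency-list layout, and the toy data.

```python
import heapq
import math


def cheapest_path(adj, pos, inten, start, goal, lam=1.0):
    """Shortest path under step cost |x_u - x_v| + lam * |I_u - I_v|."""
    def step(u, v):
        return math.dist(pos[u], pos[v]) + lam * abs(inten[u] - inten[v])

    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v in adj[u]:
            nd = d + step(u, v)
            if nd < best.get(v, math.inf):
                best[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in best:
        return None, math.inf
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], best[goal]


# Hypothetical 2x2 region graph: node 1 has a very different intensity.
toy_adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
toy_pos = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1)}
toy_inten = {0: 10, 1: 50, 2: 12, 3: 11}
toy_path, toy_cost = cheapest_path(toy_adj, toy_pos, toy_inten, 0, 3, lam=1.0)
```

With $\lambda > 0$ the path prefers regions of similar intensity, so the replacement stroke tends to stay inside the color profile the user started from.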
Figure panels: True Positive, Multi Color, False Positive.
Notes:
False positives are handled via path finding.
False negatives require a restart.
Plot: Execution Time (msec) versus Iteration (User Interaction) for Boykov & Kolmogorov [4], Kohli & Torr [10], and the Proposed Method.
The interaction throughout the entire process is divided into sets of interactions on 3 superpixels and fed to all algorithms.
Experimental Results 22/41
Figure: overview of interactive video segmentation approaches.
Labels: ers Color and Shape Models via Motion Information; Feature Matching; Interaction; Solve with Graph Clustering; Linear Matting; Local Search; Min-Cut/Max-Flow (s-t graph over nodes i, j with edge weights $w_{is}$, $w_{it}$, $w_{js}$, $w_{jt}$, $w_{ij}$, $w_{ji}$).
Methods shown: Rotobrush [Bai et al., 2009], [Zhang et al., 2008], [Grundmann et al., 2010], Geodesic Video [Bai et al., 2007].
Interactive Video Segmentation 24/41
initial frame is obtained via interaction:
$$E(\alpha) = \sum_{v_i \in V} U(\alpha_i, z_i) + \sum_{v_i \in V} \sum_{v_j \in N(v_i)} V(z_i, z_j)\, \phi[\alpha_i \neq \alpha_j]$$
The Markovian property implies that we can estimate the MRF energy of the current frame via the MRF energy of the previous frame.
Given a spatio-temporal distance function, linear estimation is possible via:
$$U^{t}(\alpha^{t}_i, z^{t}_i) = \frac{1}{\gamma^{t}_i} \sum_{v^{t-1}_j \in V^{t-1}} U^{t-1}(\alpha^{t-1}_j, z^{t-1}_j)\, e^{-\mathrm{dis}(z^{t}_i,\, z^{t-1}_j)}$$
$$V^{t}(z^{t}_i, z^{t}_j) = \frac{1}{\gamma^{t}_{ij}} \sum_{v_k \in V^{t-1}} \sum_{v_l \in N} e^{-\mathrm{dis}(z^{t}_i,\, z^{t-1}_k)}\, e^{-\mathrm{dis}(z^{t}_j,\, z^{t-1}_l)}\, V^{t-1}(z^{t-1}_k, z^{t-1}_l)$$
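The unary estimate is a distance-weighted average of the previous frame's unary energies. The sketch below evaluates it naively in $O(n^2)$ to make the formula concrete. It assumes that $\gamma^{t}_i$ is the sum of the exponential weights (the slides do not define it), handles one label's energy at a time, and uses a scalar feature with absolute difference as a stand-in for the spatio-temporal distance.

```python
import math


def propagate_unary(prev_unary, prev_feat, cur_feat, dis):
    """U^t_i = (1 / gamma_i) * sum_j U^(t-1)_j * exp(-dis(z^t_i, z^(t-1)_j))."""
    estimates = []
    for z_i in cur_feat:
        weights = [math.exp(-dis(z_i, z_j)) for z_j in prev_feat]
        gamma = sum(weights)  # assumed normalizer
        weighted = sum(w * u for w, u in zip(weights, prev_unary))
        estimates.append(weighted / gamma)
    return estimates


def toy_distance(a, b):
    """Hypothetical stand-in for the spatio-temporal distance."""
    return abs(a - b)


# Two regions in frame t-1, three regions in frame t.
toy_estimates = propagate_unary(
    prev_unary=[1.0, 5.0],
    prev_feat=[0.0, 10.0],
    cur_feat=[0.0, 10.0, 5.0],
    dis=toy_distance,
)
```

Regions close to a previous region inherit almost exactly its energy, while a region halfway between the two receives their average. Because the estimate is linear in the previous energies, it can be computed with an efficient filter instead of this explicit double loop.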
the best choice.
The computational complexity of the geodesic distance filter, $O(n^3)$, is not affordable in mobile scenarios.
Figure labels: Frame t-1, Frame t; Temporal, Horizontal, Vertical.
The Information Permeability/Bi-exponential (IP/BE) filter [Cigla, Alatan, 2010]/[Thévenaz et al., 2012] is an approximate yet efficient, $O(n)$, alternative to the geodesic distance filter.
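To show why such a filter runs in linear time, the sketch below implements a generic two-pass recursive edge-aware filter on a 1-D signal. It follows the general idea of permeability-weighted accumulation but is not claimed to match the exact published IP/BE formulations; the exponential permeability, the normalization by a filtered all-ones signal, and the parameter `sigma` are assumptions.

```python
import math


def permeability_filter_1d(values, guide, sigma=10.0):
    """Edge-aware smoothing of `values` in O(n), guided by `guide`."""
    n = len(values)
    # perm[k] links samples k and k+1: near 1 in flat areas, near 0 across edges.
    perm = [math.exp(-abs(guide[i] - guide[i - 1]) / sigma) for i in range(1, n)]

    def two_pass(signal):
        left = list(signal)
        for i in range(1, n):            # left-to-right accumulation
            left[i] += perm[i - 1] * left[i - 1]
        right = list(signal)
        for i in range(n - 2, -1, -1):   # right-to-left accumulation
            right[i] += perm[i] * right[i + 1]
        # The center sample appears in both passes, so subtract it once.
        return [l + r - s for l, r, s in zip(left, right, signal)]

    numerator = two_pass(values)
    denominator = two_pass([1.0] * n)
    return [a / b for a, b in zip(numerator, denominator)]


# Hypothetical signal with a strong edge in the guide between samples 2 and 3.
toy_guide = [0, 0, 0, 1000, 1000, 1000]
toy_values = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
toy_filtered = permeability_filter_1d(toy_values, toy_guide)
```

Each sample is touched a constant number of times per pass, so the cost grows linearly with the number of samples, and the same recursion can be applied along the horizontal, vertical, and temporal directions in turn.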
independently.
There is significant redundancy; however, the graph structure changes due to the over-segmentation.
Either solve a computationally expensive graph matching problem (the best known algorithm is $O(n^2 \log n)$) or exploit linearity.
Proposition: The binary labels obtained by minimizing the MRF energy that results from applying the bilateral filter to the energy function defined via the residual graph are equivalent to those obtained by minimizing the MRF energy that results from applying the bilateral filter to the original energy function.
Figure: Graph t (nodes a, b, c with edge weights $w$) is mapped to Graph t+1 (nodes i, j) by a linear transformation (bilateral filter) with coefficients $t_{ia}$, $t_{ja}$, $t_{ib}$, $t_{jb}$, $t_{ic}$, $t_{jc}$, and min-cut/max-flow on Graph t+1 gives Solution t+1. In parallel, min-cut/max-flow on Graph t gives Residual Graph t (edge weights $r$), the same linear transformation maps it to Residual Graph t+1, and min-cut/max-flow on Residual Graph t+1 gives Residual Solution t+1.
object segmentation tools using computationally costly features such as saliency, optical flow, and shape.
The proposed interactive video segmentation tool is efficient; however, it requires an interaction in the first frame.
Any MRF-energy-based automatic video segmentation tool can be used to initialize the proposed method.
The proposed MRF energy estimation method is evaluated as a speed-up tool for the Keysegments [Lee et al., 2011] algorithm.
Automatic Video Segmentation 37/41
same results as the global solution.
Spatial information and user interaction are too valuable to discard, even in the erroneous case.
A dynamic formulation of user interaction increases user satisfaction and makes efficient graph optimization possible.
The interactive video segmentation problem is actually an estimation problem.
Given a reliable spatio-temporal distance, it is possible to compensate for the lack of motion information.
The solution to the min-cut/max-flow problem is linear and can easily be combined with other linear formulations.
Conclusion & Future Work 39/41