Master Thesis Presentation at METU

An Efficient Graph-Theoretical Approach for Interactive Mobile Image and Video Segmentation
Ozan Şener
Electrical and Electronics Engineering Department, Middle East Technical University
May 14, 2015

Interactive Segmentation Problem
User interaction produces a segmentation mask that is used by an application. For video, the segmentation mask is propagated to the rest of the video before being used by the application.
(Interactive Mobile Segmentation, 1/41)

Issues Related to Mobile Touch-Screen Devices
Rich interaction possibilities, but more interaction errors and low computational power. (Photo courtesy of Adobe Systems Inc.)
(Interactive Mobile Segmentation, 2/41)

Outline
Interactive Image Segmentation
  Review of the Literature
  Proposed Interaction Methodology
  Proposed Spatially & Temporally Dynamic Graph-Cut
  Proposed Error Correction
  Experiments on Interactive Image Segmentation
Interactive Video Segmentation
  Review of the Literature
  Proposed Filtering-Based Formulation
  Proposed Linear Dynamic Graph-Cut
  Proposed Automatic Video Object Segmentation Extension
  Experiments on Interactive Video Segmentation
  Experiments on Automatic Video Segmentation
(Interactive Mobile Segmentation, 3/41)

Building Blocks of Interactive Image Segmentation
User interaction: scribbles, approximate boundary, bounding box.
Model formulation: boundary path cost, Gaussian mixture model, kernel density estimation.
Optimization: min-cut/max-flow, dynamic programming.
(Interactive Mobile Segmentation, 4/41)

Related Work for Interactive Image Segmentation
N-D Image Segmentation [Boykov, Jolly 2001], GrabCut [Rother et al. 2004], Geodesic Image Matting [Bai, Sapiro 2008], Lazy Snapping [Li et al. 2004], Intelligent Scissors [Mortensen, Barrett 1995]. Each combines the same building blocks (scribbles, approximate boundary, or bounding box interaction; boundary path cost, GMM, or KDE models; min-cut/max-flow or dynamic programming optimization) in a different way.
(Interactive Mobile Segmentation, 5/41)

Proposed Interaction Method: Coloring
Existing interaction types are scribbles, approximate boundaries, and bounding boxes; the model formulation and optimization depend on the user interaction.
(Proposed Method, 6/41)

Proposed Interaction Method: Coloring
The interaction mimics the gesture of coloring in a coloring book.
(Proposed Method, 7/41)

Pixel Grid to Over-Segment Graph
The complexity of most graph algorithms depends on the number of nodes and edges, so the most intuitive way to increase computational efficiency is to use an over-segmentation. All algorithms are developed on generic graphs, and all experiments are conducted on the over-segment graph obtained by the SLIC algorithm [Achanta et al., 2010].
(Proposed Method, 8/41)
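As a minimal sketch of the over-segment graph idea, assuming a superpixel label map is already available (for example from a SLIC implementation; the label maps below are made up), the following Python builds the node and edge sets of the region adjacency graph using 4-connectivity. The function name `region_adjacency_graph` is hypothetical.

```python
from typing import List, Set, Tuple


def region_adjacency_graph(labels: List[List[int]]) -> Tuple[Set[int], Set[Tuple[int, int]]]:
    """Build the node and edge sets of a superpixel graph from a label map.

    labels[y][x] is the superpixel id of pixel (x, y). Two superpixels are
    connected if any of their pixels are 4-neighbours.
    """
    nodes: Set[int] = set()
    edges: Set[Tuple[int, int]] = set()
    height, width = len(labels), len(labels[0])
    for y in range(height):
        for x in range(width):
            a = labels[y][x]
            nodes.add(a)
            # Look right and down only, so each pixel pair is visited once.
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < width and ny < height:
                    b = labels[ny][nx]
                    if a != b:
                        edges.add((min(a, b), max(a, b)))
    return nodes, edges
```

On a 4x4 image split into four 2x2 superpixels, the graph shrinks from 16 pixel nodes and 24 edges to 4 nodes and 4 edges, which is the kind of reduction the slide relies on.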

Graphical Model for the Segmentation
[Figure: MRF over the labels x_i, the observed colors z_i, and the FG/BG model θ.]
The image/video is represented as a graph of pixels/regions, and the user interaction is formulated as a parametric model. The resulting dependency network is a Markov Random Field (MRF).
x_i: label of each pixel/region
z_i: color of each pixel/region
θ: parametric model of FG/BG (a GMM learned from the interaction)
(Graph Theoretical Segmentation, 9/41)
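The unary term attached to each node can be illustrated with a short sketch, assuming diagonal-covariance GMMs whose parameters have already been learned from the interaction (the slide does not specify the covariance structure or the learning procedure, and `gmm_log_likelihood` and `unary_energies` are hypothetical names). Each label's unary energy is the negative log-likelihood of the region color z_i under the corresponding model.

```python
import math
from typing import List, Sequence, Tuple

# One mixture component: (weight, mean color, per-channel variance).
# Diagonal covariances are an assumption made to keep the sketch short.
Component = Tuple[float, Sequence[float], Sequence[float]]


def gmm_log_likelihood(z: Sequence[float], gmm: List[Component]) -> float:
    """log p(z | theta) for a diagonal-covariance Gaussian mixture."""
    terms = []
    for weight, mean, var in gmm:
        log_pdf = 0.0
        for zc, mc, vc in zip(z, mean, var):
            log_pdf += -0.5 * (math.log(2 * math.pi * vc) + (zc - mc) ** 2 / vc)
        terms.append(math.log(weight) + log_pdf)
    # log-sum-exp over the components for numerical stability
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def unary_energies(z, fg_gmm, bg_gmm):
    """Return (E_i(x_i = 0), E_i(x_i = 1)), with label 1 = foreground."""
    return -gmm_log_likelihood(z, bg_gmm), -gmm_log_likelihood(z, fg_gmm)
```

A reddish region scored against a red foreground model and a blue background model gets a lower foreground energy, so the graph cut is pulled toward labeling it foreground.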

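Transformation to Energy Minimization
The posterior cannot be factorized directly; by the Hammersley-Clifford theorem [Clifford, 1990] it can be written as
$p(x \mid z, \theta) = \frac{1}{Z} \exp(-E(x, z, \theta))$.
The MAP (maximum a posteriori) solution is therefore
$\arg\min_x E(x, z, \theta) = \arg\min_x \underbrace{\sum_{v_i \in V} E_i(x_i, z_i)}_{\text{GMM likelihood}} + \underbrace{\sum_{e_{ij} \in E} E_{ij}(x_i, x_j)}_{\text{smoothness penalty}}$.
This is equivalent to a min-cut on a 2-terminal (s-t) graph. For two nodes, the energy $E_1(x_1) + E_2(x_2) + E_{12}(x_1, x_2)$ maps to terminal edges with weights $E_1(0)$, $E_1(1)$, $E_2(0)$, $E_2(1)$ and to edges between $v_1$ and $v_2$ with weight $E_{12}(1,0) = E_{12}(0,1)$.
(Graph Theoretical Segmentation, 10/41)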
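The two-node construction can be checked exhaustively. The sketch below assumes that label 1 (foreground) means the node stays on the source side, as on the robustness-rule slides, and that $E_{ij}(0,0) = E_{ij}(1,1) = 0$; the numeric energies are made up. It verifies that every labeling's energy equals the cost of the corresponding s-t cut, which is why minimizing the energy and finding the minimum cut are the same problem.

```python
from itertools import product


def cut_cost(source_side, unary, pairwise):
    """Cost of the s-t cut whose source side is `source_side`.

    Assumed construction (label 1 = foreground = source side):
      s -> v has capacity E_v(0)   (cut when v ends up on the sink side)
      v -> t has capacity E_v(1)   (cut when v stays on the source side)
      u -> v has capacity E_uv(1, 0), v -> u has capacity E_uv(0, 1)
    """
    cost = 0.0
    for v, (e0, e1) in unary.items():
        cost += e1 if v in source_side else e0
    for (u, v), (e10, e01) in pairwise.items():
        if u in source_side and v not in source_side:
            cost += e10
        elif v in source_side and u not in source_side:
            cost += e01
    return cost


def energy(labels, unary, pairwise):
    """E(x) = sum of unary terms + pairwise terms for disagreeing labels."""
    e = sum(unary[v][labels[v]] for v in unary)
    for (u, v), (e10, e01) in pairwise.items():
        if (labels[u], labels[v]) == (1, 0):
            e += e10
        elif (labels[u], labels[v]) == (0, 1):
            e += e01
    return e


unary = {1: (4.0, 1.0), 2: (2.0, 3.0)}   # node: (E_v(0), E_v(1))
pairwise = {(1, 2): (2.5, 2.5)}          # E_12(1,0) = E_12(0,1)
for x1, x2 in product((0, 1), repeat=2):
    labels = {1: x1, 2: x2}
    side = {v for v, x in labels.items() if x == 1}
    assert energy(labels, unary, pairwise) == cut_cost(side, unary, pairwise)
```

Here the minimum energy is 4, reached when both nodes are foreground.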

Finding the Minimum Cut on the s-t Graph
The dual problem is finding the maximum flow from s to t [Ford, Fulkerson 1962]; pushing any valid flow from s to t does not change the minimum cut. The maximum flow is found via augmenting paths [Ford, Fulkerson 1962]:
1. Find and push a valid flow from s to t.
2. Update the graph: $r_e = w_e - f(e) \quad \forall e \in E$.
3. Repeat until no augmenting path from s to t exists.
Here $r_e$ is the residual weight of edge e, $w_e$ is the weight of edge e, and $f(e)$ is the flow pushed through edge e.
(Graph Theoretical Segmentation, 11/41)
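A compact sketch of the augmenting-path procedure, using the Edmonds-Karp variant (breadth-first search for the shortest augmenting path). This is a standard textbook choice, not necessarily the path-finding strategy used in the thesis. Edge capacities are the weights w_e, the dictionary `residual` holds r_e, and the nodes still reachable from s after the last search form the source side of a minimum cut. `max_flow_min_cut` is a hypothetical name.

```python
from collections import deque
from typing import Dict, Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def max_flow_min_cut(capacity: Dict[Edge, float], s: Hashable, t: Hashable):
    """Return (max-flow value, source side of a minimum cut)."""
    residual: Dict[Edge, float] = {}
    adj: Dict[Hashable, Set[Hashable]] = {s: set(), t: set()}
    for (u, v), w in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0.0) + w
        residual.setdefault((v, u), 0.0)  # reverse edge carries pushed-back flow
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    flow = 0.0
    while True:
        # Breadth-first search for an s-t path with positive residual weight.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # No augmenting path is left: the flow is maximal, and the nodes
            # reached by this last search are the source side of a minimum cut.
            return flow, set(parent)
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[e] for e in path)  # bottleneck of the path
        for u, v in path:
            residual[(u, v)] -= push  # r_e = w_e - f(e)
            residual[(v, u)] += push
        flow += push
```

For example, with capacities s->1 = 4, 1->t = 1, s->2 = 2, 2->t = 3 and 2.5 in both directions between nodes 1 and 2, the maximum flow is 4 and both nodes end on the source side.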

Temporally Dynamic Graph-Cut [Kohli et al. 2005]
Consider every iteration of the algorithm: the graph structure and the pairwise (binary) edge weights do not change, and the unary edge weights change only slightly. Previous flows can therefore be re-used with the update
$r^t_{e_i} = r^{t-1}_{e_i} + w^t_{e_i} - w^{t-1}_{e_i}$.
The resulting residual graph is sparse, so the augmenting path algorithm converges in fewer iterations.
(Proposed Dynamic Graph-Cut, 12/41)
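The residual update can be sketched for the terminal (unary) edges only, since the slide assumes the pairwise weights stay fixed. The update can make a residual negative (the old flow exceeds the new capacity); this sketch resolves that with the reparameterization idea of Kohli and Torr, adding the same amount to both terminal edges of the node. Because exactly one of those two edges is cut in any s-t cut, every labeling's energy shifts by the same constant and the minimum cut is unchanged. The function name and dictionary layout are hypothetical.

```python
from typing import Dict, Tuple

Weights = Dict[int, float]  # node id -> weight of one terminal edge


def update_tlinks(res_s: Weights, res_t: Weights,
                  w_s_old: Weights, w_s_new: Weights,
                  w_t_old: Weights, w_t_new: Weights) -> Tuple[Weights, Weights, float]:
    """Re-use the residual terminal capacities of iteration t-1 at iteration t.

    Applies r^t = r^{t-1} + w^t - w^{t-1} to the s->v and v->t edges of each
    node, then lifts both edges of a node by the same amount if either became
    negative. Returns the new residuals and the total constant added.
    """
    new_s: Weights = {}
    new_t: Weights = {}
    shift = 0.0
    for v in res_s:
        rs = res_s[v] + w_s_new[v] - w_s_old[v]
        rt = res_t[v] + w_t_new[v] - w_t_old[v]
        deficit = max(0.0, -rs, -rt)
        new_s[v], new_t[v] = rs + deficit, rt + deficit
        shift += deficit
    return new_s, new_t, shift
```

The max-flow solver is then resumed on these residuals instead of starting from the original weights, which is where the speed-up comes from.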

Proposed Spatially & Temporally Dynamic Graph-Cut
The proposed interaction is local. Can the dynamic graph-cut idea be extended to the spatial dimensions? Is it possible to find a sub-graph around the interaction that gives approximately the same result as the global solution?
(Proposed Dynamic Graph-Cut, 13/41)

Local Robustness Rule
Consider the max-flow computed for a region R. This solution can be extended to a global one: the labels of the nodes in R can only be flipped by flow coming from outside R. The following condition is sufficient for robustness (proof omitted):
If R is foreground (connected to the source):
$\sum_{i \in R} (w_{iS} - w_{iT}) > \sum_{i \in R,\ j \in N,\ \exists \mathrm{Path}(i,j)} \ \min_{e \in E \cap \mathrm{Path}(i,j)} w_e$
If R is background (connected to the sink):
$\sum_{i \in R} (w_{iT} - w_{iS}) > \sum_{i \in R,\ j \in N,\ \exists \mathrm{Path}(j,i)} \ \min_{e \in E \cap \mathrm{Path}(j,i)} w_e$
(Proposed Dynamic Graph-Cut, 14/41)

Local Robustness Rule: Weaker Rule
Instead of individual nodes, consider the robustness of the clusters obtained via the GMM, and instead of the cluster boundaries, use the boundary of the rectangle R. The weaker condition is:
If R is foreground (connected to the source):
$\sum_{i \in R} (w_{iS} - w_{iT}) > \sum_{i \in R,\ j \notin R} w_{ij}$
If R is background (connected to the sink):
$\sum_{i \in R} (w_{iT} - w_{iS}) > \sum_{j \notin R,\ i \in R} w_{ji}$
The proposed algorithm starts with the bounding box of the user interaction and enlarges it until the condition is satisfied.
(Proposed Dynamic Graph-Cut, 15/41)
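A minimal sketch of the enlargement loop, under simplifying assumptions that are not from the slides: nodes lie on a regular 4-connected grid (rather than GMM clusters on a superpixel graph), pairwise weights are symmetric ($w_{ij} = w_{ji}$), the whole rectangle is treated as one region R with a known label, and the box grows by one node on every side per step. `weak_rule_holds` and `grow_box` are hypothetical names; h[y][x] is the weight between (x, y) and (x+1, y), and v[y][x] the weight between (x, y) and (x, y+1).

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive
Grid = List[List[float]]


def weak_rule_holds(box: Box, w_s: Grid, w_t: Grid, h: Grid, v: Grid,
                    foreground: bool = True) -> bool:
    x0, y0, x1, y1 = box
    height, width = len(w_s), len(w_s[0])
    # Left-hand side: sum over R of (w_iS - w_iT), or (w_iT - w_iS) for background.
    margin = 0.0
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            d = w_s[y][x] - w_t[y][x]
            margin += d if foreground else -d
    # Right-hand side: weights of the pairwise edges crossing the box boundary.
    boundary = 0.0
    for y in range(y0, y1 + 1):
        if x0 > 0:
            boundary += h[y][x0 - 1]
        if x1 < width - 1:
            boundary += h[y][x1]
    for x in range(x0, x1 + 1):
        if y0 > 0:
            boundary += v[y0 - 1][x]
        if y1 < height - 1:
            boundary += v[y1][x]
    return margin > boundary


def grow_box(box: Box, w_s: Grid, w_t: Grid, h: Grid, v: Grid,
             foreground: bool = True) -> Box:
    """Enlarge the interaction's bounding box until the weaker rule holds."""
    height, width = len(w_s), len(w_s[0])
    x0, y0, x1, y1 = box
    while not weak_rule_holds((x0, y0, x1, y1), w_s, w_t, h, v, foreground):
        if (x0, y0, x1, y1) == (0, 0, width - 1, height - 1):
            break  # the box already covers the whole image: solve globally
        x0, y0 = max(0, x0 - 1), max(0, y0 - 1)
        x1, y1 = min(width - 1, x1 + 1), min(height - 1, y1 + 1)
    return (x0, y0, x1, y1)
```

The graph-cut is then solved only on the nodes inside the returned box. Weaker pairwise weights let the box stop growing earlier, because less flow can enter R from outside.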

Spatially & Temporally Dynamic Graph-Cut in Action
(a) The blue rectangle is the bounding box of the current interaction; the red rectangle is the computed bounding box. (b) Result of the graph-cut for the blue rectangle. (c) Result of the graph-cut for the red rectangle.
(Proposed Dynamic Graph-Cut, 16/41)

Error Tolerance Options
Handle interaction errors before the optimization, or within the optimization.
(Proposed Dynamic Graph-Cut, 17/41)

Interaction Error Correction Algorithm
1. Keep a single RGB Gaussian model for the color profile of the interaction.
2. Start discarding interactions that are not consistent with the color model until the user comes back to the initial region or moves to another color profile.
3. Replace the discarded interaction with the path minimizing
$\mathrm{Cost}(\mathrm{path}) = \sum_{u,v \in \mathrm{path}} |x_u - x_v| + \lambda |I_u - I_v|$
(Proposed Dynamic Graph-Cut, 18/41)
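Steps 1 and 3 can be sketched as follows. Assumptions not stated on the slide: x_u is read as the pixel position (so $|x_u - x_v| = 1$ between 4-neighbours), I is a grayscale intensity image, the sum runs over consecutive nodes of the path, the shortest path is found with Dijkstra's algorithm, and consistency with the Gaussian color model is a simple per-channel 3-sigma test. `is_consistent` and `replacement_path` are hypothetical names.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int]  # (x, y)


def is_consistent(color: Sequence[float], mean: Sequence[float],
                  var: Sequence[float], max_sigma: float = 3.0) -> bool:
    """Hypothetical test: every channel lies within `max_sigma` standard
    deviations of the single (diagonal) RGB Gaussian of the interaction."""
    return all(abs(c - m) <= max_sigma * math.sqrt(s)
               for c, m, s in zip(color, mean, var))


def replacement_path(intensity: List[List[float]], start: Pixel, goal: Pixel,
                     lam: float = 1.0) -> List[Pixel]:
    """Dijkstra on the 4-connected pixel grid with step cost
    |x_u - x_v| + lam * |I_u - I_v|, i.e. 1 + lam * |I_u - I_v| for neighbours."""
    height, width = len(intensity), len(intensity[0])
    dist: Dict[Pixel, float] = {start: 0.0}
    prev: Dict[Pixel, Pixel] = {}
    heap = [(0.0, start)]
    while heap:
        d, (x, y) = heapq.heappop(heap)
        if (x, y) == goal:
            break
        if d > dist[(x, y)]:
            continue  # stale queue entry
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height:
                nd = d + 1.0 + lam * abs(intensity[y][x] - intensity[ny][nx])
                if nd < dist.get((nx, ny), math.inf):
                    dist[(nx, ny)] = nd
                    prev[(nx, ny)] = (x, y)
                    heapq.heappush(heap, (nd, (nx, ny)))
    # Walk back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

With a large lambda the replacement stroke follows regions of similar intensity instead of cutting across edges, which is what keeps a corrected stroke inside the object being colored.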

Error Correction in Action
[Figure columns: single-color true positive, multi-color true positive, multi-color false positive.]
Notes: false positives are handled via path finding; false negatives require a restart.
(Proposed Dynamic Graph-Cut, 19/41)

Subjective Evaluation of Interaction Quality
15 subjects (undergraduate engineering students), 4 random images out of 10, grading from 1 to 5 on 4 different metrics. Results are given as median (STD); p-value (dependent ANOVA test): 0.0005.
Method | Performance | Easiness | Entertainment | Overall
Proposed Method | 5 (0.45) | 4 (0.86) | 5 (0.74) | 4 (0.45)
GrabCut [Rother et al., 2004] | 3 (0.92) | 4 (0.75) | 2 (0.61) | 3 (0.77)
Intelligent Scissors [Mortensen, 1995] | 3 (0.51) | 2 (0.74) | 3 (0.89) | 2 (0.76)
(Experimental Results, 20/41)

Experiments on Error Correction
[Figure columns: interaction, no error correction, soft-label graph-cut, proposed method.]
(Experimental Results, 21/41)

Computation Time Improvement via Spatially & Temporally Dynamic Graph-Cut
[Plot: execution time (msec) vs. iteration (user interaction) for Boykov & Kolmogorov [4], Kohli & Torr [10], and the proposed method.]
The interaction throughout the entire process is divided into a set of interactions on 3 superpixels, which is fed to all algorithms.
(Experimental Results, 22/41)


Review of the Interactive Video Segmentation Literature
Approaches: propagating local classifiers; color and shape models via motion information; feature matching; solving with graph clustering; linear matting; local search; min-cut/max-flow.
Methods: Rotobrush [Bai et al., 2009], [Zhang et al., 2008], [Grundmann et al., 2010], Geodesic Video [Bai et al., 2007].
(Interactive Video Segmentation, 24/41)

Re-definition of the Video Segmentation Problem
The MRF energy of the initial frame is obtained via interaction:
$E(\alpha) = \sum_{v_i \in V} U(\alpha_i, z_i) + \sum_{v_i \in V} \sum_{v_j \in N(v_i)} V(z_i, z_j)\, \phi[\alpha_i \neq \alpha_j]$
The Markovian property implies that the MRF energy of the current frame can be estimated from the MRF energy of the previous frame. Given a spatio-temporal distance function, linear estimation is possible via
$U^t(\alpha^t_i, z^t_i) = \frac{1}{\gamma^t_i} \sum_{v^{t-1}_j \in V^{t-1}} U^{t-1}(\alpha^{t-1}_j, z^{t-1}_j)\, e^{-\mathrm{dis}(z^t_i, z^{t-1}_j)}$
$V^t(z^t_i, z^t_j) = \frac{1}{\gamma^t_{ij}} \sum_{v_k \in V^{t-1}} \sum_{v_l \in N(v_k)} e^{-\mathrm{dis}(z^t_i, z^{t-1}_k)}\, e^{-\mathrm{dis}(z^t_j, z^{t-1}_l)}\, V^{t-1}(z^{t-1}_k, z^{t-1}_l)$
(Interactive Video Segmentation, 25/41)
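The unary propagation formula can be sketched as a normalized, exponentially weighted average over the previous frame's regions. Assumptions: each region is described by a feature vector (for example centroid and mean color), dis(.,.) is a scaled Euclidean distance, and $\gamma^t_i$ is the sum of the weights so that they add up to one. The thesis computes the distance with the IP/BE filter instead; this brute-force version is quadratic in the number of regions and only illustrates the formula. `distance` and `propagate_unary` are hypothetical names.

```python
import math
from typing import List, Sequence


def distance(a: Sequence[float], b: Sequence[float], sigma: float) -> float:
    """Hypothetical spatio-temporal distance: scaled Euclidean distance
    between region feature vectors such as (x, y, R, G, B)."""
    return math.dist(a, b) / sigma


def propagate_unary(prev_feats: List[Sequence[float]], prev_unary: List[float],
                    cur_feats: List[Sequence[float]], sigma: float = 10.0) -> List[float]:
    """U^t_i = (1 / gamma_i) * sum_j U^{t-1}_j * exp(-dis(z^t_i, z^{t-1}_j)),
    with gamma_i the sum of the weights. Apply once per label.
    Note: sigma must be large enough that not all weights underflow to 0."""
    out = []
    for zi in cur_feats:
        weights = [math.exp(-distance(zi, zj, sigma)) for zj in prev_feats]
        gamma = sum(weights)
        out.append(sum(w * u for w, u in zip(weights, prev_unary)) / gamma)
    return out
```

A region of the new frame that closely matches one region of the previous frame inherits that region's energy, while an ambiguous region receives a blend of its nearby predecessors.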

Selection of the Spatio-Temporal Distance
Ideally, the spatio-temporal geodesic distance is the best choice, but the computational complexity of the geodesic distance filter, O(n^3), is not affordable in mobile scenarios. The Information Permeability/Bi-exponential (IP/BE) filter [Cigla, Alatan, 2010]/[Thévenaz et al., 2012], applied along the horizontal, vertical, and temporal directions between frame t-1 and frame t, is an approximate yet efficient O(n) alternative.
(Interactive Video Segmentation, 26/41)

Information Permeability/Bi-exponential (IP/BE) Filter
Distance computation and filtering can be performed simultaneously in linear time via independent 1-tap recursive filters in all dimensions (x, y, and t):
$\hat{x}_1[k] = x_1[k] + \hat{x}_1[k-1]\, r(x[k], x[k-1])$ and $\hat{x}_2[k] = x_2[k] + \hat{x}_2[k+1]\, r(x[k], x[k+1])$,
with normalization
$y[k] = \frac{\hat{x}_1[k] + \hat{x}_2[k]}{\hat{1}_1[k] + \hat{1}_2[k]}$,
where $\hat{1}_1$ and $\hat{1}_2$ denote the same two recursions applied to an all-ones signal.
(Interactive Video Segmentation, 27/41)
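A one-dimensional sketch of these recursions, assuming $r(a, b) = \exp(-|a - b| / \sigma)$ (the slide does not give the permeability function) and allowing a separate guide signal so that, for example, energies can be filtered along color edges; passing the signal itself as the guide reproduces the slide's form. Both passes cost O(n), and a 3-D version would apply the same passes along x, y, and t. `permeability` and `ip_filter_1d` are hypothetical names.

```python
import math
from typing import List, Sequence


def permeability(a: float, b: float, sigma: float) -> float:
    """Hypothetical r(., .): close to 1 for similar samples, near 0 across strong edges."""
    return math.exp(-abs(a - b) / sigma)


def ip_filter_1d(x: Sequence[float], guide: Sequence[float], sigma: float) -> List[float]:
    n = len(x)
    r_left = [0.0] + [permeability(guide[k], guide[k - 1], sigma) for k in range(1, n)]
    r_right = [permeability(guide[k], guide[k + 1], sigma) for k in range(n - 1)] + [0.0]

    def causal(sig: Sequence[float]) -> List[float]:
        # x1_hat[k] = x[k] + x1_hat[k-1] * r(x[k], x[k-1])
        out = [0.0] * n
        for k in range(n):
            out[k] = sig[k] + (out[k - 1] * r_left[k] if k > 0 else 0.0)
        return out

    def anticausal(sig: Sequence[float]) -> List[float]:
        # x2_hat[k] = x[k] + x2_hat[k+1] * r(x[k], x[k+1])
        out = [0.0] * n
        for k in range(n - 1, -1, -1):
            out[k] = sig[k] + (out[k + 1] * r_right[k] if k < n - 1 else 0.0)
        return out

    # Normalize by the same recursions applied to an all-ones signal.
    ones = [1.0] * n
    num = [a + b for a, b in zip(causal(x), anticausal(x))]
    den = [a + b for a, b in zip(causal(ones), anticausal(ones))]
    return [p / q for p, q in zip(num, den)]
```

A constant signal passes through unchanged, and across a strong step in the guide the permeability drops to about zero, so the two sides of the edge do not mix.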

Sample MRF Energy Propagation
[Figure: unary $U^1(\alpha^1_i, z^1_i)$ and pairwise $V^1(z^1_i, z^1_j)$ energies of frame 1, and the propagated estimates $\hat{U}^5(\alpha^5_i, z^5_i)$ and $\hat{V}^5(z^5_i, z^5_j)$ for frame 5.]
(Interactive Video Segmentation, 28/41)

Dynamic Graph-Cut
The MRF energy of every frame is solved independently, which leaves significant redundancy; however, the graph structure changes from frame to frame because of the over-segmentation. One must either solve a computationally expensive graph matching problem (the best known algorithm is O(n^2 log n)) or exploit linearity.
Proposition. Minimizing the MRF energy obtained by applying the bilateral filter to the energy function defined by the residual graph yields the same binary labels as minimizing the MRF energy obtained by applying the bilateral filter to the original energy function.
(Interactive Video Segmentation, 29/41)

Dynamic Graph-Cut for Linear Filtering
[Figure: the graph at frame t (nodes i, j with weights w) and its residual graph (weights r) are mapped by the same linear transformation (bilateral filter with coefficients t_ia, t_ja, t_ib, t_jb, t_ic, t_jc) onto the nodes a, b, c of frame t+1. Min-cut/max-flow on the transformed residual graph gives the same solution at t+1 as min-cut/max-flow on the transformed original graph.]
(Interactive Video Segmentation, 30/41)

Sample Segmentation Result (embedded video)
(Experimental Results, 31/41)

Comparison of Segmentation Quality (embedded video)
(Experimental Results, 32/41)

Computation Time Improvement via Dynamic Graph-Cut
[Plot: computation time (msec) vs. frame number for Min-Cut/Max-Flow [8] and the proposed method.]
(Experimental Results, 33/41)

Precision-Recall Curves for the SegTrack [Tsai et al., 2010] Dataset
[Plots: precision-recall curves of Geodesic [21], Roto Brush [17], and the proposed method on the Birdfall, Cheetah, Girl, Monkey, Penguin, and Parachute sequences.]
(Experimental Results, 34/41)

Failure/Success Cases (embedded videos of the Girl and Monkey sequences)
(Experimental Results, 35/41)

Computation Time vs. Performance Trade-off
[Plots: precision vs. computation time per frame (sec) and recall vs. computation time per frame (sec) for Geodesic [21], Roto Brush [17], and the proposed method.]
All values are averaged over all videos in the SegTrack [Tsai et al., 2010] dataset.
(Experimental Results, 36/41)

Automatic Video Segmentation Extension
There are many successful automatic video object segmentation tools that use computationally costly features such as saliency, optical flow, and shape. The proposed interactive video segmentation tool is efficient but requires an interaction in the first frame. Any MRF-energy-based automatic video segmentation tool can be used to initialize the proposed method. The proposed MRF energy estimation method is evaluated as a speed-up tool for the Key-Segments [Lee et al., 2011] algorithm.
(Automatic Video Segmentation, 37/41)

Precision-Recall Curves for Automatic Video Segmentation
[Plots: precision-recall curves of Key-Segments [14] and the proposed method on the Birdfall, Cheetah, Girl, Monkey, and Parachute sequences.]
Computation time (in MATLAB): Key-Segments [Lee, 2011]: 260.6 sec per frame; proposed speed-up: 4.0 sec per frame.
(Automatic Video Segmentation, 38/41)
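The reported MATLAB timings correspond to the following per-frame speed-up (simple arithmetic on the two reported numbers):

```latex
\frac{260.6\ \text{s/frame}}{4.0\ \text{s/frame}} = 65.15 \approx 65\times
```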

Conclusions
It is possible to find a sub-graph giving approximately the same results as the global solution.
Spatial information and user interaction are too valuable to discard, even when the interaction is erroneous.
A dynamic formulation of user interaction increases user satisfaction and makes efficient graph optimization possible.
The interactive video segmentation problem is actually an estimation problem. Given a reliable spatio-temporal distance, it is possible to compensate for the lack of motion information.
The solution to the min-cut/max-flow problem is linear and can easily be combined with other linear formulations.
(Conclusion & Future Work, 39/41)

Future Work
Graph-theoretical analysis of the superpixel graph.
A parallel implementation is possible via the dual definition of the problem.
A spatio-temporal formulation of the video segmentation problem is possible.
(Conclusion & Future Work, 40/41)

Thank you for your attention.
DEMO
(41/41)
