
Master Thesis Presentation at METU

Ozan Sener

May 01, 2013

Transcript

  1. An Efficient Graph-Theoretical Approach for Interactive Mobile
    Image and Video Segmentation
    Ozan Şener
    Electrical and Electronics Engineering Department
    Middle East Technical University
    May 14, 2015


  2. Interactive Segmentation Problem
    [Figure: User Interaction, Segmentation Mask, Application; for video: Segmentation Mask (Rest of the Video), Application]
    Interactive Mobile Segmentation 1/41


  3. Issues Related to Mobile Touch-Screen Devices
    Photo courtesy of Adobe Systems Inc.
    Rich Interaction Possibilities
    More Interaction Errors
    Low Computational Power
    Interactive Mobile Segmentation 2/41


  4. Outline
    Interactive Image Segmentation
      Review of the Literature
      Proposed Interaction Methodology
      Proposed Spatially & Temporally Dynamic Graph Cut
      Proposed Error Correction
      Experiments on Interactive Image Segmentation
    Interactive Video Segmentation
      Review of the Literature
      Proposed Filtering Based Formulation
      Proposed Linear Dynamic Graph-Cut
      Proposed Automatic Video Object Segmentation Extension
      Experiments on Interactive Video Segmentation
      Experiments on Automatic Video Segmentation
    Interactive Mobile Segmentation 3/41


  5. Building Blocks of Interactive Image Segmentation
    User Interaction: Scribbles, Approximate Boundary, Bounding Box
    Model Formulation: Gaussian Mixture Model, Kernel Density Estimation, Boundary Path Cost
    Optimization: Min Cut / Max Flow, Dynamic Programming
    [Figure: two-terminal graph with terminals s and t, nodes i and j, and edge weights w_is, w_it, w_js, w_jt, w_ij, w_ji]
    Interactive Mobile Segmentation 4/41


  6. Related Work for Interactive Image Segmentation
    N-D Image Segmentation [Boykov, Jolly 2001]
    GrabCut [Rother et al. 2004]
    Geodesic Image Matting [Bai, Sapiro 2008]
    Lazy Snapping [Li et al. 2004]
    Intelligent Scissors [Mortensen, Barrett 1995]
    [Figure: the methods shown against the building blocks of interactive segmentation. User Interaction: Scribbles, Approximate Boundary, Bounding Box. Model Formulation: Gaussian Mixture Model, Kernel Density Estimation, Boundary Path Cost. Optimization: Min Cut / Max Flow, Dynamic Programming.]
    Interactive Mobile Segmentation 5/41


  7. Proposed Interaction Method - Coloring
    User Interaction: Scribbles, Approximate Boundary, Bounding Box
    Model formulation and optimization depend on the user interaction.
    Proposed Method 6/41


  8. Proposed Interaction Method - Coloring
    Gesture of coloring in a coloring book
    Proposed Method 7/41


  9. Pixel Grid to Over-segment Graph
    The complexity of most graph algorithms depends on the number of nodes and edges.
    The most intuitive approach to increasing computational efficiency is over-segmentation.
    All algorithms are developed on generic graphs, and all experiments are conducted on the over-segment graph obtained by the SLIC algorithm [Achanta et al., 2010].
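
To make the pixel-grid to over-segment-graph step concrete, the following is a minimal sketch of how a region adjacency graph could be built from a superpixel label map. It assumes the label map has already been produced (for example by SLIC); the function name `region_adjacency` and the tiny 3x3 label map are hypothetical and are not taken from the thesis.

```python
def region_adjacency(labels):
    """Edges (a, b), a < b, between superpixels that share a 4-connected pixel border."""
    rows, cols = len(labels), len(labels[0])
    edges = set()
    for r in range(rows):
        for c in range(cols):
            # Looking only right and down visits every pixel border exactly once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and labels[r][c] != labels[rr][cc]:
                    a, b = sorted((labels[r][c], labels[rr][cc]))
                    edges.add((a, b))
    return edges


# Hypothetical 3x3 label map containing three superpixels (0, 1, 2).
labels = [[0, 0, 1],
          [0, 2, 1],
          [2, 2, 1]]
edges = region_adjacency(labels)
nodes = {label for row in labels for label in row}
```

In this toy case the 9-node, 12-edge pixel grid collapses to a graph with 3 nodes and 3 edges, which is the kind of reduction the slide relies on.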
    Proposed Method 8/41


  10. Graphical Model for the Segmentation
    [Figure: graphical model with label nodes x0, ..., x8, observed color nodes such as z0, z1 and z3, and the shared parameter node θ]
    Image/video is represented as a graph of pixels/regions.
    User interaction is formulated as a parametric model.
    The resulting dependency network is a Markov Random Field (MRF).
    xi: label of each pixel/region
    zi: color of each pixel/region
    θ: parametric model of FG/BG (GMM learned from the interaction)
    Graph Theoretical Segmentation 9/41


  11. Transformation to Energy-Minimization
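    Factorization is not possible.
    Hammersley-Clifford theorem [Clifford, 1990]:
      $$ p(x \mid z, \theta) = \frac{1}{Z} \exp(-E(x, z, \theta)) $$
    The MAP (maximum a posteriori) solution is:
      $$ \arg\min_x E(x, z, \theta) = \arg\min_x \Big[ \underbrace{\sum_{v_i \in V} E_i(x_i, z_i)}_{\text{GMM likelihood}} + \underbrace{\sum_{e_{ij} \in E} E_{ij}(x_i, x_j)}_{\text{smoothness penalty}} \Big] $$
    This is equivalent to a min-cut on a 2-terminal graph.
    [Figure: s-t cut on a graph of nodes x0, ..., x8 with terminals s and t. Two-node example: the energy E1(x1) + E2(x2) + E12(x1, x2) drawn as a graph over v1, v2, s, t with edge weights E1(1), E1(0), E2(1), E2(0) and E12(1,0) = E12(0,1).]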
    Graph Theoretical Segmentation 10/41


  12. Finding Minimum Cut on s-t Graph
    The dual problem is finding the maximum flow from s to t [Ford, Fulkerson 1962].
    Pushing any flow from s to t does not change the solution.
    The maximum flow is found via augmenting paths [Ford, Fulkerson 1962]:
      Repeat
        Find and push a valid flow from s to t
        Update the graph: $r_e = w_e - f(e) \ \ \forall e \in E$
      Until no valid flow exists
    r_e: residual weight of the edge e
    w_e: weight of the edge e
    f(e): flow pushed through the edge e
    [Figure: two-node graph over v1, v2, s, t with edge weights E1(1), E1(0), E2(1), E2(0), E12(0,1), E12(1,0); an α flow and a β flow are pushed from s to t]
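
The augmenting-path loop can be sketched as follows. This is an illustrative implementation that picks paths by breadth-first search (the Edmonds-Karp variant), not the solver used in the thesis. The energy values, the smoothness weight `lam`, and the edge convention (weight E_i(0) on the edge s to v_i, weight E_i(1) on the edge v_i to t) are assumptions made for the example; the slide's figure may use the opposite convention.

```python
from collections import deque
from itertools import product


def max_flow(weights, s, t):
    """Augmenting-path max-flow; weights is a dict {u: {v: w_uv}}."""
    # The residual graph starts as a copy of the weights, with a
    # zero-weight reverse edge added wherever none exists.
    residual = {u: dict(nbrs) for u, nbrs in weights.items()}
    for u, nbrs in weights.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for an s-t path with positive residual weight.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total  # no augmenting path is left
        # Walk back from t to collect the path, then push its bottleneck.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[a][b] for a, b in path)
        for a, b in path:
            residual[a][b] -= push  # r_e = w_e - f(e)
            residual[b][a] += push  # lets later paths undo this flow
        total += push


# Hypothetical two-node energy: E_i = (E_i(0), E_i(1)), E12(0,1) = E12(1,0) = lam.
E1, E2, lam = (4, 1), (2, 3), 2


def energy(x1, x2):
    return E1[x1] + E2[x2] + lam * (x1 != x2)


best_energy = min(energy(a, b) for a, b in product((0, 1), repeat=2))
graph = {"s": {"v1": E1[0], "v2": E2[0]},
         "v1": {"t": E1[1], "v2": lam},
         "v2": {"t": E2[1], "v1": lam}}
flow = max_flow(graph, "s", "t")
```

Under these assumptions the maximum flow equals the minimum energy found by brute force over the four labelings, which is the duality the slide refers to.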
    Graph Theoretical Segmentation 11/41


  13. Temporally Dynamic Graph-Cut [Kohli et al. 2005]
    Consider every iteration of the algorithm:
      The graph structure and the binary edge weights do not change.
      The unary edge weights change only slightly.
    Previous flows can be re-used with an update:
      $$ r^{t}_{e_i} = r^{t-1}_{e_i} + w^{t}_{e_i} - w^{t-1}_{e_i} $$
    The resulting residual graph will be sparse:
      The augmenting path algorithm will converge in fewer iterations.
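
A minimal sketch of the residual update for the unary (terminal) edges follows. The function name and the dictionary layout are hypothetical. The handling of negative residuals, adding the same constant to both terminal edges of a node, is an assumption of this sketch: it is one common remedy, valid because exactly one terminal edge of each node is cut, so the cost of every cut shifts by the same constant.

```python
def update_terminal_residuals(res_s, res_t, old_s, old_t, new_s, new_t):
    """Apply r^t = r^(t-1) + w^t - w^(t-1) to the source and sink edges of every node.

    All arguments are dicts keyed by node. Returns the updated (res_s, res_t).
    """
    out_s, out_t = {}, {}
    for i in res_s:
        rs = res_s[i] + new_s[i] - old_s[i]
        rt = res_t[i] + new_t[i] - old_t[i]
        # A negative residual is not a valid capacity. Shifting both terminal
        # edges of the node by the same amount leaves the minimiser unchanged
        # (assumed remedy, see the lead-in).
        shift = max(0.0, -rs, -rt)
        out_s[i] = rs + shift
        out_t[i] = rt + shift
    return out_s, out_t


# Hypothetical residuals after the previous iteration and slightly changed unary weights.
res_s, res_t = {"a": 0, "b": 3}, {"a": 2, "b": 0}
old_s, old_t = {"a": 5, "b": 4}, {"a": 3, "b": 1}
new_s, new_t = {"a": 4, "b": 4}, {"a": 3, "b": 2}
upd_s, upd_t = update_terminal_residuals(res_s, res_t, old_s, old_t, new_s, new_t)
```

The max-flow search can then resume on the updated residual graph instead of starting again from the original weights.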
    Proposed Dynamic Graph-Cut 12/41


  14. Proposed Spatially & Temporally Dynamic Graph-Cut
    The proposed interaction has the property of locality.
    Can we extend the dynamic graph-cut idea to the spatial dimensions?
    Is it possible to find a sub-graph around the interaction that gives approximately the same result as the global solution?
    Proposed Dynamic Graph-Cut 13/41


  15. Local Robustness Rule
    Consider the max-flow computed for a region R.
    This solution can be extended to a global one:
      The labels of the nodes in R can only be flipped via flows coming from outside of R.
    The following condition is sufficient for robustness (proof is omitted).
    If R is foreground (connected to the source):
      $$ \sum_{i \in R} (w_{iS} - w_{iT}) > \sum_{\substack{i \in R,\ j \in N \\ \exists\, Path(i,j)}} \ \min_{e \in E \cap Path(i,j)} w_e $$
    If R is background (connected to the sink):
      $$ \sum_{i \in R} (w_{iT} - w_{iS}) > \sum_{\substack{i \in R,\ j \in N \\ \exists\, Path(j,i)}} \ \min_{e \in E \cap Path(j,i)} w_e $$
    Proposed Dynamic Graph-Cut 14/41


  16. Local Robustness Rule - Weaker Rule
    Instead of nodes, consider the robustness of the clusters obtained via GMM.
    Instead of cluster boundaries, use the boundary of the rectangle R.
    The weaker condition is:
    If R is foreground (connected to the source):
      $$ \sum_{i \in R} (w_{iS} - w_{iT}) > \sum_{i \in R,\ j \notin R} w_{ij} $$
    If R is background (connected to the sink):
      $$ \sum_{i \in R} (w_{iT} - w_{iS}) > \sum_{j \notin R,\ i \in R} w_{ji} $$
    The proposed algorithm starts with the bounding box of the user interaction and enlarges the solution until the proposed condition is satisfied.
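
The box-growing procedure can be sketched for the foreground case as below. Several details are assumptions made only for illustration: a 4-connected pixel grid, a single uniform pairwise weight `w_pair`, growth by one pixel on every side per step, and all function names. The thesis may enlarge the region differently.

```python
def boundary_weight(box, shape, w_pair):
    """Sum of w_ij over 4-connected edges leaving the rectangle (uniform weight assumed)."""
    r0, r1, c0, c1 = box
    rows, cols = shape
    height, width = r1 - r0 + 1, c1 - c0 + 1
    crossing = 0
    if r0 > 0:
        crossing += width   # edges through the top side
    if r1 < rows - 1:
        crossing += width   # edges through the bottom side
    if c0 > 0:
        crossing += height  # edges through the left side
    if c1 < cols - 1:
        crossing += height  # edges through the right side
    return crossing * w_pair


def is_robust_foreground(box, w_src, w_snk, w_pair):
    """Weaker rule: sum over R of (w_iS - w_iT) must exceed the boundary weight."""
    r0, r1, c0, c1 = box
    margin = sum(w_src[r][c] - w_snk[r][c]
                 for r in range(r0, r1 + 1) for c in range(c0, c1 + 1))
    return margin > boundary_weight(box, (len(w_src), len(w_src[0])), w_pair)


def grow_until_robust(box, w_src, w_snk, w_pair):
    """Enlarge the box by one pixel per side until the rule holds."""
    rows, cols = len(w_src), len(w_src[0])
    while not is_robust_foreground(box, w_src, w_snk, w_pair):
        r0, r1, c0, c1 = box
        grown = (max(r0 - 1, 0), min(r1 + 1, rows - 1),
                 max(c0 - 1, 0), min(c1 + 1, cols - 1))
        if grown == box:
            break  # the box already covers the image: this is the global problem
        box = grown
    return box


# Hypothetical 5x5 example: strong foreground evidence in the central 3x3 block.
w_snk = [[1] * 5 for _ in range(5)]
w_src = [[3 if 1 <= r <= 3 and 1 <= c <= 3 else 1 for c in range(5)] for r in range(5)]
result = grow_until_robust((2, 2, 2, 2), w_src, w_snk, w_pair=1)
```

Starting from the single central pixel, the margin (2) does not exceed the boundary weight (4), so the box grows once to the 3x3 block, where the margin (18) exceeds the boundary weight (12).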
    Proposed Dynamic Graph-Cut 15/41


  17. Spatially & Temporally Dynamic Graph-Cut in Action
    a: The blue rectangle is the bounding box of the current interaction; the red rectangle is the computed bounding box.
    b: Result of graph-cut for the blue rectangle.
    c: Result of graph-cut for the red rectangle.
    Proposed Dynamic Graph-Cut 16/41


  18. Error Tolerance Options
    Solve interaction errors before optimization vs. within optimization
    Proposed Dynamic Graph-Cut 17/41


  22. Interaction Error Correction Algorithm
    Keep a single RGB Gaussian model for the color profile of the interaction.
    Start discarding interactions that are not consistent with the color model, until the user comes back to the initial region or moves to another color profile.
    Replace the discarded interaction with the path minimizing
      $$ \mathrm{Cost}(path) = \sum_{u,v \in path} \big( |x_u - x_v| + \lambda |I_u - I_v| \big) $$
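
The replacement path can be found with a shortest-path search. The sketch below uses Dijkstra's algorithm on a 4-connected grid, so |x_u - x_v| is 1 for every step. Treating I as a single grayscale intensity, the value of λ, the function name and the toy image are assumptions for illustration; the slides use color values and do not state which search algorithm is used.

```python
import heapq


def min_cost_path(image, start, goal, lam=1.0):
    """Dijkstra search for the path minimising sum(|x_u - x_v| + lam * |I_u - I_v|)."""
    rows, cols = len(image), len(image[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist[u]:
            continue  # stale heap entry
        r, c = u
        for v in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= v[0] < rows and 0 <= v[1] < cols:
                # Spatial step of length 1 plus the weighted intensity jump.
                step = 1.0 + lam * abs(image[r][c] - image[v[0]][v[1]])
                if d + step < dist.get(v, float("inf")):
                    dist[v] = d + step
                    prev[v] = u
                    heapq.heappush(heap, (d + step, v))
    # Reconstruct the path by walking back from the goal.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]


# Hypothetical image: a bright stripe (9) separates the two end points.
image = [[0, 9, 0],
         [0, 9, 0],
         [0, 0, 0]]
path, cost = min_cost_path(image, (0, 0), (0, 2), lam=1.0)
```

The path walks around the stripe (cost 6) instead of crossing it (cost 20), which mirrors the idea of replacing an erroneous stroke with one that stays inside a consistent color region.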
    Proposed Dynamic Graph-Cut 18/41


  23. Error Correction in Action
    [Figure panels: Single Color (True Positive), Multi Color (True Positive), Multi Color (False Positive)]
    Notes:
    False positives are handled via path finding.
    False negatives require a restart.
    Proposed Dynamic Graph-Cut 19/41


  24. Subjective Evaluation of Interaction Quality
    15 subjects (undergraduate-level engineering students)
    4 random images out of 10 images
    Grading on a 1-5 scale for 4 different metrics
    Results in the format Median (STD)
    P-value (via dependent ANOVA test): 0.0005

    Method                                    Perf.     Easiness  Entertain.  Overall
    Proposed Method                           5 (0.45)  4 (0.86)  5 (0.74)    4 (0.45)
    GrabCut [Rother et al., 2004]             3 (0.92)  4 (0.75)  2 (0.61)    3 (0.77)
    Intelligent Scissors [Mortensen, 1995]    3 (0.51)  2 (0.74)  3 (0.89)    2 (0.76)
    Experimental Results 20/41


  25. Experiments on Error Correction.
    [Figure columns: Interaction | No Error Correction | Soft Label Graph-Cut | Proposed Method]
    Experimental Results 21/41


  26. Computation Time Improvement via Spatially &
    Temporally Dynamic Graph-Cut
    [Plot: Execution Time (msec) vs. Iteration (User Interaction) for Boykov & Kolmogorov [4], Kohli & Torr [10], and the Proposed Method]
    The interaction throughout the entire process is divided into a set of interactions on 3 superpixels and fed to all algorithms.
    Experimental Results 22/41


  27. Outline
    Interactive Image Segmentation
      Review of the Literature
      Proposed Interaction Methodology
      Proposed Spatially & Temporally Dynamic Graph Cut
      Proposed Error Correction
      Experiments on Interactive Image Segmentation
    Interactive Video Segmentation
      Review of the Literature
      Proposed Filtering Based Formulation
      Proposed Linear Dynamic Graph-Cut
      Proposed Automatic Video Object Segmentation Extension
      Experiments on Interactive Video Segmentation
      Experiments on Automatic Video Segmentation
    Experimental Results 23/41


  28. Review of the Interactive Video Segmentation Literature
    Interaction
    Propagate:
      Local Classifiers
      Color and Shape Models
    via:
      Motion Information
      Feature Matching
    Solve with:
      Graph Clustering
      Linear Matting
      Local Search
      Min-Cut/Max-Flow
    Methods: Rotobrush [Bai et al., 2009], [Zhang et al., 2008], [Grundmann et al., 2010], Geodesic Video [Bai et al., 2007]
    [Figure: two-terminal graph with terminals s and t, nodes i and j, and edge weights w_is, w_it, w_js, w_jt, w_ij, w_ji]
    Interactive Video Segmentation 24/41


  29. Re-definition of the Video Segmentation Problem
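    The MRF energy of the initial frame is obtained via interaction:
      $$ E(\alpha) = \sum_{v_i \in V} U(\alpha_i, z_i) + \sum_{v_i \in V} \sum_{v_j \in N(v_i)} V(z_i, z_j)\, \phi[\alpha_i \neq \alpha_j] $$
    The Markovian property implies that we can estimate the MRF energy of the current frame from the MRF energy of the previous frame.
    Given a spatio-temporal distance function, linear estimation is possible via:
      $$ U^{t}(\alpha^{t}_i, z^{t}_i) = \frac{1}{\gamma^{t}_i} \sum_{v^{t-1}_j \in V^{t-1}} U^{t-1}(\alpha^{t-1}_j, z^{t-1}_j)\, e^{-\mathrm{dis}(z^{t}_i, z^{t-1}_j)} $$
      $$ V^{t}(z^{t}_i, z^{t}_j) = \frac{1}{\gamma^{t}_{ij}} \sum_{v_k \in V^{t-1}} \sum_{v_l \in N} e^{-\mathrm{dis}(z^{t}_i, z^{t-1}_k)}\, e^{-\mathrm{dis}(z^{t}_j, z^{t-1}_l)}\, V^{t-1}(z^{t-1}_k, z^{t-1}_l) $$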
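
The unary estimate can be written out directly as a weighted average. The sketch below is a dense, quadratic-time version meant only to show the arithmetic, not the linear-time filter the talk proposes. The scalar features, the absolute-difference distance, and taking γ as the sum of the weights are assumptions for the example; `prev_energy` holds the unary energies of one fixed label.

```python
import math


def propagate_unary(prev_energy, prev_feat, cur_feat, dis):
    """U^t_i = (1 / gamma_i) * sum_j U^(t-1)_j * exp(-dis(z^t_i, z^(t-1)_j))."""
    estimates = []
    for zi in cur_feat:
        weights = [math.exp(-dis(zi, zj)) for zj in prev_feat]
        gamma = sum(weights)  # assumed normaliser, so the weights sum to one
        estimates.append(sum(w * u for w, u in zip(weights, prev_energy)) / gamma)
    return estimates


# Hypothetical previous frame: two regions with distinct features and energies.
prev_feat = [0.0, 10.0]
prev_energy = [-4.0, 6.0]
# Current frame: the same two regions with slightly changed features.
cur_feat = [0.5, 9.5]
cur_energy = propagate_unary(prev_energy, prev_feat, cur_feat, lambda a, b: abs(a - b))
```

Each current region inherits almost all of its energy from the previous-frame region it is closest to under the distance, which is why the choice of distance matters in the slides that follow.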
    Interactive Video Segmentation 25/41


  30. Selection of Spatio - Temporal Distance
    Ideally, the spatio-temporal geodesic distance is the best choice.
    The computational complexity of the geodesic distance filter, O(n^3), is not affordable in mobile scenarios.
    [Figure: Frame t-1 and Frame t, with the temporal, horizontal, and vertical filtering directions]
    The Information Permeability/Bi-exponential (IP/BE) filter [Cigla, Alatan, 2010]/[Thévenaz et al., 2012] is an approximate yet efficient, O(n), alternative to the geodesic distance filter.
    Interactive Video Segmentation 26/41


  31. Information Permeability/Bi-exponential (IP/BE) Filter
    Distance computation and filtering can be performed simultaneously in linear time via independent 1-tap recursive filters in all dimensions (x, y and t):
      $$ \hat{x}_1[k] = x_1[k] + \hat{x}_1[k-1]\, r(x[k], x[k-1]) $$
    and
      $$ \hat{x}_2[k] = x_2[k] + \hat{x}_2[k+1]\, r(x[k], x[k+1]) $$
    with normalization
      $$ y[k] = \frac{\hat{x}_1[k] + \hat{x}_2[k]}{\hat{1}_1[k] + \hat{1}_2[k]} $$
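
A one-dimensional sketch of the two recursive passes and the normalization follows. It assumes that both passes take the same input signal (x_1 = x_2), that the hatted ones in the denominator are the same passes applied to an all-ones signal, and that the weight is r = exp(-|difference| / sigma) computed on a separate guide signal. The slide does not specify r, so this choice and all names are illustrative only.

```python
import math


def permeability_filter_1d(signal, guide, sigma=1.0):
    """Edge-aware smoothing with one causal and one anti-causal 1-tap recursion."""
    n = len(signal)
    # r[k] is the weight between samples k and k + 1 (assumed exponential form).
    r = [math.exp(-abs(guide[k + 1] - guide[k]) / sigma) for k in range(n - 1)]

    def causal(values):
        out = list(values)
        for k in range(1, n):
            out[k] = values[k] + out[k - 1] * r[k - 1]
        return out

    def anticausal(values):
        out = list(values)
        for k in range(n - 2, -1, -1):
            out[k] = values[k] + out[k + 1] * r[k]
        return out

    ones = [1.0] * n
    numerator = [a + b for a, b in zip(causal(signal), anticausal(signal))]
    denominator = [a + b for a, b in zip(causal(ones), anticausal(ones))]
    return [a / b for a, b in zip(numerator, denominator)]


# A constant signal must pass through unchanged, whatever the guide is.
flat = permeability_filter_1d([2.0] * 5, [0.0, 1.0, 5.0, 2.0, 3.0])
# A sharp step in the guide blocks the information flow across the edge.
step = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
filtered = permeability_filter_1d(step, step, sigma=0.5)
```

Each pass touches every sample once, so the cost is linear in the signal length; applying the same passes along x, y and t extends the sketch to video.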
    Interactive Video Segmentation 27/41


  32. Sample MRF Energy Propagation
    [Figure: energy maps of the initial frame, U^1(α^1_i, z^1_i) and V^1(z^1_i, z^1_j), and the propagated estimates for frame 5, Û^5(α^5_i, z^5_i) and V̂^5(z^5_i, z^5_j)]
    Interactive Video Segmentation 28/41


  33. Dynamic Graph-Cut
    The MRF energy of every frame is solved independently.
    There is significant redundancy; however, the graph structure changes due to the over-segmentation.
    Either solve a computationally expensive graph matching problem (the best known algorithm is O(n^2 log n)) or exploit linearity.
    Proposition:
    The binary labels obtained by minimizing the MRF energy that results from applying a bilateral filter to the energy function defined via the residual graph are equivalent to the labels obtained by minimizing the MRF energy that results from applying the bilateral filter to the original energy function.
    Interactive Video Segmentation 29/41


  34. Dynamic Graph-Cut for Linear Filtering
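    [Figure: Graph t (nodes i, j with terminals s, t and weights w_is, w_it, w_js, w_jt, w_ij, w_ji) and its Residual Graph t (residual weights r_is, ..., r_ji) are each mapped by the same linear transformation (bilateral filter) to Graph t+1 and Residual Graph t+1 over nodes a, b, c. Min-cut/max-flow on Graph t+1 gives Solution t+1; min-cut/max-flow on Residual Graph t+1 gives Residual Solution t+1.]
    Linear transformation (bilateral filter) in matrix form:
      $$ \begin{pmatrix} w_{as} \\ w_{bs} \\ w_{cs} \end{pmatrix} = \begin{pmatrix} t_{ia} & t_{ja} \\ t_{ib} & t_{jb} \\ t_{ic} & t_{jc} \end{pmatrix} \begin{pmatrix} w_{is} \\ w_{js} \end{pmatrix}, \qquad \begin{pmatrix} r_{as} \\ r_{bs} \\ r_{cs} \end{pmatrix} = \begin{pmatrix} t_{ia} & t_{ja} \\ t_{ib} & t_{jb} \\ t_{ic} & t_{jc} \end{pmatrix} \begin{pmatrix} r_{is} \\ r_{js} \end{pmatrix} $$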
    Interactive Video Segmentation 30/41


  35. Sample Segmentation Result
    [Embedded video: res2.avi]
    Experimental Results 31/41


  36. Comparison of Segmentation Quality
    [Embedded video: resultIce.avi]
    Experimental Results 32/41


  37. Computation Time Improvement via Dynamic Graph-Cut
    [Plot: Computation Time (msec) vs. Frame Number for Min-Cut/Max-Flow [8] and the Proposed Method]
    Experimental Results 33/41


  38. Precision-recall curves for SegTrack [Tsai et al., 2010] Dataset
    [Plots: Precision vs. Recall for Geodesic [21], Roto Brush [17], and the Proposed Method on the sequences Birdfall, Cheetah, Girl, Monkey, Penguin, and Parachute]
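
For readers unfamiliar with the metric, here is a minimal sketch of pixel-wise precision and recall for a binary foreground mask. Whether the curves in the slides are computed exactly this way is an assumption; the function name, the toy masks, and the convention of returning 1.0 for an empty denominator are illustrative choices.

```python
def precision_recall(predicted, truth):
    """Pixel-wise precision and recall of a binary foreground mask."""
    pairs = [(bool(p), bool(t))
             for p_row, t_row in zip(predicted, truth)
             for p, t in zip(p_row, t_row)]
    tp = sum(p and t for p, t in pairs)        # foreground in both masks
    fp = sum(p and not t for p, t in pairs)    # predicted foreground only
    fn = sum(t and not p for p, t in pairs)    # ground-truth foreground only
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical 2x3 masks: two correct pixels, one false alarm, one miss.
predicted = [[1, 1, 0],
             [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
precision, recall = precision_recall(predicted, truth)
```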
    Experimental Results 34/41


  39. Failure/Success Cases
    [Embedded video: resultGirl.avi]
    Experimental Results 35/41


  40. Failure/Success Cases
    [Embedded video: resultMonkey.avi]
    Experimental Results 35/41


  41. Computation Time vs. Performance Trade-off
    [Plots: Precision and Recall vs. Computation Time per Frame (sec) for Geodesic [21], Roto Brush [17], and the Proposed Method]
    All values are averaged over all videos in the SegTrack [Tsai et al., 2010] dataset.
    Experimental Results 36/41


  42. Automatic Video Segmentation Extension
    There are many successful automatic video object segmentation tools that use computationally costly features such as saliency, optical flow, and shape.
    The proposed interactive video segmentation tool is efficient; however, it requires an interaction in the first frame.
    Any MRF-energy-based automatic video segmentation tool can be used to initialize the proposed method.
    The proposed MRF energy estimation method is evaluated as a speed-up tool for the Key-Segments [Lee et al., 2011] algorithm.
    Automatic Video Segmentation 37/41


  43. Precision-recall curves for Automatic Video Segmentation
    [Plots: Precision vs. Recall for Key-Segments [14] and the Proposed Method on the sequences Birdfall, Cheetah, Girl, Monkey, and Parachute]
    Computation time (in MATLAB):
    Key-Segments [Lee, 2011]: 260.6 sec per frame
    Proposed speed-up: 4.0 sec per frame
    Automatic Video Segmentation 38/41


  44. Conclusions
    It is possible to find a sub-graph giving (approximately) the same result as the global solution.
    Spatial information and user interaction are too valuable to discard, even in the erroneous case.
    A dynamic formulation of user interaction increases user satisfaction and makes efficient graph optimization possible.
    The interactive video segmentation problem is actually an estimation problem.
    Given a reliable spatio-temporal distance, it is possible to compensate for the lack of motion information.
    The solution to the min-cut/max-flow problem is linear and can easily be combined with other linear formulations.
    Conclusion & Future Work 39/41


  45. Future Work
    Graph-theoretical analysis of the superpixel graph.
    A parallel implementation is possible via the dual definition of the problem.
    A spatio-temporal formulation of the video segmentation problem is possible.
    Conclusion & Future Work 40/41


  46. Thank you for your attention.
    DEMO 41/41
