
HADES: Hierarchical Approximate Decoding for Structured Prediction

A 30-minute talk from my MSc thesis defense at the Data Analytics Lab, ETH Zürich. For more information, see: http://e-collection.library.ethz.ch/view/eth:48405

Tribhuvanesh Orekondy

October 22, 2015

Transcript

  1. HADES: Hierarchical Approximate Decoding for Structured Prediction

    Tribhuvanesh Orekondy, ETH Zürich. Advisors: Martin Jaggi, Aurelien Lucchi, Thomas Hofmann
  2. Setting: Structured Output Learning

    Computer Vision: Building, Car, Road. Natural Language Processing: "This is a tagged sentence" tagged as DT VBZ DT JJ NN.
  3. Setting: Structured Output Learning

    Observed input variable: "This is a tagged sentence". Predict discrete output variable: DT VBZ DT JJ NN. Learn a good parameterized predictor.
  4. Problem: Maximization Oracle

    Image Segmentation (labels: bike, person) over output variables Y1 to Y6.
  5. Approximate solutions: Maximization Oracle

    “.. learning can fail even with an approximate inference method with rigorous approximation guarantees ..”

    This thesis: How can we learn using approximate solutions for Image Segmentation?
  9. Results - Setup
 • Data
 • MSRC-21
 • dissolvestruct
 • Features
 • Unary: CNN
 • Pairwise: Orientation, Edge Intensity
 • Hierarchy
 • HSLIC: 6 Levels
 • Evaluation Metric
  10. Results - RPL + STUBR
 • RPL: Resume Previous Level
 • STUBR: Stub Repetitions
  11. Conclusion
 • Motivation
   Max-oracles are expensive in Computer Vision.
   Learning using approximate oracles is not well understood.
 • Approach
   Coarse-to-fine approximation-based BCFW variant.
   Hierarchical Surrogate CRF model for Image Segmentation.
 • Results
   Approximate decoding is 50-60x faster.
   75% mark obtained 1.5x-4x faster.
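  13. SURROGATE CRF - DEFINITION
 • Surrogate Energy Function
 • Unary Factor

    Surrogate graph at level $l$: $\tilde{G}_l = (\tilde{V}_l, \tilde{E}_l)$. Figure: the features $x_i, x_j, x_k$ of each "atom" are aggregated into the feature $x_u$ of a "supernode".

    $$E_l(Y = \mathbf{y} \mid X = \mathbf{x}; \mathbf{w}) = \sum_{u \in \tilde{V}_l} \langle w^{D}_{y_u}, x_u \rangle + \sum_{(u,v) \in \tilde{E}_l} w^{P}_{y_u y_v}$$

    $$x_u = \sum_{i \in \mathrm{atm}(u)} x_i$$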
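  14. SURROGATE CRF - DEFINITION
 • Surrogate Energy Function
 • Pairwise Factor

    Surrogate graph at level $l$: $\tilde{G}_l = (\tilde{V}_l, \tilde{E}_l)$, relating each "atom" to its "supernode".

    $$\sum_{(u,v) \in \tilde{E}_l} P(y_u, y_v; w^{P}) = \underbrace{\sum_{(u,v) \in \tilde{E}_l} \; \sum_{(i,j) \in \mathrm{atmE}(u,v)} P(Y_i = y_u, Y_j = y_v; w^{P})}_{\text{Supernode transition}} + \underbrace{\sum_{u \in \tilde{V}_l} \; \sum_{(i,j) \in \mathrm{atmE}(u)} P(Y_i = Y_j = y_u; w^{P})}_{\text{Static transition}}$$

    $$E_l(Y = \mathbf{y} \mid X = \mathbf{x}; \mathbf{w}) = \sum_{u \in \tilde{V}_l} \langle w^{D}_{y_u}, x_u \rangle + \sum_{(u,v) \in \tilde{E}_l} w^{P}_{y_u y_v}$$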
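The surrogate energy defined on these two slides can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the function names, the dictionary-based data layout, and the toy numbers are all hypothetical, and the pairwise term is taken literally as one table lookup per supernode edge, as the energy formula is written.

```python
from typing import Dict, List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def supernode_features(
    atom_feats: Dict[int, List[float]],
    atoms_of: Dict[str, List[int]],
) -> Dict[str, List[float]]:
    """x_u = sum of x_i over the atoms i in atm(u)."""
    feats = {}
    for u, atoms in atoms_of.items():
        dim = len(atom_feats[atoms[0]])
        feats[u] = [sum(atom_feats[i][d] for i in atoms) for d in range(dim)]
    return feats


def surrogate_energy(
    labels: Dict[str, int],
    feats: Dict[str, List[float]],
    edges: List[Tuple[str, str]],
    w_unary: List[List[float]],
    w_pair: List[List[float]],
) -> float:
    """E_l = sum_u <w^D_{y_u}, x_u> + sum_{(u,v)} w^P_{y_u y_v}."""
    unary = sum(dot(w_unary[labels[u]], x_u) for u, x_u in feats.items())
    pairwise = sum(w_pair[labels[u]][labels[v]] for u, v in edges)
    return unary + pairwise


# Hypothetical toy instance: four atoms grouped into two supernodes.
atom_feats = {0: [1.0, 0.0], 1: [0.5, 0.5], 2: [0.0, 1.0], 3: [0.0, 2.0]}
atoms_of = {"a": [0, 1], "b": [2, 3]}
feats = supernode_features(atom_feats, atoms_of)
energy = surrogate_energy(
    labels={"a": 0, "b": 1},
    feats=feats,
    edges=[("a", "b")],
    w_unary=[[1.0, 0.0], [0.0, 1.0]],  # one weight vector per label (assumed layout)
    w_pair=[[0.0, 1.0], [1.0, 0.0]],   # one weight per label pair (assumed layout)
)
print(feats, energy)
```

Because the unary term is linear in the features, summing atom features into `x_u` once per level avoids revisiting every atom each time a labeling is scored.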
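  15. PROPOSITIONS†
 • Equivalence of CRFs

    $$\operatorname*{argmin}_{\mathbf{y} \in \mathcal{Y}^m_l} E(\mathbf{y}; \mathbf{x}^m, \mathbf{w}) \equiv \operatorname*{argmin}_{\mathbf{y} \in \tilde{\mathcal{Y}}^m_l} E_l(\mathbf{y}; \mathbf{x}^m, \mathbf{w})$$

 • Equivalent Loss-Augmented Decoding

    $$\operatorname*{argmin}_{\mathbf{y} \in \mathcal{Y}_l} E(\mathbf{y}; \mathbf{x}^m, \mathbf{w}) - \Delta(\mathbf{y}, \mathbf{y}^m) \equiv \operatorname*{argmin}_{\mathbf{y} \in \tilde{\mathcal{Y}}_l} E_l(\mathbf{y}; \mathbf{x}^m, \mathbf{w}) - \Delta(\mathbf{y}, \mathbf{y}^m)$$

 • Hierarchical Decoding

    $$\min_{\mathbf{y} \in \mathcal{Y}_{l+1}} E(\mathbf{y}; \mathbf{x}, \mathbf{w}) \le \min_{\mathbf{y} \in \mathcal{Y}_l} E(\mathbf{y}; \mathbf{x}, \mathbf{w})$$

    † Proofs excluded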
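The propositions are stated without proof, so the following is only a brute-force sanity check on a toy instance, under one reading of the definitions. It assumes the atom-level energy uses the same unary and pairwise weights per atom and per atom edge, that a coarse labeling assigns one label to all atoms of a supernode, and that the finer level lets every atom take its own label. All names and numbers are hypothetical.

```python
from itertools import product

N_LABELS = 2
ATOM_FEATS = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 2.0]]
ATOM_EDGES = [(0, 1), (1, 2), (2, 3)]   # chain of four atoms
SUPERNODE_OF = [0, 0, 1, 1]             # atom index -> supernode index
W_UNARY = [[1.0, -1.0], [-1.0, 1.0]]    # one weight vector per label
W_PAIR = [[0.0, 0.5], [0.5, 0.0]]       # penalty when neighbouring labels differ


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def atom_energy(y):
    """Energy of a labeling with one label per atom."""
    unary = sum(dot(W_UNARY[y[i]], ATOM_FEATS[i]) for i in range(len(y)))
    pairwise = sum(W_PAIR[y[i]][y[j]] for i, j in ATOM_EDGES)
    return unary + pairwise


def surrogate_energy(y_super):
    """Energy of a labeling with one label per supernode."""
    n_super = max(SUPERNODE_OF) + 1
    dim = len(ATOM_FEATS[0])
    x_super = [[0.0] * dim for _ in range(n_super)]
    for i, u in enumerate(SUPERNODE_OF):
        for d in range(dim):
            x_super[u][d] += ATOM_FEATS[i][d]       # x_u = sum of atom features
    unary = sum(dot(W_UNARY[y_super[u]], x_super[u]) for u in range(n_super))
    transition, static = 0.0, 0.0
    for i, j in ATOM_EDGES:
        u, v = SUPERNODE_OF[i], SUPERNODE_OF[j]
        if u != v:                                   # atom edge crossing two supernodes
            transition += W_PAIR[y_super[u]][y_super[v]]
        else:                                        # atom edge inside one supernode
            static += W_PAIR[y_super[u]][y_super[u]]
    return unary + transition + static


def expand(y_super):
    """Copy each supernode label to all of its atoms."""
    return tuple(y_super[u] for u in SUPERNODE_OF)


def coarse_minimum():
    n_super = max(SUPERNODE_OF) + 1
    return min(surrogate_energy(ys) for ys in product(range(N_LABELS), repeat=n_super))


def fine_minimum():
    return min(atom_energy(y) for y in product(range(N_LABELS), repeat=len(ATOM_FEATS)))


# The surrogate energy matches the atom-level energy on every coarse labeling,
# and the finer level can only lower (or keep) the minimum.
for ys in product(range(N_LABELS), repeat=2):
    assert abs(atom_energy(expand(ys)) - surrogate_energy(ys)) < 1e-9
assert fine_minimum() <= coarse_minimum()
```

Enumerating all labelings is exponential in the number of nodes, so this check is only feasible on tiny examples.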
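  16. APPROXIMATION QUALITY
 • Gauge

    $$E(\mathbf{y}^*; \mathbf{x}, \mathbf{w}) \le E_l(\mathbf{y}^*; \mathbf{x}, \mathbf{w}) \le E(\mathbf{y}^*; \mathbf{x}, \mathbf{w}) + \rho(l)$$

    $$E^*_P \le E^*_l \le E^*_P + \rho(l)$$

 • Additive Error

    $$E^*_l - E^*_P \le (P - N_l) \cdot 2 B R_U + (Z - T_l) \cdot 2 B R_P$$

    where $P$ is the number of nodes at level $P$, $N_l$ the number of nodes at level $l$, $Z$ the number of edges at level $P$, $T_l$ the number of super-node transitions at level $l$, and $\|w_i\| \le B$, $\|x_i\| \le R_U$, $\|P(y_i, y_j)\| \le R_P$.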
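To get a feel for how the additive error shrinks as the hierarchy gets finer, the bound can be evaluated directly. The sketch below assumes the right-hand side reads (P - N_l) · 2BR_U + (Z - T_l) · 2BR_P, with the subtractions inferred from the symbol definitions on the slide; the function name and the example counts are hypothetical and not taken from the thesis.

```python
def additive_error_bound(
    n_nodes_finest: int,   # P: number of nodes at the finest level
    n_nodes_level: int,    # N_l: number of nodes at level l
    n_edges_finest: int,   # Z: number of edges at the finest level
    n_transitions: int,    # T_l: number of super-node transitions at level l
    b: float,              # B: bound on ||w_i||
    r_unary: float,        # R_U: bound on ||x_i||
    r_pair: float,         # R_P: bound on the pairwise term
) -> float:
    """Assumed form of the bound: (P - N_l) * 2*B*R_U + (Z - T_l) * 2*B*R_P."""
    unary_gap = (n_nodes_finest - n_nodes_level) * 2 * b * r_unary
    pairwise_gap = (n_edges_finest - n_transitions) * 2 * b * r_pair
    return unary_gap + pairwise_gap


# Hypothetical counts for a coarse level and for the finest level itself.
coarse = additive_error_bound(1000, 250, 1900, 400, 1.0, 1.0, 1.0)
finest = additive_error_bound(1000, 1000, 1900, 1900, 1.0, 1.0, 1.0)
print(coarse, finest)
```

Under this reading the bound vanishes at the finest level, where N_l = P and T_l = Z, which is consistent with the Gauge inequalities collapsing to equality there.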