Slide 1

Slide 1 text

images sentences

Slide 2

Slide 2 text

Questions & Observations
• Hierarchical ensemble methods
• Shared intermediate representation
• Real-time performance?
• Neurological realism?

Slide 3

Slide 3 text

Every Picture Tells a Story: Generating Sentences from Images
Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, David Forsyth

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

images sentences

Slide 6

Slide 6 text

Felzenszwalb Detector
A Discriminatively Trained, Multiscale, Deformable Part Model
Pedro F. Felzenszwalb, David McAllester and Deva Ramanan

Slide 7

Slide 7 text

Linear SVM over:
• Felzenszwalb detector
• Hoiem 3D scene model
• GIST scene features (+ AdaBoost)
Output: node features + scores
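A minimal sketch of the node classifier, assuming the detector, Hoiem, and GIST outputs for each image have already been concatenated into one numeric vector. It trains a binary linear SVM with Pegasos-style stochastic subgradient descent, a swapped-in training method rather than the authors' solver; the signed margin then serves as the node score. Function names are hypothetical.

```python
import random


def train_linear_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Pegasos-style SGD for a linear SVM (hinge loss + L2 penalty).

    xs: list of feature vectors; ys: labels in {-1, +1}.
    A constant 1.0 feature is appended so the bias is learned as a weight.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]   # shrink from the L2 term
            if margin < 1.0:                           # hinge loss is active
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def node_score(w, x):
    """Signed SVM margin for one image, used as a node score."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
```

In practice one such classifier would be trained per node label (each object, action, or scene value), and its scores collected into the node feature vector.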

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

Edge Potentials
• Given a test image, find its k nearest-neighbor (k-NN) training examples and average their node features
• From the image side: node features for similar images
• From the sentence side: sentence representation for similar images
• Multi-label Markov Random Field
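The k-NN averaging step can be sketched as follows, assuming every training image already has a descriptor (compared here by Euclidean distance, an assumption) and a node-feature vector. `knn_average` is a hypothetical helper name; on the sentence side the same averaging would apply to sentence representations instead of node features.

```python
import math


def knn_average(query, train_descriptors, train_node_features, k=3):
    """Average the node features of the k training images closest to `query`."""
    # Sort (distance, index) pairs; ties are broken by the lower index.
    ranked = sorted(
        (math.dist(query, d), i) for i, d in enumerate(train_descriptors)
    )
    nearest = [i for _, i in ranked[:k]]
    dim = len(train_node_features[0])
    return [
        sum(train_node_features[i][j] for i in nearest) / len(nearest)
        for j in range(dim)
    ]
```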

Slide 10

Slide 10 text

images sentences

Slide 11

Slide 11 text

Curran & Clark (C&C) Tools
• Maximum Entropy Tagger
• POS Tagger
• Combinatory Categorial Grammar (CCG)
• Chunker
• Named Entity Recognizer

Slide 12

Slide 12 text

C&C Parser
• Dependency parse: subject / direct object
• Head nouns from prepositional phrases (“X in the background”): scene information
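A minimal sketch of turning subject/object and prepositional-phrase heads into an (object, action, scene) triplet. The input format, a list of `(relation, head, dependent)` tuples with the labels `subj`, `dobj`, and `pobj`, is a hypothetical simplification and not the C&C parser's actual output.

```python
def extract_triplet(relations):
    """Return (object, action, scene) from simplified dependency tuples.

    relations: list of (relation, head, dependent). Hypothetical labels:
      "subj": verb -> subject noun
      "dobj": verb -> direct object noun
      "pobj": preposition -> head noun of its phrase
    Slots that are never filled stay None.
    """
    obj = action = scene = None
    for rel, head, dep in relations:
        if rel in ("subj", "dobj") and obj is None:
            obj, action = dep, head      # first subject/object noun and its verb
        elif rel == "pobj" and scene is None:
            scene = dep                  # e.g. "background" in "X in the background"
    return obj, action, scene
```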

Slide 13

Slide 13 text

Node Potentials: Lin Similarity
• WordNet
• Hypernyms (is-a)
• Hyponyms (more specific terms; the inverse of is-a)
• Compare synsets
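Lin similarity scores two concepts c1 and c2 as 2·IC(lcs) / (IC(c1) + IC(c2)), where lcs is their lowest common subsumer (the most specific shared hypernym) and IC(c) = -log p(c) is the information content of c. The toy sketch below uses a hand-made is-a tree and made-up counts, both hypothetical stand-ins for WordNet and a real corpus; it also assumes a single parent per concept, whereas WordNet allows several. For real WordNet synsets, NLTK provides `Synset.lin_similarity(other, ic)`.

```python
import math

# Toy is-a hierarchy (child -> parent), standing in for WordNet hypernym links.
PARENT = {"dog": "canine", "wolf": "canine", "canine": "animal",
          "cat": "feline", "feline": "animal", "animal": "entity",
          "car": "vehicle", "vehicle": "entity"}
# Hypothetical corpus counts for each concept before propagation.
COUNTS = {"dog": 10, "wolf": 2, "cat": 8, "car": 10, "canine": 0,
          "feline": 0, "animal": 0, "vehicle": 0, "entity": 0}


def ancestors(c):
    """Concept followed by its hypernyms, from most to least specific."""
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain


def information_content():
    # A hypernym is counted whenever any of its hyponyms is observed.
    totals = dict.fromkeys(COUNTS, 0)
    for c, n in COUNTS.items():
        for a in ancestors(c):
            totals[a] += n
    root_total = totals["entity"]
    return {c: -math.log(t / root_total) for c, t in totals.items() if t > 0}


def lin_similarity(c1, c2, ic):
    a1 = set(ancestors(c1))
    lcs = next(a for a in ancestors(c2) if a in a1)  # lowest common subsumer
    return 2 * ic[lcs] / (ic[c1] + ic[c2])
```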

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Edge Potentials
• Given a test image, find its k nearest-neighbor (k-NN) training examples and average their node features
• From the image side: node features for similar images
• From the sentence side: sentence representation for similar images
• Multi-label Markov Random Field

Slide 16

Slide 16 text

Structure Learning
Find weights for linear combinations of node and edge terms so that the ground-truth triplet scores highest
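One simple way to learn such weights is a structured perceptron, sketched below; this is a swapped-in learner for illustration, not necessarily the method used in the paper. The sparse feature function (node features pairing each image cue with a label, plus object-action and action-scene edge features) and the cue sets are hypothetical.

```python
from itertools import product


def toy_features(cues, triplet):
    """Hypothetical sparse features: node (cue, label) pairs plus edge label pairs."""
    obj, act, scene = triplet
    feats = {}
    for cue in cues:
        for slot, label in (("obj", obj), ("act", act), ("scene", scene)):
            feats[(slot, cue, label)] = 1.0
    feats[("obj-act", obj, act)] = 1.0      # edge between object and action
    feats[("act-scene", act, scene)] = 1.0  # edge between action and scene
    return feats


def score(w, feats):
    return sum(w.get(k, 0.0) * v for k, v in feats.items())


def best_triplet(w, cues, labels, feature_fn=toy_features):
    """Exhaustive search over all (object, action, scene) combinations."""
    return max(product(*labels), key=lambda t: score(w, feature_fn(cues, t)))


def perceptron_train(examples, labels, feature_fn=toy_features, epochs=10):
    """Structured perceptron: raise the gold triplet's score, lower the prediction's."""
    w = {}
    for _ in range(epochs):
        for cues, gold in examples:
            pred = best_triplet(w, cues, labels, feature_fn)
            if pred != gold:
                for k, v in feature_fn(cues, gold).items():
                    w[k] = w.get(k, 0.0) + v
                for k, v in feature_fn(cues, pred).items():
                    w[k] = w.get(k, 0.0) - v
    return w
```

Exhaustive search is fine for a small triplet vocabulary; with larger label sets, the arg-max would come from inference in the Markov Random Field instead.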

Slide 17

Slide 17 text

Seeing What You’re Told: Sentence-Guided Activity Recognition In Video
N. Siddharth, Andrei Barbu, Jeffrey Mark Siskind

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

Object Detection → Track → Event Recognizer → Sentences

Slide 21

Slide 21 text

Object Detection → Track → Event Recognizer → Sentences
Object detection: Felzenszwalb detector (output includes false positives)

Slide 22

Slide 22 text

Object Detection → Track → Event Recognizer → Sentences
Object detection: Felzenszwalb detector (output includes false positives)
Tracking: Felzenszwalb confidence + optical flow

Slide 23

Slide 23 text

Object Detection → Track → Event Recognizer → Sentences
Tracking by dynamic programming: maximize detection confidence and optical-flow continuity
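A sketch of the dynamic-programming tracker, assuming each detection is given as `(x, y, score, flow_dx, flow_dy)`, where the flow vector is the average optical flow inside the box (a hypothetical input format). Continuity is scored as the negative distance between where the flow predicts the object will be and where the next detection actually is; this coherence term and the weight `alpha` are illustrative assumptions.

```python
import math


def track(frames, alpha=1.0):
    """Pick one detection per frame, maximizing confidence plus motion coherence.

    frames[t] is a list of detections (x, y, score, flow_dx, flow_dy).
    Returns (index of the chosen detection in each frame, total score).
    """
    best = [d[2] for d in frames[0]]
    back = []
    for prev, cur in zip(frames, frames[1:]):
        new_best, ptr = [], []
        for d in cur:
            # Score of arriving at d from each previous detection p.
            cands = [
                best[i] - alpha * math.dist((p[0] + p[3], p[1] + p[4]), (d[0], d[1]))
                for i, p in enumerate(prev)
            ]
            i_best = max(range(len(prev)), key=cands.__getitem__)
            new_best.append(cands[i_best] + d[2])
            ptr.append(i_best)
        best = new_best
        back.append(ptr)
    j = max(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for ptr in reversed(back):  # follow back-pointers to recover the track
        j = ptr[j]
        path.append(j)
    return path[::-1], total


frames = [
    [(0, 0, 0.9, 1, 0), (10, 10, 0.95, 0, 0)],
    [(1, 0, 0.8, 1, 0), (50, 50, 0.99, 0, 0)],
    [(2, 0, 0.85, 1, 0), (90, 0, 0.9, 0, 0)],
]
path, total = track(frames)
print(path)  # [0, 0, 0]
```

Picking the most confident box in each frame would follow the scattered false positives (index 1 every time); the DP instead keeps the object whose motion agrees with its optical flow.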

Slide 24

Slide 24 text

Object Detection → Track → Event Recognizer → Sentences
Per-object, per-frame features:
• Position
• Velocity
• Acceleration
• Aspect ratio
Agent + instrument features:
• Distance
• Orientation
Result: a time series of feature vectors
Train one Hidden Markov Model (HMM) per word in the lexicon
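The features listed above could be computed from a track's bounding boxes roughly as follows. This sketch assumes boxes are `(x_min, y_min, x_max, y_max)`, uses finite differences of the box center for velocity and acceleration, and interprets agent/instrument "orientation" as the angle of the line between the two box centers; these interpretations and all function names are assumptions. HMM training itself is not shown.

```python
import math


def _center(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def object_features(boxes):
    """Per-frame [x, y, vx, vy, ax, ay, aspect] for one track.

    Velocity and acceleration are backward differences of the box center;
    frames where they are undefined (first frame for velocity, first two
    for acceleration) are padded with zeros.
    """
    centers = [_center(b) for b in boxes]
    vel = [(0.0, 0.0)] + [(c[0] - p[0], c[1] - p[1]) for p, c in zip(centers, centers[1:])]
    acc = [(0.0, 0.0)] * min(2, len(vel)) + [
        (c[0] - p[0], c[1] - p[1]) for p, c in zip(vel[1:], vel[2:])
    ]
    feats = []
    for (x0, y0, x1, y1), pos, v, a in zip(boxes, centers, vel, acc):
        aspect = (x1 - x0) / (y1 - y0)  # width / height
        feats.append([*pos, *v, *a, aspect])
    return feats


def pair_features(agent_boxes, instrument_boxes):
    """Per-frame [distance, orientation in radians] between two box centers."""
    out = []
    for ba, bb in zip(agent_boxes, instrument_boxes):
        (ax, ay), (bx, by) = _center(ba), _center(bb)
        out.append([math.hypot(bx - ax, by - ay), math.atan2(by - ay, bx - ax)])
    return out


def time_series(agent_boxes, instrument_boxes):
    """One concatenated feature vector per frame: the HMM's observation sequence."""
    return [
        a + i + p
        for a, i, p in zip(object_features(agent_boxes),
                           object_features(instrument_boxes),
                           pair_features(agent_boxes, instrument_boxes))
    ]
```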

Slide 25

Slide 25 text

Object Detection → Track → Event Recognizer → Sentences
Recognize with the HMM: maximize a linear combination of observation and state-transition scores
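Recognition with a word's HMM can be done by Viterbi decoding: in log space, a path's score is a sum of initial, transition, and observation terms, so the best path is found by dynamic programming. A minimal sketch follows; the two-state "approach" model in the example, with 1-D Gaussian observations on the agent-instrument distance, is a hypothetical illustration rather than a model from the paper.

```python
import math


def viterbi(obs, log_init, log_trans, log_emit):
    """Most likely HMM state sequence and its log score.

    score = log_init[s0] + sum_t log_trans[s_{t-1}][s_t] + sum_t log_emit(s_t, obs[t])
    """
    n = len(log_init)
    score = [log_init[s] + log_emit(s, obs[0]) for s in range(n)]
    back = []
    for o in obs[1:]:
        ptr, new = [], []
        for s in range(n):
            p = max(range(n), key=lambda q: score[q] + log_trans[q][s])
            ptr.append(p)
            new.append(score[p] + log_trans[p][s] + log_emit(s, o))
        score = new
        back.append(ptr)
    s = max(range(n), key=score.__getitem__)
    best = score[s]
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    return path[::-1], best


def gaussian_log_emit(means, sigma=1.0):
    """Log 1-D Gaussian density per state, dropping the shared normalizing constant."""
    def log_emit(state, x):
        return -((x - means[state]) ** 2) / (2 * sigma ** 2)
    return log_emit


# Hypothetical left-to-right model for "approach": state 0 = far, state 1 = near.
log_init = [0.0, -math.inf]
log_trans = [[math.log(0.5), math.log(0.5)], [-math.inf, 0.0]]
path, best = viterbi([10, 9, 2, 1], log_init, log_trans, gaussian_log_emit([10, 1]))
print(path)  # [0, 0, 1, 1]
```

To recognize an event, the same feature series would be decoded under each word's HMM and the resulting scores compared.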

Slide 26

Slide 26 text

Object Detection → Track → Event Recognizer → Sentences
Sentence Tracker: determine whether a set of tracks matches a sentence by finding the highest-probability path through the cross-product lattice of the tracker and the word models
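A simplified sketch of the cross-product idea for one track and one word: each lattice node pairs a detection with an HMM state, so a single Viterbi pass chooses the track and the state sequence together. The slide describes a set of tracks and a whole sentence; this sketch reduces that to one of each. The detection format (dicts with `x`, `flow`, `score`), the flow-based coherence term, the Gaussian emissions on `x`, and the rule that the word must end in its last HMM state are all illustrative assumptions.

```python
import math
from itertools import product


def sentence_tracker(frames, coherence, log_init, log_trans, log_emit):
    """Joint Viterbi over (detection, HMM state) pairs for one track and one word.

    Path score = sum over frames of (detection score + log emission)
               + sum over frame transitions of (coherence + log HMM transition).
    The path must end in the word's last HMM state (assumed accepting state).
    """
    n_states = len(log_init)
    score = {
        (j, q): d["score"] + log_init[q] + log_emit(q, d)
        for (j, d), q in product(enumerate(frames[0]), range(n_states))
    }
    back = []
    for prev, cur in zip(frames, frames[1:]):
        new, ptr = {}, {}
        for (j, d), q in product(enumerate(cur), range(n_states)):
            def incoming(node):
                pj, pq = node
                return score[node] + coherence(prev[pj], d) + log_trans[pq][q]
            best_prev = max(score, key=incoming)
            new[(j, q)] = incoming(best_prev) + d["score"] + log_emit(q, d)
            ptr[(j, q)] = best_prev
        score = new
        back.append(ptr)
    final = [node for node in score if node[1] == n_states - 1]
    node = max(final, key=score.get)
    best = score[node]
    path = [node]
    for ptr in reversed(back):
        node = ptr[node]
        path.append(node)
    return path[::-1], best


def flow_coherence(p, d, alpha=1.0):
    # Penalize disagreement between flow-predicted and observed position.
    return -alpha * abs(p["x"] + p["flow"] - d["x"])


def emit_moves_right(q, d, means=(0.0, 10.0), sigma=2.0):
    # Hypothetical word model: state 0 = object on the left, state 1 = on the right.
    return -((d["x"] - means[q]) ** 2) / (2 * sigma ** 2)


# Each frame: a confident stationary detection and a weaker moving one.
frames = [[{"x": 0.0, "flow": 0.0, "score": 1.0},
           {"x": x, "flow": 5.0, "score": 0.5}] for x in (0.0, 5.0, 10.0)]
path, best = sentence_tracker(
    frames, flow_coherence,
    log_init=[0.0, -math.inf],
    log_trans=[[math.log(0.5), math.log(0.5)], [-math.inf, 0.0]],
    log_emit=emit_moves_right,
)
print([j for j, _ in path])  # [1, 1, 1]
```

Without the word model, the tracker alone would prefer the higher-scoring stationary detection; conditioning on the word steers it to the object that actually moves right.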

Slide 27

Slide 27 text

Natural Language Semantics

Slide 28

Slide 28 text

Natural Language Semantics

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Questions & Observations
• Hierarchical ensemble methods
• Shared intermediate representation
• Real-time performance?
• Neurological realism?