Computational Intelligence: images <-> sentences

gregab
November 05, 2014

A look at two approaches to searching images and video with words and to generating textual descriptions of images and video.

The papers discussed are: http://0xab.com/papers/cvpr2014 and https://www.cs.cmu.edu/~afarhadi/papers/sentence.pdf

The video demo of the second paper (slide 18) is available here: http://0xab.com/research/video-events.html

Presented in MIT 9.S915: Aspects of a Computational Theory of Intelligence. http://cs.wellesley.edu/~vision/

Transcript

  1. Questions & Observations • Hierarchical ensemble methods • Shared intermediate representation • Real-time performance? • Neurological realism?
  2. Every Picture Tells a Story: Generating Sentences from Images. Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, David Forsyth
  3. Linear SVM • Felzenszwalb detector • Hoiem 3D scene model • GIST scene features (+ AdaBoost) • Node features + scores
  4. Edge Potentials • given a test image • k-NN training examples, average node features • from the image side: node features for similar images • from the sentence side: sentence representation for similar images • Multi-label Markov Random Field
  5. Curran & Clark Tools • Maximum Entropy Tagger • POS Tagger • Combinatory Categorial Grammar (CCG) • Chunker • Named Entity Recognizer
  6. C&C Parser • Dependency parse • Subject/direct object • Head nouns from prepositional phrases (“X in the background”) • Scene information
  7. Edge Potentials • given a test image • k-NN training examples, average node features • from the image side: node features for similar images • from the sentence side: sentence representation for similar images • Multi-label Markov Random Field
  8. Structure Learning: finding the weights of the linear combination of node and edge potentials so that the ground-truth triplet scores highest
  9. N. Siddharth, Andrei Barbu, Jeffrey Mark Siskind. Seeing What You’re Told: Sentence-Guided Activity Recognition In Video
  10. Object Detection | Track | Event Recognizer | Sentences. Per-object/per-frame: • Position • Velocity • Acceleration • Aspect Ratio. Agent + Instrument: • Distance • Orientation. A time series of feature vectors. Train with a Hidden Markov Model (one per word in the lexicon)
  11. Object Detection | Track | Event Recognizer | Sentences. Recognize with HMM: maximize a linear combination of observations and state transitions
  12. Object Detection | Track | Event Recognizer | Sentences. Sentence Tracker: determine whether a set of tracks matches a sentence by maximizing the probability of the cross-product lattice
  13. Questions & Observations • Hierarchical ensemble methods • Shared intermediate representation • Real-time performance? • Neurological realism?
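
Slide 10 lists per-object, per-frame features (position, velocity, acceleration, aspect ratio) and agent/instrument pair features (distance, orientation) that form a time series of feature vectors. The sketch below shows one plausible way to compute such features from bounding-box tracks. It is an illustration only, not the authors' implementation: the function names, the `(x1, y1, x2, y2)` box format, and the use of simple per-frame finite differences are all assumptions.

```python
import math

def center(box):
    """Center of an (x1, y1, x2, y2) bounding box (assumed format)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def object_features(track):
    """Per-frame features for one track, given as a list of boxes."""
    centers = [center(b) for b in track]
    # Velocity: finite difference of centers, padded with zero at frame 0.
    vel = [(0.0, 0.0)] + [
        (b[0] - a[0], b[1] - a[1]) for a, b in zip(centers, centers[1:])
    ]
    # Acceleration: finite difference of the padded velocities,
    # again padded with zero at frame 0.
    acc = [(0.0, 0.0)] + [
        (b[0] - a[0], b[1] - a[1]) for a, b in zip(vel, vel[1:])
    ]
    rows = []
    for box, c, v, a in zip(track, centers, vel, acc):
        x1, y1, x2, y2 = box
        rows.append({
            "position": c,
            "velocity": v,
            "acceleration": a,
            # Width over height; assumes every box has positive height.
            "aspect_ratio": (x2 - x1) / (y2 - y1),
        })
    return rows

def pair_features(agent_track, instrument_track):
    """Per-frame distance and orientation between two tracks."""
    rows = []
    for a_box, i_box in zip(agent_track, instrument_track):
        ax, ay = center(a_box)
        ix, iy = center(i_box)
        dx, dy = ix - ax, iy - ay
        rows.append({
            "distance": math.hypot(dx, dy),
            # Angle (radians) of the vector from agent to instrument.
            "orientation": math.atan2(dy, dx),
        })
    return rows
```

Calling `object_features` on each track and `pair_features` on each agent/instrument pair yields one feature row per frame, which is the kind of time series a per-word HMM could be trained on.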
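
Slide 11 describes HMM recognition as maximizing a linear combination of observations and state transitions. In log space, the score of a state sequence is the sum of its log observation likelihoods and log transition probabilities, and the Viterbi algorithm finds the sequence that maximizes it. The following is a minimal sketch of that standard algorithm for a discrete-state HMM. The function name and argument layout are hypothetical, and it is not the sentence tracker from the paper, which optimizes over a larger cross-product lattice.

```python
def viterbi(log_init, log_trans, log_obs):
    """Return (best_score, best_path) for a discrete-state HMM.

    log_init[k]     : log-probability of starting in state k
    log_trans[j][k] : log-probability of moving from state j to state k
    log_obs[t][k]   : log-likelihood of the frame-t observation in state k
    """
    n_states = len(log_init)
    # score[k] = best total score of any state sequence ending in state k.
    score = [log_init[k] + log_obs[0][k] for k in range(n_states)]
    back = []  # back[t - 1][k] = best predecessor of state k at frame t
    for t in range(1, len(log_obs)):
        prev = score
        pointers = []
        score = []
        for k in range(n_states):
            best_j = max(range(n_states),
                         key=lambda j: prev[j] + log_trans[j][k])
            pointers.append(best_j)
            score.append(prev[best_j] + log_trans[best_j][k] + log_obs[t][k])
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(range(n_states), key=lambda k: score[k])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path
```

To recognize an event with per-word models, one could run this once per word HMM on the same feature time series and compare the returned scores.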