Computational Intelligence: images <-> sentences

gregab
November 05, 2014

A look at two approaches to searching images and video with words and to producing textual descriptions of images and video.

The papers discussed are: http://0xab.com/papers/cvpr2014 and https://www.cs.cmu.edu/~afarhadi/papers/sentence.pdf

The video demo for the second paper presented (Siddharth et al., slide 18) is available here: http://0xab.com/research/video-events.html

Presented in MIT 9.S915: Aspects of a Computational Theory of Intelligence. http://cs.wellesley.edu/~vision/

Transcript

  1. images <-> sentences

  2. Questions & Observations • Hierarchical ensemble methods • Shared intermediate representation • Real-time performance? • Neurological realism?
  3. Every Picture Tells a Story: Generating Sentences from Images. Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, David Forsyth
  4. None
  5. images sentences

  6. Felzenszwalb Detector: A Discriminatively Trained, Multiscale, Deformable Part Model. Pedro F. Felzenszwalb, David McAllester and Deva Ramanan
  7. Linear SVM • Felzenszwalb detector • Hoiem 3D scene model • GIST scene features (+ AdaBoost) • Node features + scores
  8. None
  9. Edge Potentials • given a test image • k-NN training examples, average node features • from the image side: node features for similar images • from the sentence side: sentence representation for similar images • Multi-label Markov Random Field
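
To make the k-NN step on this slide concrete, here is a minimal sketch that finds the k training images nearest to a test image and averages their node features. The Euclidean metric, the feature dimensions, the toy data, and the name `knn_average_features` are all assumptions for illustration; the slide does not say which distance or features the authors use.

```python
import numpy as np

def knn_average_features(test_feat, train_feats, train_node_feats, k=3):
    """Average the node features of the k training images closest to test_feat."""
    # Euclidean distance from the test image to every training image (assumed metric).
    dists = np.linalg.norm(train_feats - test_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    return train_node_feats[nearest].mean(axis=0), nearest

# Toy data (made up): 4 training images with 2-D image features and 2-D node features.
train_feats = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
train_node_feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [9.0, 9.0]])
test_feat = np.array([0.1, 0.1])

avg, nearest = knn_average_features(test_feat, train_feats, train_node_feats, k=3)
```

The far-away fourth image is excluded, so the average is taken over the three similar images only.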
  10. images sentences

  11. Curran & Clark Tools • Maximum Entropy Tagger • POS Tagger • Combinatory Categorial Grammar (CCG) • Chunker • Named Entity Recognizer
  12. C&C Parser • Dependency Parse • Subject/Direct Object • Head nouns from prepositional phrases (“X in the background”) • Scene information
  13. Node Potentials: Lin Similarity • WordNet • Hypernyms (is-a) • Hyponyms (instance-of) • Compare synsets
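
Lin similarity scores two concepts by the information content (IC) of their lowest common ancestor: sim(a, b) = 2 · IC(lcs(a, b)) / (IC(a) + IC(b)), with IC(c) = -log p(c). The sketch below computes it over a tiny hand-made is-a taxonomy. The taxonomy and the counts are invented for illustration and are not WordNet data; a real setup would read synsets and corpus counts from WordNet.

```python
import math

# Toy is-a taxonomy (child -> parent) and made-up corpus counts; NOT WordNet data.
PARENT = {"dog": "animal", "cat": "animal", "animal": "entity",
          "car": "vehicle", "vehicle": "entity"}
COUNT = {"dog": 10, "cat": 10, "animal": 5, "car": 20, "vehicle": 5, "entity": 0}

def ancestors(concept):
    """The concept followed by its hypernyms, up to the root."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain

def subtree_count(concept):
    """Occurrences of the concept plus all of its descendants."""
    return COUNT[concept] + sum(
        subtree_count(child) for child, parent in PARENT.items() if parent == concept
    )

TOTAL = subtree_count("entity")

def ic(concept):
    """Information content: -log p(concept)."""
    return math.log(TOTAL / subtree_count(concept))

def lin_similarity(a, b):
    anc_b = set(ancestors(b))
    lcs = next(c for c in ancestors(a) if c in anc_b)  # lowest common ancestor
    denom = ic(a) + ic(b)
    return 2 * ic(lcs) / denom if denom > 0 else 1.0

sim_dog_cat = lin_similarity("dog", "cat")
sim_dog_car = lin_similarity("dog", "car")
```

In this toy taxonomy "dog" and "cat" share the informative ancestor "animal", while "dog" and "car" only meet at the root, whose information content is zero.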
  14. None
  15. Edge Potentials • given a test image • k-NN training examples, average node features • from the image side: node features for similar images • from the sentence side: sentence representation for similar images • Multi-label Markov Random Field
  16. Structure Learning: finding weights for a linear combination of node and edge potentials so that the ground-truth triplet scores highest
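
One simple way to picture this objective is a structured perceptron: score every candidate triplet as a weighted sum of its potentials and, whenever the wrong triplet wins, move the weights toward the ground truth. The perceptron update is a stand-in chosen for brevity and is not claimed to be the learner used in the paper; the two-dimensional potentials (node sum, edge sum) and the candidate triplets are made up.

```python
import numpy as np

def predict(w, candidates):
    """Return the triplet whose potentials, linearly combined with w, score highest."""
    return max(candidates, key=lambda t: float(w @ candidates[t]))

def train(examples, dim, epochs=10):
    """Structured perceptron: nudge w until each ground-truth triplet scores highest."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for candidates, truth in examples:
            guess = predict(w, candidates)
            if guess != truth:
                w += candidates[truth] - candidates[guess]
    return w

# Hypothetical data: each candidate triplet maps to [node potential sum, edge potential sum].
examples = [
    ({("car", "run", "field"): np.array([1.0, 0.1]),
      ("dog", "run", "field"): np.array([0.9, 0.8]),
      ("dog", "park", "street"): np.array([0.2, 0.3])},
     ("dog", "run", "field")),
    ({("bike", "ride", "street"): np.array([0.7, 0.2]),
      ("horse", "ride", "field"): np.array([0.5, 0.9])},
     ("horse", "ride", "field")),
]

w = train(examples, dim=2)
```

On this toy data a single update is enough: the learned weights favor the edge potentials, which is what separates the true triplets from the distractors.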
  17. N. Siddharth, Andrei Barbu, Jeffrey Mark Siskind. Seeing What You’re Told: Sentence-Guided Activity Recognition In Video
  18. None
  19. None
  20. Object Detection, Track, Event Recognizer, Sentences

  21. Object Detection, Track, Event Recognizer, Sentences: Felzenszwalb + false positives

  22. Object Detection, Track, Event Recognizer, Sentences: Felzenszwalb + false positives • Felz. confidence + Optical Flow
  23. Object Detection, Track, Event Recognizer, Sentences: Dynamic Programming • Maximize detection confidence and optical flow continuity
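
The dynamic program on this slide can be sketched as a Viterbi-style pass over per-frame detections: pick one detection per frame so that the sum of detection confidences, minus a penalty for disagreeing with optical flow, is as large as possible. The data layout, the single flow vector per frame, the distance-based penalty, and `coherence_weight` are assumptions made for this sketch rather than details from the paper.

```python
import math

def best_track(frames, flow, coherence_weight=1.0):
    """frames[t]: list of (x, y, confidence); flow[t]: (dx, dy) from frame t to t+1.
    Returns one detection index per frame maximizing confidence plus flow continuity."""
    # score[j]: best total for any track ending at detection j of the current frame.
    score = [conf for _, _, conf in frames[0]]
    back = []
    for t in range(1, len(frames)):
        dx, dy = flow[t - 1]
        new_score, pointers = [], []
        for x, y, conf in frames[t]:
            # Penalize the gap between the flow-predicted position and this detection.
            options = [
                score[i] - coherence_weight * math.hypot(px + dx - x, py + dy - y)
                for i, (px, py, _) in enumerate(frames[t - 1])
            ]
            i_best = max(range(len(options)), key=options.__getitem__)
            new_score.append(options[i_best] + conf)
            pointers.append(i_best)
        score = new_score
        back.append(pointers)
    # Trace back from the best final detection.
    j = max(range(len(score)), key=score.__getitem__)
    track = [j]
    for pointers in reversed(back):
        j = pointers[j]
        track.append(j)
    return track[::-1]

# Toy data: detection 0 moves smoothly; detection 1 is a confident but jumpy false positive.
frames = [
    [(0, 0, 0.90), (50, 50, 0.95)],
    [(1, 0, 0.80), (20, 80, 0.90)],
    [(2, 0, 0.85), (90, 10, 0.99)],
]
flow = [(1, 0), (1, 0)]

track = best_track(frames, flow)
greedy = best_track(frames, flow, coherence_weight=0.0)
```

With the continuity term switched off (`coherence_weight=0.0`) the tracker follows the higher-confidence false positives, which is why the slide combines detector confidence with optical flow.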
  24. Object Detection, Track, Event Recognizer, Sentences: Per-object/per-frame • Position • Velocity • Acceleration • Aspect Ratio. Agent + Instrument • Distance • Orientation. A time series of feature vectors. Train with a Hidden Markov Model (per word in the lexicon)
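
As an illustration of the features listed on this slide, the sketch below turns a track of bounding boxes into a per-frame time series. The box layout `(x, y, w, h)` with `(x, y)` as the box center, the finite-difference velocity and acceleration, and both function names are assumptions; the paper may compute or quantize these features differently.

```python
import numpy as np

def object_features(boxes):
    """boxes: (T, 4) array of (x, y, w, h) per frame, with (x, y) the box center.
    Returns a (T, 7) array: position, velocity, acceleration, aspect ratio."""
    boxes = np.asarray(boxes, dtype=float)
    pos = boxes[:, :2]
    vel = np.gradient(pos, axis=0)  # finite-difference velocity per frame
    acc = np.gradient(vel, axis=0)  # finite-difference acceleration per frame
    aspect = (boxes[:, 2] / boxes[:, 3])[:, None]
    return np.hstack([pos, vel, acc, aspect])

def pair_features(agent_boxes, instrument_boxes):
    """Per-frame distance and orientation of the vector from agent to instrument."""
    agent = np.asarray(agent_boxes, dtype=float)[:, :2]
    instrument = np.asarray(instrument_boxes, dtype=float)[:, :2]
    delta = instrument - agent
    dist = np.hypot(delta[:, 0], delta[:, 1])
    angle = np.arctan2(delta[:, 1], delta[:, 0])
    return np.column_stack([dist, angle])

# Toy tracks: both objects drift right by one pixel per frame.
agent = [(0, 0, 2, 4), (1, 0, 2, 4), (2, 0, 2, 4)]
instrument = [(3, 4, 1, 1), (4, 4, 1, 1), (5, 4, 1, 1)]

feats = object_features(agent)
pair = pair_features(agent, instrument)
```

Each row of `feats` or `pair` is one frame's observation, so a track becomes the time series of feature vectors that a per-word HMM would be trained on.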
  25. Object Detection, Track, Event Recognizer, Sentences: Recognize with HMM • Maximize linear combination of observations and state transitions
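
In log space, the HMM score of a state sequence is a sum of observation terms and transition terms, and the Viterbi algorithm finds the sequence that maximizes it. The sketch below is a generic Viterbi pass under assumed inputs (precomputed per-frame log-likelihoods and a two-state toy model); it is not the paper's recognizer. To recognize an event, one would compute this score under each word's HMM and compare.

```python
import numpy as np

def viterbi_score(log_obs, log_trans, log_init):
    """log_obs: (T, K) log-likelihood of each frame's features under each state.
    log_trans: (K, K) log transition probabilities; log_init: (K,) log initial probabilities.
    Returns (best log-score, best state path)."""
    T, K = log_obs.shape
    delta = log_init + log_obs[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + log_trans  # cand[i, j]: best score ending in i, then i -> j
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + log_obs[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return float(delta.max()), path[::-1]

# Toy two-state model (made-up numbers).
log_init = np.log([0.9, 0.1])
log_trans = np.log([[0.7, 0.3], [0.1, 0.9]])
log_obs = np.log([[0.9, 0.1], [0.6, 0.4], [0.1, 0.9]])

score, path = viterbi_score(log_obs, log_trans, log_init)
```

Here the best path stays in state 0 for two frames and then switches to state 1, matching the frames where each state explains the observations best.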
  26. Object Detection, Track, Event Recognizer, Sentences: Sentence Tracker • Determine whether a set of tracks matches a sentence by maximizing the probability of the cross-product lattice
  27. Natural Language Semantics

  28. Natural Language Semantics

  29. None
  30. None
  31. Questions & Observations • Hierarchical ensemble methods • Shared intermediate representation • Real-time performance? • Neurological realism?