Leveraging AI for Scene Detection in Broadcast Video - Rik Heijdens

Demuxed
October 05, 2017

We've implemented a scene detection framework that segments videos into logical story units without the need for a human editor. We achieve this in two steps: 1) we train a deep-learning model to learn a distance measure (equivalently, a similarity measure) between all pairs of shots, using visual, audio, and textual features extracted from the video; 2) we then cluster contiguous groups of shots into scenes based on the full similarity matrix of shots.
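The second step (grouping contiguous shots into scenes from the full pairwise matrix) can be sketched as an optimal contiguous segmentation. This is a minimal sketch under assumptions: the talk does not say which clustering algorithm is used, so the objective here (total intra-scene pairwise distance plus a fixed per-scene `penalty`) and the names `segment_scenes` and `penalty` are hypothetical. The input is an n × n shot distance matrix of the kind the learned model from step 1 would produce.

```python
from typing import List, Sequence


def segment_scenes(dist: Sequence[Sequence[float]], penalty: float) -> List[int]:
    """Split shots 0..n-1 into contiguous scenes (hypothetical objective).

    Minimizes: sum of pairwise distances inside each scene
               + penalty * number of scenes.
    Only the upper triangle dist[i][k] (i < k) is read, so dist is assumed symmetric.
    Returns the index of the first shot of every scene, in order.
    """
    n = len(dist)
    if n == 0:
        return []
    best = [0.0] + [float("inf")] * n  # best[j]: lowest cost for shots 0..j-1
    prev = [0] * (n + 1)               # prev[j]: first shot of the last scene in that optimum
    for j in range(1, n + 1):
        within = 0.0  # running sum of pairwise distances among shots i..j-1
        for i in range(j - 1, -1, -1):
            # Prepending shot i to the candidate scene adds its distances to shots i+1..j-1.
            within += sum(dist[i][k] for k in range(i + 1, j))
            cost = best[i] + within + penalty
            if cost < best[j]:
                best[j] = cost
                prev[j] = i
    # Follow prev pointers back from the last shot to recover scene starts.
    starts: List[int] = []
    j = n
    while j > 0:
        starts.append(prev[j])
        j = prev[j]
    return starts[::-1]


# Toy matrix: shots 0-2 look alike, shots 3-4 look alike, the two groups differ.
groups = [0, 0, 0, 1, 1]
D = [[0.0 if i == j else (0.1 if groups[i] == groups[j] else 1.0) for j in range(5)]
     for i in range(5)]
print(segment_scenes(D, penalty=0.5))  # prints [0, 3]
```

Because scenes must be contiguous, dynamic programming over scene boundaries finds the exact optimum of this objective in O(n³) time; a larger `penalty` yields fewer, longer scenes.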

Presented by Rik Heijdens at Demuxed 2017

Transcript

  1. Leveraging AI for Scene Detection in Video
     (by learning a distance measure between shots)
     By: Rik Heijdens
  2. Scene Detection
     The task of finding Logical Story Units in video.
     Why?
     • Automatic content indexing of (large) video libraries
     • Automatic advertisement insertion
  3. Extracting Audible Features
     • Audio is often used to underline the development of a story
     • Short-time Fourier Transforms (STFTs)
     • Mel-scaled power spectrograms
     S. Dieleman et al., "End-to-end learning for music audio" (ICASSP 2014)
  4. Feed the extracted features into a neural network
     1. Concatenate all the features into a single dense feature vector.
     2. Feed this feature vector into a multimodal neural network that learns how to weight the components and maps high-dimensional feature vectors into lower-dimensional shot embeddings.
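The audio features on slide 3 (an STFT followed by a mel-scaled power spectrogram) can be illustrated with a small pure-Python sketch. Assumptions: the Hann window, the HTK-style mel formula, triangular filters, and the defaults `n_fft=256`, `hop=128`, `n_mels=20` are illustrative choices not taken from the talk, and the naive DFT stands in for the FFT that a real pipeline would use. All function names are hypothetical.

```python
import cmath
import math
from typing import List


def hz_to_mel(f: float) -> float:
    # HTK-style mel scale; other conventions (e.g. Slaney) also exist.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def stft_power(signal: List[float], n_fft: int, hop: int) -> List[List[float]]:
    """Power |X[k]|^2 of Hann-windowed frames, for bins k = 0..n_fft//2."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(n_fft)]
        spectrum = []
        for k in range(n_fft // 2 + 1):
            # Naive O(n_fft^2) DFT for clarity; use an FFT in practice.
            x = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            spectrum.append(abs(x) ** 2)
        frames.append(spectrum)
    return frames


def mel_filterbank(n_mels: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters evenly spaced on the mel scale from 0 Hz to Nyquist."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2)
    # n_mels + 2 edge points: filter m rises from edges[m] to edges[m + 1], falls to edges[m + 2].
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, center, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            if lo < f <= center:
                row.append((f - lo) / (center - lo))
            elif center < f < hi:
                row.append((hi - f) / (hi - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_spectrogram(signal: List[float], sample_rate: int, n_fft: int = 256,
                    hop: int = 128, n_mels: int = 20) -> List[List[float]]:
    power = stft_power(signal, n_fft, hop)
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    # Each mel band is a weighted sum of the linear-frequency power bins.
    return [[sum(w * p for w, p in zip(row, frame)) for row in bank] for frame in power]


tone = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(1024)]
mel = mel_spectrogram(tone, sample_rate=8000)
```

With these defaults, 1024 samples give 7 frames of 20 mel bands each; the band with the most energy is one whose triangle spans 1 kHz.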
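The two steps on slide 4, plus one way the shot distance could be learned, can be sketched as follows. Everything beyond "concatenate, then map to a lower-dimensional embedding" is an assumption: the single `tanh` dense layer, the random weights standing in for trained ones, Euclidean distance between embeddings, and the margin-based contrastive loss (the talk does not say which loss it trains with). The names `concat_features`, `ShotEmbedder`, `euclidean`, and `contrastive_loss` are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def concat_features(*modalities: Sequence[float]) -> Vector:
    """Step 1: concatenate per-modality features into one dense vector."""
    out: Vector = []
    for feats in modalities:
        out.extend(feats)
    return out


class ShotEmbedder:
    """Step 2 (sketch): one dense layer mapping a high-dimensional feature
    vector to a low-dimensional shot embedding. Weights are random here;
    in a real system they would be learned."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        self.weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, x: Sequence[float]) -> Vector:
        if len(x) != len(self.weights[0]):
            raise ValueError("feature vector has the wrong dimension")
        # tanh(W x + b): each output mixes (and thereby weights) all input components.
        return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.weights, self.bias)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(d: float, same_scene: bool, margin: float = 1.0) -> float:
    """Margin-based contrastive loss (assumed): pull same-scene shots together,
    push shots from different scenes at least `margin` apart."""
    if same_scene:
        return d ** 2
    return max(0.0, margin - d) ** 2


visual = [0.2, 0.7, 0.1, 0.4]  # toy values standing in for real extracted features
audio = [0.5, 0.3, 0.9]
text = [0.0, 1.0]
embed = ShotEmbedder(in_dim=9, out_dim=3)
a = embed(concat_features(visual, audio, text))
b = embed(concat_features(visual, [0.1, 0.1, 0.1], text))
d = euclidean(a, b)
print(d, contrastive_loss(d, same_scene=True))
```

Training would adjust the layer's weights so that this distance becomes small for shots from the same scene and large otherwise; the resulting pairwise distances form the matrix that the scene clustering step consumes.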