October 05, 2017

Leveraging AI for Scene Detection in Broadcast Video - Rik Heijdens

We've implemented a scene detection framework that segments videos into logical story units without the need for a human editor. We achieve this in two steps: 1) we train a deep-learning model to learn a distance measure (i.e. similarity measure) between all pairs of shots by leveraging visual, audio, and textual features extracted from the video; 2) we then cluster contiguous groups of shots into scenes based on the full similarity matrix of shots.

Presented by Rik Heijdens at Demuxed 2017

Transcript

Leveraging AI for Scene Detection in Video
By learning a distance measure between shots
By: Rik Heijdens
Video Structure
Scene Detection
The task of finding Logical Story Units in Video
Why?
• Automatic content indexing of (large) video libraries
• Automatic advertisement insertion
Architecture Overview
Feature extraction
Extracting Visual Features • Encode frames using Google's Inception CNN
Extracting Audible Features
• Audio is often used to underline the development of a story
• Short-time Fourier Transforms (STFTs)
• Mel-scaled power spectrograms
S. Dieleman et al. "End-to-end learning for music audio" (ICASSP 2014)
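
The STFT and mel-scaling steps named on this slide can be sketched in plain Python. This is a minimal illustration, not the pipeline used in the talk: the frame length, hop size, number of mel bands, Hann window, and the HTK-style mel formula are all assumptions, and the function names are hypothetical.

```python
import cmath
import math

def hz_to_mel(f_hz):
    """Convert Hz to mels using the common HTK-style formula (an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def stft_power(signal, frame_len, hop):
    """Power spectrogram: one list of frame_len // 2 + 1 bin powers per frame."""
    # Periodic Hann window to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of the windowed frame (fine for a toy example).
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc) ** 2)
        frames.append(spectrum)
    return frames

def mel_filterbank(n_mels, frame_len, sample_rate):
    """Triangular filters with centres spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    hz_points = [mel_to_hz(i * top / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / frame_len for k in range(frame_len // 2 + 1)]
    bank = []
    for i in range(1, n_mels + 1):
        lo, mid, hi = hz_points[i - 1], hz_points[i], hz_points[i + 1]
        row = []
        for f in bin_freqs:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mel_power_spectrogram(signal, sample_rate, frame_len=64, hop=32, n_mels=8):
    """Apply the mel filterbank to every frame of the power spectrogram."""
    power = stft_power(signal, frame_len, hop)
    bank = mel_filterbank(n_mels, frame_len, sample_rate)
    return [[sum(w * p for w, p in zip(row, frame)) for row in bank]
            for frame in power]
```

Calling `mel_power_spectrogram(samples, sample_rate)` on a list of audio samples returns one list of mel-band energies per frame. A real system would more likely use an FFT-based library; the naive DFT here only keeps the sketch dependency-free.
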
Extracting Textual Features
• Extracted from transcripts
• Word2Vec embeddings
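
One simple way to turn per-word embeddings into a per-shot text feature is to average the vectors of the words spoken during the shot. The slide only states that Word2Vec embeddings are extracted from transcripts; the averaging step, the whitespace tokenizer, and the toy vectors below are assumptions made for illustration.

```python
def shot_text_embedding(transcript, word_vectors, dim):
    """Average the vectors of all known words in a shot's transcript.

    word_vectors is a plain dict of word -> list of floats, standing in
    for a trained Word2Vec model (hypothetical toy data in this sketch).
    """
    tokens = transcript.lower().split()
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        # No recognised words (e.g. a silent shot): fall back to zeros.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

# Toy 2-dimensional "embeddings", invented for the example.
TOY_VECTORS = {"goal": [1.0, 0.0], "match": [0.0, 1.0]}
```

For instance, `shot_text_embedding("Goal match", TOY_VECTORS, 2)` averages the two toy vectors, while a transcript with no known words yields a zero vector.
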
Feed the extracted features into a Neural Network
1. Concatenate all the features into a single dense feature vector.
2. Feed this feature vector into a multimodal neural network that learns how to weight the components and maps high dimensional feature vectors into lower dimensional shot embeddings.
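
The two steps above can be sketched as a concatenation followed by a single dense layer. This is only a shape-level illustration: the talk's network architecture and training procedure are not given in the transcript, so the single tanh layer, the random (untrained) weights, and all function names here are assumptions.

```python
import math
import random

def concatenate(*feature_vectors):
    """Step 1: join visual, audio, and textual features into one vector."""
    combined = []
    for vec in feature_vectors:
        combined.extend(vec)
    return combined

def random_layer(n_in, n_out, seed=0):
    """Create untrained weights; a real model would learn these from data."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias

def embed_shot(features, weights, bias):
    """Step 2: map the high-dimensional vector to a smaller shot embedding."""
    return [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
            for row, b in zip(weights, bias)]
```

With a 6-dimensional concatenated vector and `random_layer(6, 3)`, `embed_shot` returns a 3-dimensional embedding. In a trained model the weights would be fitted so that distances between embeddings reflect whether two shots belong to the same scene.
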
Clustering: Similarity Matrix
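
A full shot-by-shot similarity matrix can be built by comparing every pair of shot embeddings. The sketch below uses cosine similarity as a stand-in; the talk describes a learned distance measure and the transcript does not say which function is applied to the embeddings, so this choice is an assumption.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_matrix(embeddings):
    """Entry [i][j] holds the similarity between shot i and shot j."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]
```

The result is symmetric, and shots from the same scene would be expected to form bright square blocks along the diagonal when the matrix is plotted.
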
Clustering: Plot of similarity scores for temporally adjacent shots
Clustering: Scene Breaks and their confidence score plotted on top of the Similarity Matrix
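
As a simplified stand-in for the clustering step, scene breaks can be proposed wherever the similarity between two temporally adjacent shots drops below a threshold. The talk clusters on the full similarity matrix, so this adjacent-only rule, the threshold value, and the `1 - similarity` confidence score are all assumptions for illustration.

```python
def adjacent_similarities(sim):
    """Similarity of each shot to the next one (the first superdiagonal)."""
    return [sim[i][i + 1] for i in range(len(sim) - 1)]

def scene_breaks(sim, threshold=0.5):
    """Return (shot_index, confidence) pairs; a break sits before shot_index.

    Confidence is taken as 1 - similarity, assuming scores lie in [0, 1].
    """
    breaks = []
    for i, score in enumerate(adjacent_similarities(sim)):
        if score < threshold:
            breaks.append((i + 1, 1.0 - score))
    return breaks

def group_into_scenes(n_shots, breaks):
    """Split shot indices 0..n_shots-1 into contiguous scenes at the breaks."""
    scenes, start = [], 0
    for index, _confidence in breaks:
        scenes.append(list(range(start, index)))
        start = index
    scenes.append(list(range(start, n_shots)))
    return scenes
```

Given a 4-shot matrix whose adjacent similarities are high, low, high, this proposes one break in the middle and groups the shots into two scenes of two shots each.
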
Questions?
Approach me after the talk or send an email to [email protected]