Introduction to VP8

郭至軒 (KuoE0)

Latest update: Nov 19, 2013
License: Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/

Situation

Video Codec VP8

An Open Source Codec

Developed by On2 Technologies

Acquired by Google in February 2010

Patent

Royalty-free terms for WebM: March 2013

Successor: VP9 (May 15, 2013)

Feature

Focus on Internet and web-based applications

Low Bandwidth Requirement
Image quality:
watchable (PSNR: ~30 dB)
visually lossless (PSNR: ~45 dB)
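The quality targets above are given as PSNR values. As a minimal sketch of how such a figure is computed for 8-bit samples (the function name `psnr` and the flat-list input are assumptions made for illustration, not part of the deck):

```python
import math


def psnr(original, decoded, peak=255):
    """Return the PSNR in dB between two equal-length sequences of samples.

    `peak` is the largest possible sample value (255 for 8-bit video).
    """
    if len(original) != len(decoded):
        raise ValueError("inputs must have the same length")
    # Mean squared error between the two signals.
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


# A uniform error of 8 levels on every sample lands close to 30 dB.
print(round(psnr([100] * 4, [108] * 4), 2))
```

Under this definition, a uniform error of 8 levels per sample sits near the "watchable" mark, while reaching about 45 dB requires an RMS error below 2 levels.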

Heterogeneous Client Hardware
Efficient Implementations

Web Video Format
YUV 4:2:0 color sampling
8 bits per channel depth
Up to 16383 × 16383 pixels
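To make the format numbers concrete, here is a small sketch of the raw size of one 8-bit 4:2:0 frame. The helper name `yuv420_frame_bytes` is hypothetical, and rounding the chroma planes up for odd dimensions is an assumption.

```python
def yuv420_frame_bytes(width, height):
    """Bytes needed for one raw 8-bit YUV 4:2:0 frame.

    Luma is stored at full resolution; each of the two chroma planes
    is subsampled by 2 horizontally and vertically (rounded up here).
    """
    luma = width * height
    chroma_w = (width + 1) // 2
    chroma_h = (height + 1) // 2
    return luma + 2 * chroma_w * chroma_h


# One 720p frame: 921600 luma bytes plus 2 x 230400 chroma bytes.
print(yuv420_frame_bytes(1280, 720))
```

The 16383 limit is the largest value that fits in 14 bits (2^14 - 1).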

Processing Flow

Coding: Predict → Transform + Quantize → Entropy Code → Loop Filter

Decoding: Entropy Decode → Predict → Dequantize + Inverse Transform → Loop Filter

Reference Frame

Golden Frame, Last Frame, Alternate Frame
At most 3 reference frames in VP8.

Last Frame

[Figure: the last frame directly precedes the current frame.]

Golden Frame
Choose an arbitrary frame in the past.
Define a number of flags to notify the decoder when and how to update this buffer.

[Figure: a past frame is set as the golden frame. It is used to reconstruct the background behind a moving object.]

Alternate Frame
Other frames are decoded and shown. The alternate frame is decoded but not shown.
It stores beneficial information.
It can be constructed from multiple frames.
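The three reference buffers and their update flags can be pictured with a small sketch. The class name `ReferenceBuffers` and the `refresh_*` parameter names are hypothetical and chosen for illustration only. They are not the bitstream's actual flag names.

```python
class ReferenceBuffers:
    """Holds the three reference frames a decoder keeps around."""

    def __init__(self):
        self.last = None
        self.golden = None
        self.altref = None

    def store(self, frame, refresh_last=True,
              refresh_golden=False, refresh_altref=False):
        """Store a decoded frame into the buffers selected by the flags."""
        if refresh_last:
            self.last = frame
        if refresh_golden:
            self.golden = frame
        if refresh_altref:
            self.altref = frame


bufs = ReferenceBuffers()
# Assumption: a key frame refreshes all three buffers.
bufs.store("key", refresh_last=True, refresh_golden=True, refresh_altref=True)
# An ordinary frame only replaces the last-frame buffer.
bufs.store("frame1")
# A frame that is decoded but not shown updates only the alternate buffer.
bufs.store("hidden", refresh_last=False, refresh_altref=True)
print(bufs.last, bufs.golden, bufs.altref)
```

Keeping the flags independent is what lets the golden frame stay far in the past while the last frame keeps moving forward.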

Typical frame sequence: I B B P B B P B B I B B P

VP8 frame sequence: L G A G G G G G L G G G A G L

Prediction

Intra Prediction: use data within a single video frame.
Inter Prediction: use data from previously encoded frames.

Intra Prediction
Luma 16×16, Luma 4×4, Chroma 8×8

Four Prediction Modes:
H_PRED (horizontal prediction)
V_PRED (vertical prediction)
DC_PRED (DC prediction)
TM_PRED (TrueMotion prediction)

Horizontal Prediction
Fills each column of the block with a copy of the left column.

Left neighbor block (its rightmost column, e j o t y, is the column that gets copied):
    a b c d e
    f g h i j
    k l m n o
    p q r s t
    u v w x y

Predicted block:
    e e e e e
    j j j j j
    o o o o o
    t t t t t
    y y y y y

Vertical Prediction
Fills each row of the block with a copy of the above row.

Above neighbor block (its bottom row, U V W X Y, is the row that gets copied):
    A B C D E
    F G H I J
    K L M N O
    P Q R S T
    U V W X Y

Predicted block:
    U V W X Y
    U V W X Y
    U V W X Y
    U V W X Y
    U V W X Y

DC Prediction
Fills the block with a single value using the average of the pixels in the above row and the left column.

Above row: U V W X Y (bottom row of the above neighbor block A to Y)
Left column: e j o t y (rightmost column of the left neighbor block a to y)

Z = (U + V + W + X + Y + e + j + o + t + y) ÷ 10

Predicted block:
    Z Z Z Z Z
    Z Z Z Z Z
    Z Z Z Z Z
    Z Z Z Z Z
    Z Z Z Z Z
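The horizontal, vertical, and DC modes can be sketched in a few lines. This is an illustrative sketch, not the codec's reference implementation. The function names are hypothetical, and rounding the DC average to the nearest integer is an assumption.

```python
def h_pred(left, size):
    """Horizontal: every column is a copy of the left neighbor column,
    so row r is filled with left[r]."""
    return [[left[r]] * size for r in range(size)]


def v_pred(above, size):
    """Vertical: every row is a copy of the row above the block."""
    return [list(above) for _ in range(size)]


def dc_pred(above, left, size):
    """DC: fill the block with the average of the above row and left column."""
    count = len(above) + len(left)
    # Rounded integer average (rounding rule is an assumption).
    dc = (sum(above) + sum(left) + count // 2) // count
    return [[dc] * size for _ in range(size)]


above = [10, 20, 30]
left = [40, 50, 60]
print(h_pred(left, 3))
print(v_pred(above, 3))
print(dc_pred(above, left, 3))
```

All three modes only read pixels that the decoder has already reconstructed, which is why the encoder and decoder arrive at the same prediction.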

TrueMotion Prediction
Horizontal differences between pixels in the above row and vertical differences between pixels in the left column are propagated (starting from C).

Layout (C is the pixel above and to the left of the block, A0 to A4 the row above, L0 to L4 the column to the left, X the predicted pixels):
    C   A0  A1  A2  A3  A4
    L0  X   X   X   X   X
    L1  X   X   X   X   X
    L2  X   X   X   X   X
    L3  X   X   X   X   X
    L4  X   X   X   X   X

$X_{ij} = A_i + L_j - C$, where $i$ is the column and $j$ is the row of the predicted pixel.
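The TrueMotion rule is a one-liner once the neighbors are available. A minimal sketch follows. The function name `tm_pred` is hypothetical. The clamp to the 8-bit range is not shown in the formula above and is added here on the understanding that predicted pixels must stay within 0 to 255.

```python
def tm_pred(above, left, corner):
    """TrueMotion prediction: X[j][i] = A[i] + L[j] - C, clamped to 0..255.

    above  -- row of pixels above the block (A)
    left   -- column of pixels left of the block (L)
    corner -- pixel above and to the left of the block (C)
    """
    def clamp(value):
        return max(0, min(255, value))

    return [[clamp(a + l - corner) for a in above] for l in left]


print(tm_pred([10, 20], [30, 40], 10))
```

With `above = [10, 20]`, `left = [30, 40]` and `corner = 10`, the gradient along the top edge (+10 per pixel) and along the left edge (+10 per row) both carry into the block.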

Inter Prediction
As mentioned above, inter prediction can reference the Golden Frame, the Last Frame, and the Alternate Frame.

Motion Vector
Reusing vectors from neighboring macroblocks.
Flexible partitioning of a macroblock into sub-blocks.

Sub-pixel Interpolation
Quarter-pixel accurate motion vectors for luma pixels.
High-performance six-tap interpolation filters:
[3, -16, 77, 77, -16, 3]/128 for 1⁄2 pixel positions
[2, -11, 108, 36, -8, 1]/128 for 1⁄4 pixel positions
[1, -8, 36, 108, -11, 2]/128 for 3⁄4 pixel positions
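As a sketch of how one of these six-tap filters is applied along a single row of pixels: the tap values are the ones listed above, while the function name `interpolate`, the window placement (two samples before and three after the position), and the round-then-clamp step are assumptions made for illustration.

```python
FILTERS = {
    "1/4": (2, -11, 108, 36, -8, 1),
    "1/2": (3, -16, 77, 77, -16, 3),
    "3/4": (1, -8, 36, 108, -11, 2),
}


def interpolate(samples, pos, phase):
    """Interpolate a value between samples[pos] and samples[pos + 1].

    The six taps cover samples[pos - 2] .. samples[pos + 3].
    """
    if pos < 2 or pos + 3 >= len(samples):
        raise ValueError("need two samples before and three after pos")
    taps = FILTERS[phase]
    window = samples[pos - 2:pos + 4]
    acc = sum(t * s for t, s in zip(taps, window))
    # Taps sum to 128, so add half of 128 and shift by 7 to round.
    return max(0, min(255, (acc + 64) >> 7))


ramp = [0, 10, 20, 30, 40, 50]
print(interpolate(ramp, 2, "1/2"))  # halfway between 20 and 30
```

Because each filter's taps sum to 128, a flat region passes through unchanged, and on a linear ramp the half-pixel filter returns the midpoint.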

Hybrid Transform & Quantization

Divide into Macroblocks (Typical Method)
One 16×16 block of luma pixels (Y)
Two 8×8 blocks of chroma pixels (U, V)

Divide into Blocks (VP8 Method)
All blocks of luma and chroma are 4×4 blocks.
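Splitting a macroblock into 4×4 blocks can be sketched as follows. The helper name `split_into_4x4` is hypothetical, and planes are represented as plain lists of rows for illustration.

```python
def split_into_4x4(plane):
    """Split a square plane (list of rows) into 4x4 blocks, in raster order."""
    size = len(plane)
    blocks = []
    for top in range(0, size, 4):
        for left in range(0, size, 4):
            blocks.append([row[left:left + 4] for row in plane[top:top + 4]])
    return blocks


luma = [[0] * 16 for _ in range(16)]    # one 16x16 Y plane
chroma_u = [[0] * 8 for _ in range(8)]  # one 8x8 U plane
chroma_v = [[0] * 8 for _ in range(8)]  # one 8x8 V plane

counts = [len(split_into_4x4(p)) for p in (luma, chroma_u, chroma_v)]
print(counts, sum(counts))
```

One macroblock therefore yields 16 luma blocks and 4 blocks for each chroma plane, 24 blocks of 4×4 pixels in total.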

Discrete Cosine Transform
Fast implementation
Slightly worse in energy compaction than the KLT (Karhunen-Loève transform)
Content-independent

Coding: 2-D DCT
Decoding: 4×4 variant of the LLM (Loeffler, Ligtenberg, Moschytz) implementation
LLM: practical fast 1-D DCT algorithms with 11 multiplications

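Inverse DCT Graph in VP8
[Figure: signal-flow graph of the inverse DCT with inputs I1 to I4 and outputs O1 to O4.]
A two-input block of the graph maps inputs $x_0, x_1$ to outputs $y_0, y_1$:
$$y_0 = \sqrt{2}\left(x_0 \sin\frac{\pi}{8} - x_1 \cos\frac{\pi}{8}\right)$$
$$y_1 = \sqrt{2}\left(x_0 \cos\frac{\pi}{8} + x_1 \sin\frac{\pi}{8}\right)$$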

Energy compaction is slightly better than that of H.264/AVC, which uses a multiplication-less integer transform.

It is efficient on processors with SIMD capability.

Walsh-Hadamard Transform
$$Y = H X H^{T}, \qquad H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}$$
$H^{T}$ is the transpose of $H$.
Takes advantage of the correlation to reduce redundancy.
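A direct, unoptimized sketch of the matrix form Y = H X Hᵀ follows. It uses plain lists and hypothetical helper names (`matmul`, `transpose`, `wht`), and it leaves out any scaling or rounding a real codec would apply.

```python
H = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, 1, -1],
    [1, -1, -1, 1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def wht(x):
    """Unnormalized 4x4 Walsh-Hadamard transform: Y = H X H^T."""
    return matmul(matmul(H, x), transpose(H))


flat = [[5] * 4 for _ in range(4)]
print(wht(flat))
```

For a block whose 16 values are all equal, the whole block collapses into the single top-left coefficient and every other output is zero, which is how the transform removes redundancy among correlated values.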

Adaptive Quantization
128 quantization levels.
Different quantization levels can be used within a single frame.
Separate quantization for each coefficient type:
1st order luma DC
1st order luma AC
2nd order luma DC
2nd order luma AC
chroma DC
chroma AC
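The idea of giving DC and AC coefficients their own step sizes can be sketched as below. The step values 8 and 10 are hypothetical and not taken from any VP8 table. The function names and the truncation toward zero are also assumptions.

```python
def quantize(coeffs, dc_step, ac_step):
    """Quantize a coefficient list; index 0 is DC, the rest are AC."""
    levels = []
    for index, coeff in enumerate(coeffs):
        step = dc_step if index == 0 else ac_step
        magnitude = abs(coeff) // step  # truncate toward zero
        levels.append(magnitude if coeff >= 0 else -magnitude)
    return levels


def dequantize(levels, dc_step, ac_step):
    """Reconstruct approximate coefficients from quantized levels."""
    return [level * (dc_step if index == 0 else ac_step)
            for index, level in enumerate(levels)]


coeffs = [100, -37, 5, 0]
levels = quantize(coeffs, dc_step=8, ac_step=10)
print(levels, dequantize(levels, dc_step=8, ac_step=10))
```

Small coefficients quantize to zero and the rest come back only approximately, which is where the loss (and most of the bitrate saving) comes from.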

Entropy Coding

Boolean arithmetic coder
Supports distribution updates on a per-frame basis
Stable probability distributions within one frame
Keyframes reset the probability values to the defaults
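Why the probability values matter can be shown with the ideal cost of coding a single boolean. This sketch does not implement the arithmetic coder itself. It assumes probabilities are expressed as the chance of a 0, in units of 1/256, and the function name `bool_cost_bits` is hypothetical.

```python
import math


def bool_cost_bits(prob_zero, bit):
    """Ideal cost in bits of coding `bit` when P(bit == 0) = prob_zero / 256."""
    if not 1 <= prob_zero <= 255:
        raise ValueError("prob_zero must be in 1..255")
    p = prob_zero / 256 if bit == 0 else 1 - prob_zero / 256
    return -math.log2(p)


# With an even split, either outcome costs exactly one bit.
print(bool_cost_bits(128, 0), bool_cost_bits(128, 1))
# With a skewed distribution, the likely outcome is nearly free
# and the unlikely one is expensive.
print(round(bool_cost_bits(250, 0), 3), round(bool_cost_bits(250, 1), 3))
```

A distribution that matches the frame's statistics makes the common symbols cheap, which is the motivation for updating the probabilities per frame.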

Adaptive Loop Filter

Removing blocking artifacts introduced by quantization and transformation.

[Figure: regions of a frame marked for slight filtering, strong filtering, and no filtering.]

Parallel Processing

Data Partition
Compressed data is split into:
macroblock coding modes and motion vectors
transform coefficients

More Transform Coefficient Partitions
The transform coefficients support up to 8 token partitions.
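One way to picture how token partitions enable parallel decoding is to distribute macroblock rows over the partitions round-robin. The sketch below makes two assumptions: that the partition count is a power of two up to 8, and that row r belongs to partition r mod n. The function name `rows_by_partition` is hypothetical.

```python
def rows_by_partition(num_mb_rows, num_partitions):
    """Group macroblock row indices by the token partition that holds them."""
    if num_partitions not in (1, 2, 4, 8):
        raise ValueError("num_partitions must be 1, 2, 4, or 8")
    groups = [[] for _ in range(num_partitions)]
    for row in range(num_mb_rows):
        # Assumption: rows are assigned to partitions round-robin.
        groups[row % num_partitions].append(row)
    return groups


# A 720-pixel-tall frame has 45 rows of 16x16 macroblocks.
for index, rows in enumerate(rows_by_partition(45, 4)):
    print(index, rows)
```

Under this layout, neighboring rows sit in different partitions, so several threads can each read their own partition while working on adjacent rows.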

Compare to H.264

[Chart: decoding speed in frames/second, VP8 vs. H.264 High Profile, on an Intel Core i7 3.2GHz. Clips: Night, Sheriff, and Tulip, each 720p at 2000kbps. Axis range: 100 to 300.]

[Chart: decoding speed in frames/second, VP8 vs. H.264 High Profile, on an Intel Atom N270 1.66GHz. Clips: Night, Sheriff, and Tulip, each 720p at 2000kbps. Axis range: 20 to 45.]

Any Questions?

Thanks for listening :)
