Local representation (one unit per concept)
• intuitive
• easy to code by hand
• easy to associate with other representations
• easy to learn
Drawback: inefficient for componential structure
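To make the trade-off concrete, here is a minimal Python sketch (NumPy assumed). It contrasts a local code, which gives one unit to every whole (color, shape) object, with a componential code, which gives one block of units to each attribute. The toy attributes and the function names `local_code` and `componential_code` are hypothetical illustrations, not taken from the slides.

```python
import numpy as np

# Hypothetical toy domain: each object has a color and a shape.
COLORS = ["red", "green", "blue"]
SHAPES = ["circle", "square", "triangle", "star"]

def local_code(color: str, shape: str) -> np.ndarray:
    """Local code: one dedicated unit for every whole (color, shape) object."""
    vec = np.zeros(len(COLORS) * len(SHAPES), dtype=np.int64)
    vec[COLORS.index(color) * len(SHAPES) + SHAPES.index(shape)] = 1
    return vec

def componential_code(color: str, shape: str) -> np.ndarray:
    """Componential code: a one-hot block per attribute, concatenated."""
    color_part = np.zeros(len(COLORS), dtype=np.int64)
    shape_part = np.zeros(len(SHAPES), dtype=np.int64)
    color_part[COLORS.index(color)] = 1
    shape_part[SHAPES.index(shape)] = 1
    return np.concatenate([color_part, shape_part])

local = local_code("green", "star")
comp = componential_code("green", "star")
print(local.size, comp.size)  # 12 7

# Overlap between two objects that share only their color.
overlap_local = int(local @ local_code("green", "circle"))
overlap_comp = int(comp @ componential_code("green", "circle"))
print(overlap_local, overlap_comp)  # 0 1
```

With 3 colors and 4 shapes the local code needs 3 × 4 = 12 units against 3 + 4 = 7 for the componential code, and the gap grows multiplicatively with every added attribute. Under the local code, two objects that share a color also share no active units, so nothing learned about one transfers to the other.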
• 1024-dimensional visual features (Inception network)
• 128-dimensional audio features (VGG-inspired network)
• quantized (8 bits per unit)
Dataset size: 1.71 TB
Video-level features
• aggregated frame-level features using averaging and simple statistics
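The sketch below illustrates the two processing steps listed above: storing each feature unit in 8 bits and collapsing frame-level features into one video-level vector. It is a sketch under stated assumptions, not the dataset's actual pipeline. The clipping range [-2, 2], uniform 256-bucket quantization with bucket-centre dequantization, mean and standard deviation as the "simple statistics", and the random 30-frame clip are all illustrative choices; only the 1024 + 128 dimensions and the 8 bits per unit come from the slide.

```python
import numpy as np

VISUAL_DIM, AUDIO_DIM = 1024, 128   # per-frame feature sizes from the slide
MIN_Q, MAX_Q = -2.0, 2.0            # assumed clipping range (illustrative)
WIDTH = (MAX_Q - MIN_Q) / 256       # 256 buckets for 8 bits per unit

def quantize(x: np.ndarray) -> np.ndarray:
    """Uniformly quantize floats in [MIN_Q, MAX_Q] to 8-bit codes 0..255."""
    codes = np.floor((x - MIN_Q) / WIDTH)
    return np.clip(codes, 0, 255).astype(np.uint8)

def dequantize(q: np.ndarray) -> np.ndarray:
    """Map each 8-bit code back to the centre of its bucket."""
    return (MIN_Q + (q.astype(np.float64) + 0.5) * WIDTH).astype(np.float32)

def video_level(frames: np.ndarray) -> np.ndarray:
    """Aggregate (num_frames, dim) features: per-dimension mean, then std."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

rng = np.random.default_rng(0)
num_frames = 30  # hypothetical clip length
frames = np.concatenate(
    [rng.uniform(MIN_Q, MAX_Q, (num_frames, VISUAL_DIM)),
     rng.uniform(MIN_Q, MAX_Q, (num_frames, AUDIO_DIM))],
    axis=1,
)
q = quantize(frames)
restored = dequantize(q)
video = video_level(restored)

print(q.dtype, q.shape, q[0].nbytes)  # uint8 (30, 1152) 1152
print(video.shape)                     # (2304,)
```

At 8 bits per unit each frame costs 1024 + 128 = 1152 bytes, and the round-trip error per unit is at most half a bucket width (4 / 512 ≈ 0.0078 under the assumed range). Averaging over frames discards temporal order, which is what makes the video-level vector a fixed size regardless of clip length.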