machine-learning-camp-2018

Piotr Wawryka Distributed Representation

Agenda
• What is a distributed representation?
• Words, Paragraphs and Documents
• Images
• Artistic Style Transfer
• Videos - Youtube8M

Local representation
Each concept is represented by a single unit.
• easy to understand - intuitive
• easy to code by hand
• easy to associate with other representations
• easy to learn
Inefficient for componential structure.

Feature based representation
Each concept is represented by many units. Each unit represents many concepts.
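A minimal sketch contrasting the two schemes. The toy domain (colored shapes) and all names are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

# Hypothetical toy domain: every concept is a (color, shape) pair.
colors = ["red", "green", "blue"]
shapes = ["circle", "square", "triangle"]
concepts = [(c, s) for c in colors for s in shapes]   # 9 concepts

# Local representation: one dedicated unit per concept (9 units).
local = np.eye(len(concepts), dtype=int)

# Feature based representation: units are shared features (6 units).
features = colors + shapes
feature_based = np.array(
    [[int(f in concept) for f in features] for concept in concepts]
)

# Each concept activates many units, and each unit serves many concepts.
units_per_concept = feature_based.sum(axis=1)
concepts_per_unit = feature_based.sum(axis=0)
```

With three colors and three shapes the local scheme needs 9 units while the feature based scheme needs 6, and the gap grows as more feature values are combined.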

Local representation - new entity

Feature based representation - new entity

Feature based representation - relations

Distributed representation
• real coordinate space
• no clear interpretation of any specific feature
• representation length can be much smaller than the number of concepts

How to build distributed representation?

Words representation
Traditional approach:
• one-hot encoding
• every word vector is orthogonal to every other: W_woman ∙ W_man = 0
Neural language modeling:
• deep architectures
• recurrent networks
• simple models
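The orthogonality of one-hot vectors can be checked directly. This is a small sketch; the four-word vocabulary is an assumption chosen for illustration.

```python
import numpy as np

vocab = ["woman", "man", "queen", "king"]   # toy vocabulary (assumption)
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return a vector with a single 1 at the word's index."""
    v = np.zeros(len(vocab))
    v[index[word]] = 1.0
    return v

# Different words never share an active unit, so their dot product is 0.
dot_different = one_hot("woman") @ one_hot("man")
# A word is only similar to itself.
dot_same = one_hot("woman") @ one_hot("woman")
```

Because every pair of distinct words has the same dot product, one-hot encoding carries no information about which words are related.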

Word2Vec
• proposed by Tomas Mikolov
• 2 log-linear models: CBoW and Skip-Gram
• distributed word representations

Sliding window
• center - the
• context - development, of, Hellenic, language
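A sketch of how such (center, context) pairs could be produced. The sentence fragment is inferred from the slide's context words and the window size of 2 is an assumption; `context_windows` is a hypothetical helper name.

```python
def context_windows(tokens, window=2):
    """Yield (center, context) pairs for a symmetric sliding window."""
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield center, left + right

# Fragment reconstructed from the slide (assumption).
tokens = "development of the Hellenic language".split()
pairs = list(context_windows(tokens, window=2))

# The pair centered on "the" matches the slide's example.
center, context = pairs[2]
```

At the edges of the text the window is simply truncated, so the first and last words get fewer context words.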

CBoW - Continuous Bag of Words

CBoW - details
input layer
• context - one-hot encoding
projection layer
• linear activation
• shared weights - embedding lookup
output layer
• softmax
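The three CBoW layers can be sketched as a forward pass in NumPy. The sizes, the random weights, and the names `W_in`, `W_out`, and `cbow_forward` are assumptions for illustration; the context embeddings are averaged here, and summing them is another common choice.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10, 4                                  # toy sizes (assumption)

W_in = rng.normal(scale=0.1, size=(vocab_size, embed_dim))    # projection layer
W_out = rng.normal(scale=0.1, size=(embed_dim, vocab_size))   # output layer

def softmax(z):
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cbow_forward(context_ids):
    # Multiplying a one-hot vector by W_in selects one row, so the
    # projection is an embedding lookup. The same W_in is shared by
    # every context position.
    hidden = W_in[context_ids].mean(axis=0)    # linear activation
    return softmax(hidden @ W_out)             # distribution over center words

probs = cbow_forward([1, 2, 4, 5])
```

Training would adjust `W_in` and `W_out` so that the true center word receives high probability; the rows of `W_in` are then used as word vectors.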

Skip-Gram

Skip-Gram - details
input layer
• center word - one-hot encoding
projection layer
• linear activation
• embedding lookup
output layer
• softmax activation - shared weights
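A sketch of the Skip-Gram forward pass and its loss for one center word. Sizes, random weights, and the names `W_in`, `W_out`, and `skipgram_loss` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10, 4                                  # toy sizes (assumption)

W_in = rng.normal(scale=0.1, size=(vocab_size, embed_dim))    # projection layer
W_out = rng.normal(scale=0.1, size=(embed_dim, vocab_size))   # output layer

def softmax(z):
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def skipgram_forward(center_id):
    hidden = W_in[center_id]             # embedding lookup, linear activation
    return softmax(hidden @ W_out)       # one distribution over the vocabulary

def skipgram_loss(center_id, context_ids):
    # The same output weights score every context position, so the loss
    # is the sum of negative log-probabilities of the context words.
    probs = skipgram_forward(center_id)
    return -np.sum(np.log(probs[context_ids]))

probs = skipgram_forward(3)
loss = skipgram_loss(3, [1, 2, 4, 5])
```

Compared with CBoW the direction is reversed: one center word predicts each of its context words.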

Word2Vec - optimizations
Softmax models a conditional probability over the whole vocabulary. The vocabulary can be very large!
Optimizations:
• Hierarchical Softmax
• Noise Contrastive Estimation (NCE)
• Negative Sampling
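As an illustration of why these optimizations help, here is a sketch of the negative sampling objective for a single (center, context) pair. It scores one true context word and a handful of sampled "noise" words instead of the whole vocabulary. The vectors are random placeholders and the number of negatives is an assumption; Word2Vec draws negatives from a smoothed unigram distribution, which is not reproduced here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(v_center, u_context, u_negatives):
    """Loss for one positive pair and k negative samples.

    v_center:    (dim,)   embedding of the center word
    u_context:   (dim,)   output vector of the true context word
    u_negatives: (k, dim) output vectors of the sampled noise words
    """
    positive = np.log(sigmoid(u_context @ v_center))
    negative = np.log(sigmoid(-(u_negatives @ v_center))).sum()
    return -(positive + negative)

rng = np.random.default_rng(0)
embed_dim, num_negatives = 8, 5          # toy sizes (assumption)
v_center = rng.normal(size=embed_dim)
u_context = rng.normal(size=embed_dim)
u_negatives = rng.normal(size=(num_negatives, embed_dim))

loss = negative_sampling_loss(v_center, u_context, u_negatives)
```

The cost per training pair depends on the number of negatives rather than on the vocabulary size.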

Word2Vec - expectations

Word2Vec - results

Word2Vec - basic building block
• spam filtering
• text classification
• sentiment analysis
• text search
Training takes minutes or hours instead of days or weeks.

Paragraph Vector
• proposed by Quoc Le and Tomas Mikolov
• models inspired by Word2Vec
• distributed representation of sentences, paragraphs and documents

Paragraph Vector - Distributed Memory Source: Distributed Representations of Sentences
and Documents, https://arxiv.org/pdf/1405.4053v2.pdf

Paragraph Vector - Distributed Bag of Words Source: Distributed Representations
of Sentences and Documents, https://arxiv.org/pdf/1405.4053v2.pdf

Paragraph Vector - inference
• fix projection layer word weights (W)
• fix softmax weights
• update paragraph representation (D)
Source: Distributed Representations of Sentences and Documents, https://arxiv.org/pdf/1405.4053v2.pdf
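The inference step can be sketched with gradient descent on a new paragraph vector while everything else stays frozen. For brevity this sketch uses the Distributed Bag of Words variant, where only the softmax weights are involved, so W does not appear. The random matrix `U` stands in for trained softmax weights, and the sizes, learning rate, and step count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 20, 8                               # toy sizes (assumption)
U = rng.normal(scale=0.5, size=(embed_dim, vocab_size))     # stand-in for trained softmax weights

def softmax(z):
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def nll(d, word_ids):
    """Negative log-likelihood of the paragraph's words given vector d."""
    probs = softmax(d @ U)
    return -sum(np.log(probs[w]) for w in word_ids)

def infer_paragraph_vector(word_ids, steps=200, lr=0.05):
    d = np.zeros(embed_dim)              # new paragraph vector (one column of D)
    for _ in range(steps):
        for w in word_ids:
            grad_logits = softmax(d @ U)
            grad_logits[w] -= 1.0        # gradient of -log p[w] w.r.t. the logits
            d -= lr * (U @ grad_logits)  # update d only; U is never modified
    return d

words = [3, 7, 7, 12]                    # word ids of an unseen paragraph
U_before = U.copy()
d = infer_paragraph_vector(words)
```

Because only `d` changes, inferring a vector for a new paragraph does not disturb the representations learned during training.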

Image recognition Source: https://www.bdti.com/InsideDSP/2017/06/29/Microsoft

Convolutional Neural Network

Convolutions

Pooling
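A naive sketch of the two operations named on the slides above, for a single-channel image. Loops are used for readability rather than speed, the function names are hypothetical, and the "convolution" is the cross-correlation form commonly used in CNNs.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding (stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(x, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    h, w = x.shape[0] // size, x.shape[1] // size
    blocks = x[:h * size, :w * size].reshape(h, size, w, size)
    return blocks.max(axis=(1, 3))

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])    # toy diagonal-difference filter

feature_map = conv2d_valid(image, kernel)       # shape (4, 4)
pooled = max_pool2d(feature_map)                # shape (2, 2)
```

Convolution shares one small set of weights across all image positions, and pooling reduces the spatial resolution of the resulting feature map.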

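ReLU - Rectified Linear Unit
• simple non-linear activation
• fast to compute
• fast to compute derivative
• mitigates the vanishing gradient problem (the gradient is 1 for positive inputs)
• drawback: dead neurons (a unit whose input stays negative outputs 0 and receives no gradient)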

Leaky ReLU
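A short sketch of ReLU and Leaky ReLU with their derivatives. The slope `alpha=0.01` is a common choice but an assumption here, and the derivative at exactly 0 is taken from the negative side by convention.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise: a unit that only sees
    # negative inputs gets no gradient (a "dead" neuron).
    return (x > 0).astype(float)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # A small non-zero slope keeps some gradient flowing for negative inputs.
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.0, 1.5])
```

Leaky ReLU keeps the cheap computation of ReLU while letting units with negative inputs continue to learn.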

Feature extraction

Distributed representation - example

Modern architectures Source: http://knowyourmeme.com/memes/we-need-to-go-deeper

GoogLeNet - Inception Network Source: https://hacktilldawn.com/2016/09/25/inception-modules-explained-and-implemented/

Inception module Source: https://hacktilldawn.com/2016/09/25/inception-modules-explained-and-implemented/

Degradation problem Source: Deep Residual Learning for Image Recognition, https://arxiv.org/pdf/1512.03385v1.pdf

ResNet Source: Deep Residual Learning for Image Recognition, https://arxiv.org/pdf/1512.03385v1.pdf

Residual block
• learn the residual F(x) and pass the identity x through a shortcut connection, so the block outputs F(x) + x
Source: Deep Residual Learning for Image Recognition, https://arxiv.org/pdf/1512.03385v1.pdf
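A minimal sketch of a residual block. Dense layers replace the convolutional layers of the paper to keep the example short, and the sizes and weight names are assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    f = relu(x @ W1) @ W2        # residual function F(x): two weight layers
    return relu(f + x)           # the shortcut connection adds the identity x

rng = np.random.default_rng(0)
dim = 6                          # toy width (assumption)
x = rng.normal(size=dim)
W1 = rng.normal(scale=0.1, size=(dim, dim))
W2 = rng.normal(scale=0.1, size=(dim, dim))

y = residual_block(x, W1, W2)

# With all-zero weights F(x) = 0, so the block passes x through the shortcut.
y_identity = residual_block(x, np.zeros((dim, dim)), np.zeros((dim, dim)))
```

Since the block only has to learn a correction to the identity, adding more blocks should not make the mapping harder to represent, which is the motivation given for addressing the degradation problem.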

plain network vs ResNet Source: Deep Residual Learning for Image
Recognition, https://arxiv.org/pdf/1512.03385v1.pdf

Artistic Style Transfer

Artistic Style Transfer Source: A Neural Algorithm of Artistic Style,
2015, Gatys et al.

Textures Source: Texture Synthesis Using Convolutional Neural Networks, 2015, Gatys
et al.

Style transfer Source: A Neural Algorithm of Artistic Style, 2015,
Gatys et al.

Youtube8M
Large scale video dataset announced by Google in 2016:
• 7+ million videos
• 4700+ different labels
• 450,000 hours of video
How was it possible to distribute such a large dataset?
Source: https://research.google.com/youtube8m/

Frames
• Understanding a video does not require information about every frame.
• Sample frames from the video.
• Store a distributed representation of each frame instead of the raw image.
Source: http://www.mediacollege.com/video/frame-rate/

Dataset
Frame level features
• 1 frame per second
• 1024 dimensional visual features (Inception Network)
• 128 dimensional audio features (VGG inspired network)
• quantized (8 bits per unit)
Dataset size: 1.71 TB
Video level features
• aggregated frame level features using averaging and simple statistics
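A sketch of the storage scheme described above: features are quantized to 8 bits per unit, and frame level features are aggregated into one video level vector. The quantization range [-2, 2], the random stand-in features, and the choice of mean plus standard deviation as the "simple statistics" are assumptions for illustration.

```python
import numpy as np

def quantize(features, lo=-2.0, hi=2.0):
    """Map floats in [lo, hi] to uint8 codes 0..255 (outside values are clipped)."""
    scaled = (np.clip(features, lo, hi) - lo) / (hi - lo)
    return np.round(scaled * 255).astype(np.uint8)

def dequantize(codes, lo=-2.0, hi=2.0):
    return codes.astype(np.float32) / 255 * (hi - lo) + lo

rng = np.random.default_rng(0)
num_frames = 120                 # 2 minutes of video at 1 frame per second
visual = rng.normal(scale=0.5, size=(num_frames, 1024)).astype(np.float32)
audio = rng.normal(scale=0.5, size=(num_frames, 128)).astype(np.float32)

# Frame level features: 1024 + 128 units, one byte each.
frame_features = np.concatenate([quantize(visual), quantize(audio)], axis=1)
bytes_per_frame = frame_features.shape[1] * frame_features.itemsize

# Video level features: mean and standard deviation over the frames.
restored = dequantize(frame_features)
video_features = np.concatenate([restored.mean(axis=0), restored.std(axis=0)])

# Rough size estimate from the slide's numbers: 450,000 hours at 1 frame per second.
estimated_bytes = 450_000 * 3600 * bytes_per_frame
```

The estimate comes to roughly 1.9 × 10^12 bytes, the same order of magnitude as the quoted dataset size, compared with the far larger footprint raw frames would need.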

Summary

Thank you for your time!
