
machine-learning-camp-2018

Slides from Machine Learning Camp in Finland 2018

Piotr Wawryka

March 20, 2018
Transcript

  1. Agenda • What is a distributed representation? • Words, Paragraphs and Documents • Images • Artistic Style Transfer • Videos - Youtube8M
  2. Local representation • each concept is represented by a single unit • easy to understand - intuitive • easy to code by hand • easy to associate with other representations • easy to learn • inefficient for componential structure
  3. Distributed representation • real coordinate space • no clear interpretation of a specific feature • representation length can be smaller
  4. Word representations. Traditional approach: • one-hot encoding • every word is orthogonal to every other: W_woman ∙ W_man = 0. Neural language modeling: • deep architectures • recurrent networks • simple models
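As a toy illustration of the contrast (the dense vectors below are hand-made, not trained), one-hot vectors of distinct words always have a zero dot product, while distributed vectors can express similarity:

```python
import numpy as np

# Local (one-hot) representation: each word occupies its own axis,
# so any two distinct words are orthogonal - their dot product is 0
# and the encoding carries no notion of similarity.
vocab = ["man", "woman", "king", "queen"]
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}
dot = one_hot["woman"] @ one_hot["man"]  # 0.0

# Distributed representation (toy, hand-made vectors):
# related words can have high cosine similarity.
emb = {
    "man":   np.array([0.9, 0.1, 0.4]),
    "woman": np.array([0.8, 0.2, 0.5]),
}
cos = emb["woman"] @ emb["man"] / (
    np.linalg.norm(emb["woman"]) * np.linalg.norm(emb["man"])
)
```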
  5. CBoW - details. Input layer: • context - one-hot encoding. Projection layer: • linear activation • shared weights - embedding lookup. Output layer: • softmax
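A minimal sketch of the CBoW forward pass under these assumptions (tiny vocabulary, random toy weights; the names `W_in`/`W_out` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 8, 4                       # toy vocabulary size and embedding dimension
W_in = rng.normal(size=(V, d))    # projection layer: embedding lookup (shared weights)
W_out = rng.normal(size=(d, V))   # output layer weights

def cbow_forward(context_ids):
    """Average the context embeddings (linear activation), then softmax."""
    h = W_in[context_ids].mean(axis=0)   # embedding lookup + average
    logits = h @ W_out
    e = np.exp(logits - logits.max())    # numerically stable softmax
    return e / e.sum()

probs = cbow_forward([1, 3, 5, 7])  # predict the center word from its context
```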
  6. Skip-Gram - details. Input layer: • center word - one-hot encoding. Projection layer: • linear activation • embedding lookup. Output layer: • softmax activation - shared weights
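The (center, context) training pairs Skip-Gram learns from can be generated from a sliding window like this (a sketch; the helper name and window size are illustrative):

```python
# Skip-Gram trains on (center, context) pairs: each center word is
# asked to predict every word within a +-window around it.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["the", "quick", "brown", "fox"], window=1)
```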
  7. Word2Vec - optimizations. Softmax models the conditional probability over the vocabulary, and the vocabulary can be very large! Optimizations: • Hierarchical Softmax • Noise Contrastive Estimation (NCE) • Negative Sampling
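A sketch of the negative-sampling objective with toy random vectors: instead of normalizing over the whole vocabulary, one true (center, context) pair is scored against k sampled "noise" words:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(v_center, u_pos, u_negs):
    """Push the true pair toward a high score and the k noise words
    toward low scores - no softmax over the full vocabulary needed."""
    loss = -np.log(sigmoid(u_pos @ v_center))
    for u_neg in u_negs:
        loss -= np.log(sigmoid(-(u_neg @ v_center)))
    return loss

rng = np.random.default_rng(1)
v = rng.normal(size=4)                       # toy center-word vector
u_pos = rng.normal(size=4)                   # toy true context vector
u_negs = [rng.normal(size=4) for _ in range(5)]  # k = 5 noise vectors
loss = neg_sampling_loss(v, u_pos, u_negs)
```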
  8. Word2Vec - a basic building block for: • spam filtering • text classification • sentiment analysis • text search. Training takes minutes or hours instead of days or weeks.
  9. Paragraph Vector • proposed by Quoc Le and Tomas Mikolov • models inspired by Word2Vec • distributed representations of sentences, paragraphs and documents
  10. Paragraph Vector - Distributed Bag of Words. Source: Distributed Representations of Sentences and Documents, https://arxiv.org/pdf/1405.4053v2.pdf
  11. Paragraph Vector - inference • fix the projection-layer word weights (W) • fix the softmax weights • update only the paragraph representation (D). Source: Distributed Representations of Sentences and Documents, https://arxiv.org/pdf/1405.4053v2.pdf
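The inference step above can be sketched as gradient descent on the paragraph vector alone, with the trained weights frozen (a toy PV-DBOW-style sketch; the dimensions and weights are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
V, dim = 10, 4
U = rng.normal(size=(dim, V)) * 0.1  # "trained" softmax weights - frozen at inference

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def infer_paragraph_vector(word_ids, steps=300, lr=0.5):
    """Gradient descent on the paragraph vector d only: d is fit to
    predict the paragraph's words while U stays fixed."""
    d = rng.normal(size=dim) * 0.01
    for _ in range(steps):
        for w in word_ids:
            p = softmax(d @ U)
            grad = U @ (p - np.eye(V)[w])  # d(cross-entropy)/dd
            d -= lr * grad
    return d

d = infer_paragraph_vector([2, 5, 2])
p = softmax(d @ U)  # the fitted vector now favors the paragraph's words
```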
  12. ReLU - Rectified Linear Unit • simple non-linear activation • fast to compute • fast to compute derivative • mitigates the vanishing gradient problem • drawback: dead neurons
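A minimal sketch of ReLU and its derivative, illustrating both the cheap gradient and the dead-neuron caveat:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # The derivative is 1 for positive inputs and 0 otherwise - trivially
    # cheap, and it does not shrink gradients for active units (no
    # vanishing). The flip side: a unit whose input stays negative gets a
    # zero gradient and can stop learning entirely (a "dead neuron").
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
y = relu(x)       # negative inputs clamped to 0
g = relu_grad(x)  # zero gradient on the clamped side
```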
  13. Plain network vs ResNet. Source: Deep Residual Learning for Image Recognition, https://arxiv.org/pdf/1512.03385v1.pdf
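The difference can be sketched as follows (toy functions, not a trained network): a residual block adds an identity shortcut, so even a block whose learned mapping is zero still passes its input through unchanged:

```python
import numpy as np

def plain_block(x, f):
    """Plain network block: the layer must learn the full mapping f(x)."""
    return f(x)

def residual_block(x, f):
    """ResNet block: the layer learns only the residual; the identity
    shortcut lets the signal (and gradients) flow straight through."""
    return x + f(x)

x = np.array([1.0, 2.0, 3.0])
zero_f = lambda v: np.zeros_like(v)  # stand-in for an untrained/zeroed layer
plain_out = plain_block(x, zero_f)       # loses the input entirely
residual_out = residual_block(x, zero_f) # identity: input survives
```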
  14. Youtube8M. A large-scale video dataset announced by Google in 2016: • 7+ million videos • 4700+ different labels • 450,000 hours of video. How was it possible to distribute such a large dataset? Source: https://research.google.com/youtube8m/
  15. Frames. To understand a video it is not necessary to have information about every frame: sample frames from the video and store a distributed representation of each frame instead of the raw image. Source: http://www.mediacollege.com/video/frame-rate/
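That sampling can be sketched with a small helper (hypothetical function; YouTube-8M keeps roughly one frame per second):

```python
def sample_frame_indices(n_frames, fps, sample_rate_hz=1.0):
    """Indices of frames sampled at sample_rate_hz from a video
    recorded at fps frames per second."""
    step = fps / sample_rate_hz
    return [int(i * step) for i in range(int(n_frames / step))]

# A 10-second clip at 30 fps keeps only 10 of its 300 frames.
idx = sample_frame_indices(300, 30)
```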
  16. Dataset. Frame-level features: • 1 frame per second • 1024-dimensional visual features (Inception network) • 128-dimensional audio features (VGG-inspired network) • quantized (8 bits per unit). Video-level features: • frame-level features aggregated using averaging and simple statistics. Dataset size: 1.71 TB
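The 8-bit quantization can be sketched as uniform min/max quantization (an assumption for illustration; the dataset's actual quantization scheme may differ in detail):

```python
import numpy as np

def quantize_8bit(x):
    """Map each value into [0, 255] using the vector's min/max,
    storing 1 byte per unit instead of 4 (float32)."""
    lo, hi = x.min(), x.max()
    q = np.round((x - lo) / (hi - lo) * 255).astype(np.uint8)
    return q, lo, hi

def dequantize_8bit(q, lo, hi):
    return q.astype(np.float32) / 255 * (hi - lo) + lo

# A toy 1024-dimensional "visual feature" vector.
feat = np.random.default_rng(0).normal(size=1024).astype(np.float32)
q, lo, hi = quantize_8bit(feat)
recon = dequantize_8bit(q, lo, hi)   # reconstruction error <= half a step
```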