
Introduction to Deep Learning & NLP - PyData London 2016


We take a crack at explaining the following topics:
1. What is deep learning?
2. Motivation: Some use cases where it has produced state-of-the-art results
3. Basic building blocks of neural networks (neuron, activation function, backpropagation algorithm, gradient descent algorithm)
4. Supervised learning (multi-layer perceptron, recurrent neural network)
5. Introduction to word2vec
6. Introduction to Recurrent Neural Networks
7. Text classification using RNN
8. Impact of GPUs (Some practical thoughts on hardware and software)

unnati_xyz

May 06, 2016

Transcript

  1. AGENDA: Motivation for learning. Introduction to Artificial Neural Networks. Multi-Layer Perceptron. Deep Learning in Natural Language Processing. Impact of GPUs.
  2. Recognizing Digits – An Algorithm: k-Nearest Neighbors. For each image, find the “most similar” image and guess that as the label. Use functions that compute relevant information to solve the problem.
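
Not from the slides – a minimal sketch of the k-nearest-neighbours approach described above, using scikit-learn's bundled handwritten-digits dataset (the library choice is an assumption):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 8x8 handwritten digits: each image is flattened into a 64-dimensional vector
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# For each test image, look at the 5 most similar training images and
# take a majority vote over their labels
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```
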
  3. Recognizing Digits – An Algorithm: Hard-coded rules are brittle. Mostly better than a coin toss, but far from being effective and accurate.
  4. Biological Inspiration: A connected network of neurons communicating by electrical and chemical signals. ~10^11 neurons, ~1,000 synapses per neuron.
  5. Questions about Learning: How is information detected? How is it stored? How does it influence recognition?
  6. Learnings from Neuro & Cognitive Science: Kids speak grammatically correct sentences even before they are taught formal language. Kids learn after listening to a lot of sentences. See/hear/feel first. Assimilate. Build context hierarchically. Recognize. Respond. Associations and structural inferences. Understand context, e.g. drinking water vs. a river vs. the ocean.
  7. Simplified Brain’s Visual System (source: ASDM Summer School on Deep Learning 2014): A hierarchical structure with several layers of representation. Each layer builds on top of the previous one to obtain more complex representations.
  8. Lessons from Biological Learning – the Importance of Connectionism: Simple units interacting in a complex network. Distributed representation of knowledge. Parallelism. A threshold mechanism for robust classification. A mechanism for learning – adjusting synaptic weights. Comprehending the inner structure of observed data.
  9. A single neuron is not enough: Individual elements are weak computational elements. Need: networked elements.
  10. Feed-forward Neural Network (source: https://en.wikipedia.org/wiki/Artificial_neural_network): The network is formed by an input layer of source nodes, one or several hidden layers of processing neurons, and an output layer of processing neuron(s). Connections exist only between adjacent layers; there are no feedback connections.
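
A minimal NumPy sketch (not from the slides) of the forward pass through such a network; the layer sizes are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(0)

# Hypothetical sizes: 4 input nodes -> 5 hidden neurons -> 3 output neurons
W1, b1 = rng.randn(4, 5), np.zeros(5)   # input layer  -> hidden layer
W2, b2 = rng.randn(5, 3), np.zeros(3)   # hidden layer -> output layer

def forward(x):
    h = sigmoid(x @ W1 + b1)      # hidden activations (adjacent layers only)
    return sigmoid(h @ W2 + b2)   # output activations; no feedback connections

print(forward(rng.randn(4)))
```
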
  11. Perceptron (source: ASDM Summer School on Deep Learning 2014): Invented in 1957. Classifies input data into one of the output classes: if the weighted input is more than the threshold, classify as 1, else 0. Online learning is possible.
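
A minimal sketch of the perceptron rule described above (not from the slides); the AND-gate data and learning rate are hypothetical:

```python
import numpy as np

def perceptron_train(X, y, epochs=10, lr=1.0):
    """Online learning: weights are updated after every single example."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0   # weighted input vs. threshold
            w += lr * (target - pred) * xi      # change weights only on mistakes
            b += lr * (target - pred)
    return w, b

# Toy, linearly separable data: the AND gate
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = perceptron_train(X, y)
print([1 if xi @ w + b > 0 else 0 for xi in X])   # -> [0, 0, 0, 1]
```
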
  12. Sigmoid/Logistic: Output is bounded between 0 and 1. Domain: the complete set of real numbers. A smooth and continuous function. Symmetric. The derivative can be calculated quickly. Positive, bounded, strictly increasing.
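
A short sketch of these properties (not from the slides):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: smooth, defined for all real z, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """The derivative is cheap to compute: sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.array([-5.0, 0.0, 5.0])
print(sigmoid(z))             # bounded between 0 and 1, 0.5 at z = 0
print(sigmoid_derivative(z))  # strictly positive, peaks at 0.25 for z = 0
```
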
  13. Rectified Linear Units: Cheap to compute (no exponentials). Faster training. Sparser networks. Bounded below by 0. Monotonically non-decreasing.
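
For comparison, a one-line sketch of the rectified linear unit (not from the slides):

```python
import numpy as np

def relu(z):
    """max(0, z): no exponentials, bounded below by 0."""
    return np.maximum(0.0, z)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))   # -> [0.  0.  0.  1.5]
# Negative inputs map to exact zeros, which is what makes the network sparser.
```
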
  14. Max-out (source: ASDM Summer School on Deep Learning 2014): A generalization of the rectified linear unit. The max of k linear functions -> piece-wise linear. At large k, it can approximate a nonlinear function.
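
A small sketch of a max-out unit (not from the slides); the number of linear pieces k and the input size are hypothetical:

```python
import numpy as np

def maxout(x, W, b):
    """Max of k linear functions of x: a piece-wise linear activation.
    W has shape (k, d) and b has shape (k,)."""
    return np.max(W @ x + b, axis=0)

rng = np.random.RandomState(0)
W, b = rng.randn(4, 3), rng.randn(4)   # k = 4 pieces over a 3-dimensional input
print(maxout(rng.randn(3), W, b))

# ReLU is the special case max(0, w.x + b); with more pieces the unit can
# approximate more complicated shapes.
```
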
  15. Backpropagation Algorithm: Recursively and iteratively, the weight of the neuron that contributed most to the error gets penalized the most.
  16. Backpropagation Algorithm: Computes the gradient of the loss function w.r.t. the weights. Backpropagate the training error to generate deltas for all the neurons, from the output layer back through the hidden layers. Use gradient descent to update the weights.
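
A minimal NumPy sketch of these two slides (not from the slides themselves): a one-hidden-layer network trained on XOR with backpropagation and gradient descent. The layer size, learning rate, and squared-error loss are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(0)

# Toy data: XOR, which a single perceptron cannot solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.randn(2, 8), np.zeros((1, 8))   # input -> hidden (8 units, hypothetical)
W2, b2 = rng.randn(8, 1), np.zeros((1, 1))   # hidden -> output
lr = 0.5

for step in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: deltas flow from the output layer back to the hidden layer
    delta_out = (out - y) * out * (1 - out)      # derivative of squared error
    delta_h = (delta_out @ W2.T) * h * (1 - h)   # chain rule through W2

    # Gradient descent: nudge every weight against its gradient
    W2 -= lr * h.T @ delta_out
    b2 -= lr * delta_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ delta_h
    b1 -= lr * delta_h.sum(axis=0, keepdims=True)

print(out.round(2).ravel())   # typically approaches [0, 1, 1, 0]
```
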
  17. Learning in ANNs – Gradient Descent: Goal: to find the minimum of the loss function (minimize the error of the model).
  18. Learning in ANNs – Gradient Descent: Depending on where we start, we can end up in different places (different local minima).
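
A tiny sketch of that sensitivity to the starting point (not from the slides): gradient descent on a made-up non-convex loss with two minima.

```python
def loss(w):
    return w**4 - 3 * w**2 + w       # hypothetical non-convex loss

def grad(w):
    return 4 * w**3 - 6 * w + 1      # its derivative

def gradient_descent(w, lr=0.01, steps=200):
    for _ in range(steps):
        w -= lr * grad(w)            # step against the gradient
    return w

print(gradient_descent(w=-2.0))   # ends up near the minimum around w = -1.3
print(gradient_descent(w=+2.0))   # ends up near the other minimum around w = +1.1
```
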
  19. Some ways to address overfitting: Weight decay / L1-L2 regularization. Suitable model architectures. Unsupervised pre-training. Dropout. Data augmentation.
  20. Dropout: Cripple the network by removing hidden units stochastically. In practice, a drop probability of 0.5 works well. (Network before dropout; source: http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html)
  21. Dropout: Cripple the network by removing hidden units stochastically. In practice, a drop probability of 0.5 works well. (Network before and after dropout; source: http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html)
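
A minimal sketch of (inverted) dropout applied to a layer of hidden activations (not from the slides):

```python
import numpy as np

rng = np.random.RandomState(0)

def dropout(h, p_drop=0.5, train=True):
    """Zero each hidden unit with probability p_drop during training and scale
    the survivors, so the expected activation stays the same. No-op at test time."""
    if not train:
        return h
    mask = rng.binomial(1, 1.0 - p_drop, size=h.shape)
    return h * mask / (1.0 - p_drop)

h = np.ones((2, 8))               # hypothetical hidden activations
print(dropout(h))                 # roughly half the units zeroed, the rest scaled up
print(dropout(h, train=False))    # unchanged at test time
```
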
  22. Structure is very important – it is what carries humor & sarcasm: “A small crowd quietly enters the historic church” vs. “A historic crowd enters the small quietly church”.
  23. Limitations of the architectures so far: Fixed-size input (e.g. an image). A fixed number of computational steps (e.g. the number of layers). Fixed-size output (e.g. probabilities of different classes). Hierarchy is captured, but context and structure? Mostly NOT!
  24. Word2Vec: Window-based vectorization – learn from the surrounding words. The model learns from the data itself instead of relying on any other kind of corpus. Represents the discrete state of a word as a vector. Similar words are clustered together.
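
A minimal gensim sketch of training word2vec on a toy window-based corpus (not from the slides; assumes gensim 4.x, where the parameter is vector_size rather than the older size):

```python
from gensim.models import Word2Vec

# Tiny toy corpus (hypothetical); real training needs far more sentences
sentences = [
    ["drinking", "water", "from", "a", "glass"],
    ["the", "river", "flows", "into", "the", "ocean"],
    ["the", "ocean", "is", "larger", "than", "the", "river"],
    ["she", "is", "drinking", "a", "glass", "of", "water"],
]

# `window` controls how many surrounding words are used as context
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)

print(model.wv["river"][:5])                   # a learned 50-dimensional vector
print(model.wv.most_similar("river", topn=2))  # nearby words in the embedding space
```
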
  25. One-to-Many: Sequence output (recognize an image and explain it in words). Source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
  26. RNN

  27. RNN

  28. Effect of Dataset Size: With 1 thousand labeled examples, RNNs perform 25–50% worse than a linear model; with 1 million labeled examples they perform 0–30% better than a linear model. RNNs have poor generalization properties on small datasets and better generalization on large datasets.
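
A minimal Keras sketch of text classification with an RNN, as promised in the deck description (not taken from the slides; the vocabulary size, sequence length, layer sizes, and random dummy data are all hypothetical):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense

vocab_size, max_len = 10000, 80   # hypothetical vocabulary and sequence length

# Dummy data: integer-encoded word sequences with binary labels
X = np.random.randint(1, vocab_size, size=(100, max_len))
y = np.random.randint(0, 2, size=(100,))

model = Sequential([
    Embedding(vocab_size, 64),        # word ids -> dense word vectors
    SimpleRNN(32),                    # read the sequence while carrying a hidden state
    Dense(1, activation="sigmoid"),   # probability of the positive class
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=32)
```
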
  29. Impact of GPUs: Compared to CPUs, 20x speedups are typical (source: http://www.nvidia.com/object/what-is-gpu-computing.html). GPUs have thousands of cores to process parallel workloads effectively.
  30. Impact of GPUs: Accelerated computation on float32 data. Matrix multiplication, convolution, and large element-wise operations can be accelerated a lot. It is difficult to parallelize dense neural networks across multiple GPUs efficiently (an active area of research). Copying large quantities of data to and from the device is relatively slow. NVIDIA has released cuDNN, a CUDA library of deep-learning primitives.
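
A sketch of the float32/GPU configuration in the Theano stack that was typical at the time (not from the slides; depending on the Theano version the device flag is gpu or cuda):

```python
import os
# Must be set before the first `import theano`
os.environ.setdefault("THEANO_FLAGS", "device=gpu,floatX=float32")

import theano
import theano.tensor as T

x = T.matrix("x")   # symbolic matrices use floatX, i.e. float32 here
y = T.matrix("y")
matmul = theano.function([x, y], T.dot(x, y))   # compiled for the configured device

print(theano.config.device, theano.config.floatX)
```
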