Introduction to Deep Learning

Mike Wu
October 31, 2015

Transcript

  1. WORKSHOP OVERVIEW • WHY DEEP LEARNING • HOW IT WORKS • CONVNETS • WORKSHOP DEMO • FUTURE DIRECTION
  2. MACHINE LEARNING VS DEEP LEARNING (1) MACHINE LEARNING IS ABOUT EXAMPLES. DEEP LEARNING IS ABOUT DATA. ▸ Machine learning lets us customize examples, features, and models. ▸ Deep learning is about one model to rule them all: as long as you have data, you can learn the parameters. But can you interpret it?
  3. MACHINE LEARNING VS DEEP LEARNING (2) DEEP LEARNING IS ON THE RISE. ▸ Deep learning is growing more and more popular. ▸ It has proven its worth in industry.
  4. NEURAL UNIT (1) BIOLOGICAL ORIGINS. ▸ The computational neural unit is inspired by biology! (kind of) ▸ The brain has ~86 billion neurons and ~10^15 synapses. ▸ Each neuron receives input signals from its dendrites and produces output signals along its axon, which connects to other neurons' dendrites. ▸ Neurons fire, and whether one fires is based on an activation threshold. ▸ Neurons can be inhibitory or excitatory!
  5. NEURAL UNIT (2) BIOLOGY TO COMPUTATION. ▸ A super simplified version of a biological neuron. If even… ▸ Like a neural cell, a neural unit takes inputs and produces outputs. ▸ The output a unit produces is based on an activation function; it represents the "frequency" of firing (a probability). ▸ The input signals interact multiplicatively to represent inhibition and excitation. These are the weights! (See the sketch below.)
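A minimal sketch of one unit in Python — all numbers are made up for illustration: multiply each input by its weight, sum, and pass the result through an activation function (the sigmoid introduced on the next slide):

```python
import numpy as np

x = np.array([0.5, -1.2, 3.0])   # input signals arriving at the "dendrites"
w = np.array([0.8, 0.2, -0.5])   # weights: excitatory (> 0) or inhibitory (< 0)
b = 0.1                          # bias, shifting the activation threshold

z = np.dot(w, x) + b                    # weighted sum of the inputs
firing_rate = 1.0 / (1.0 + np.exp(-z))  # sigmoid activation: "frequency" of firing
```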
  6. NEURAL UNIT (3) HOW DO YOU MODEL THE ACTIVATION FUNCTION? ▸ Inputs and outputs are easy to model. ▸ Sigmoid: maps any real number into (0, 1). Easy to see how this can serve as the activation function. ▸ It goes from not firing (0) to fully-saturated firing at maximum frequency (1). ▸ In real life, people use ReLU or the hyperbolic tangent because training with them is more stable.
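For concreteness, the sigmoid and its squashing behavior:

```python
import numpy as np

def sigmoid(z):
    # Large negative inputs -> ~0 (not firing); large positive -> ~1 (saturated firing).
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # -> approx [0.0067, 0.5, 0.9933]
```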
  7. NEURAL NETWORK (1) COMBINING NEURONS. ▸ If a single neuron is already so powerful, what can you do with a bunch of them? ▸ You can combine neurons together into a network, creating "hidden layers".
  8. NEURAL NETWORK (2) MODULARITY IS SO IMPORTANT. ▸ Being able to think of a neural network as layers lets us build networks fundamentally like Lego blocks. ▸ As long as we ensure that each layer is designed well, we can stack layers any way we want! Super powerful. ▸ To change things, you can switch out, add, or remove layers.
  9. NEURAL NETWORK (3) LAYER-WISE DESIGN. ▸ A single layer needs 3 things: ▸ A function to process input to output. ▸ A function to turn the derivative wrt the output into the derivative wrt the input. ▸ A function to get the gradients of the parameters. (A sketch follows.)
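A minimal sketch of a layer meeting these three requirements, in Python — the class and method names are illustrative, not any particular library's API:

```python
import numpy as np

class Linear:
    def __init__(self, n_in, n_out):
        self.W = 0.01 * np.random.randn(n_out, n_in)  # the layer's parameters
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                  # cache the input for the backward pass
        return self.W @ x + self.b  # (1) process input -> output

    def backward(self, grad_out):
        self.grad_W = np.outer(grad_out, self.x)  # (3) gradients of the parameters
        self.grad_b = grad_out
        return self.W.T @ grad_out  # (2) derivative wrt output -> derivative wrt input
```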
  10. BACKPROPAGATION (1) OPTIMIZATION. ▸ Inputs to outputs is intuitive. But why do we need derivatives? ▸ The goal of machine learning in general is to find optimal parameters for some model. ▸ Think of a best-fit line! The slope is our parameter, and we minimize the distance of the points to the line.
  11. BACKPROPAGATION (2) COST FUNCTION. ▸ We need a way to measure how far we are from the optimum! ▸ This is what we want to take the derivative of.
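Sticking with the best-fit-line example (with made-up data points), a mean-squared-error cost and its derivative wrt the slope might look like:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])  # roughly y = 2x, plus noise

def cost(m):
    # How far the line y = m*x is from the points (mean squared distance).
    return np.mean((m * x - y) ** 2)

def dcost_dm(m):
    # Derivative of the cost wrt the slope m -- the quantity we descend along.
    return np.mean(2.0 * (m * x - y) * x)
```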
  12. BACKPROPAGATION (3) GRADIENT DESCENT. ▸ In general, we can optimize by gradient descent to find minima/maxima (usually only local ones). ▸ We can think of the cost surface as a topographic map: there are valleys and mountains, and to find an optimum we take small steps in the right direction. ▸ The gradient/derivative gives the direction and size of each step! ▸ In real life, use Adagrad, momentum, or Newton-CG.
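Continuing from the cost and derivative sketched above, plain gradient descent is just a loop (the learning rate here is chosen arbitrarily):

```python
m = 0.0                    # start with a flat line
lr = 0.01                  # step size (learning rate)
for step in range(200):
    m -= lr * dcost_dm(m)  # small step downhill, against the gradient

print(m, cost(m))          # m converges to ~1.99, the best-fit slope
```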
  13. BACKPROPAGATION (4) MOTIVATION FOR BACKPROPAGATION. ▸ Go back to that model. ▸ Computing each derivative independently is inefficient: we calculate the same intermediate values over and over, and this is a model of only 3 layers. Imagine 60 layers.
  14. BACKPROPAGATION (5) THIS WORKS FOR ARBITRARY MODELS. ▸ Super robust. Even for a complex model like a convolutional neural net: just do a forward pass to get outputs, then a backward pass to get derivatives wrt each layer's outputs, use those to get derivatives wrt the parameters, and optimize. (Sketched below.)
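A toy forward/backward pass, chaining the illustrative Linear layer from the earlier sketch with a sigmoid layer — again a sketch, not any library's real API:

```python
import numpy as np

class Sigmoid:
    def forward(self, x):
        self.out = 1.0 / (1.0 + np.exp(-x))
        return self.out

    def backward(self, grad_out):
        # Chain rule: d(sigmoid)/dx = out * (1 - out).
        return grad_out * self.out * (1.0 - self.out)

layers = [Linear(4, 3), Sigmoid(), Linear(3, 1)]  # Linear from the earlier sketch

out = np.random.randn(4)
for layer in layers:            # forward pass: input -> output
    out = layer.forward(out)

grad = 2.0 * (out - 1.0)        # derivative of a squared-error cost, target = 1.0
for layer in reversed(layers):  # backward pass: each layer reuses its cached values
    grad = layer.backward(grad)
# Every Linear layer now holds grad_W and grad_b, ready for gradient descent.
```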
  15. CONVOLUTIONAL NEURAL NETWORK (1) ▸ Revolutionized object and speech recognition. ▸ Can contain hundreds of layers and millions of parameters. ▸ Contains two new types of layers: convolution and pooling.
  16. CONVOLUTIONAL NEURAL NETWORK (2) ▸ Convolution is like sliding a filter over an image, taking a linear combination (a dot product of the filter with the patch under it) at each position. ▸ Do this over an entire image and you get an image back. ▸ The parameters here are the filters themselves, and the point of the model is to learn these filters.
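A naive version of the sliding-filter idea (strictly this is cross-correlation, which is what deep-learning libraries call convolution):

```python
import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # One output pixel = dot product of the filter and the patch under it.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(6, 6)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # a hand-made vertical-edge filter;
print(convolve2d(image, kernel).shape)     # a convnet learns its filters instead
# -> (4, 4): an image comes back (slightly smaller without padding)
```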
  17. CONVOLUTIONAL NEURAL NETWORK (3) POOLING. ▸ Pooling is even easier. If I use a lot of filters, then after a convolution layer I end up with a lot of images. ▸ To be more memory efficient, we can downsample each one, e.g. keep only the largest value in each small window (max pooling).
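Max pooling over non-overlapping 2×2 windows, continuing the same sketch:

```python
import numpy as np

def max_pool(image, size=2):
    h, w = image.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h - h % size, size):
        for j in range(0, w - w % size, size):
            # Keep only the largest value in each size x size window.
            out[i // size, j // size] = np.max(image[i:i + size, j:j + size])
    return out

print(max_pool(np.random.rand(4, 4)).shape)  # -> (2, 2): a quarter of the memory
```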
  18. TORCH TUTORIAL (1) THE TENSOR. ▸ A tensor is an N-dimensional array, generalizing vectors and matrices. ▸ Like how vectorized calculations in Python are faster than loops, everything in Torch is a tensor calculation. ▸ This is really useful for big data!
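Torch's tensors live in Lua, but the same vectorization payoff is easy to demonstrate in Python with NumPy:

```python
import time
import numpy as np

x = np.random.rand(1_000_000)

t0 = time.time()
slow = sum(v * 2.0 for v in x)  # element-by-element Python loop
t1 = time.time()
fast = (x * 2.0).sum()          # one vectorized tensor operation
t2 = time.time()

print(t1 - t0, t2 - t1)         # the tensor version is orders of magnitude faster
```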
  19. TORCH TUTORIAL (2) NN IS TO TORCH AS NUMPY IS TO PYTHON. ▸ nn is a library that makes designing neural networks really, really easy. You can define models, costs, and optimizers with a few calls. ▸ It's built with layers too. Like Legos. [diagram: INPUT → LINEAR → NONLINEAR → LINEAR → OUTPUT]
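The real nn calls are Lua; this Python fragment (reusing the illustrative Linear and Sigmoid classes sketched earlier) mirrors the Lego-style stacking the slide's diagram shows:

```python
# INPUT -> LINEAR -> NONLINEAR -> LINEAR -> OUTPUT, stacked like Lego blocks.
model = [Linear(10, 5), Sigmoid(), Linear(5, 1)]

def predict(x):
    for layer in model:
        x = layer.forward(x)
    return x
```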
  20. FUTURE DIRECTION (1) DEEP LEARNING IS NOT THE END. IT'S A STEP. ▸ As great as deep learning is, there are big and noticeable drawbacks. What can we do? ▸ Combining with Bayesian learning. ▸ Automating hyperparameters. ▸ Biological models? ▸ More industry applications.