
Deep Networks and Kernel Methods

matiskay
June 18, 2015


  1. Deep Networks and Kernel Methods Edgar Marca [email protected] Grupo de

    Reconocimiento de Patrones e Inteligencia Artificial Aplicada — PUC Lima, Perú June 18th, 2015 1 / 35
  2. Table of Contents I Image Classification Problem Deep Networks Convolutional

    Neural Networks Software How to start Kernel Methods SVM The Kernel Trick History of Kernel Methods Software How to start Kernels and Deep Learning Convolutional Kernel Networks Deep Fried Convnets 2 / 35
  3. Deep Networks Human-level control through deep reinforcement learning ▶ Volodymyr

    Mnih et al., Human-level control through deep reinforcement learning. 5 / 35
  4. Deep Networks Software Software ▶ Torch7 — http://torch.ch/. ▶ Caffe

    — http://caffe.berkeleyvision.org/. ▶ Minerva — https://github.com/dmlc/minerva. ▶ Theano — http://deeplearning.net/software/theano/. 7 / 35
  5. Deep Networks How to start How to start I ▶

    Deep Learning Course by Nando de Freitas — https://www.youtube.com/watch?v=PlhFWT7vAEw&list= PLjK8ddCbDMphIMSXn-w1IjyYpHU3DaUYw. ▶ Alex Smola Lecture on Deep Networks — https://www.youtube.com/watch?v=xZzZb7wZ6eE. ▶ Convolutional Neural Networks for Visual Recognition — http://vision.stanford.edu/teaching/cs231n/. ▶ Deep Learning, Spring 2015 — http://cilvr.cs.nyu.edu/doku.php?id=courses: deeplearning2015:start. 8 / 35
  6. Deep Networks How to start How to start II ▶

    Deep Learning for Natural Language Processing — http://cs224d.stanford.edu/. ▶ Applied Deep Learning for Computer Vision with Torch – http://torch.ch/docs/cvpr15.html. ▶ DEEP LEARNING, An MIT Press book in preparation — http://www.iro.umontreal.ca/~bengioy/dlbook/. ▶ Reading List — http://deeplearning.net/reading-list/. 9 / 35
  7. Kernel Methods SVM Linear Support Vector Machine

    Figure: Linear Support Vector Machine, showing the decision surface ⟨w, x⟩ + b = 0, the margin hyperplanes ⟨w, x⟩ + b = 1 and ⟨w, x⟩ + b = −1, and the margin between them. 11 / 35
  8. Kernel Methods SVM Linear Support Vector Machine Linear SVM -

    Primal Problem. Given a linearly separable training set D = {(x1, y1), (x2, y2), ..., (xl, yl)} ⊂ Rn × {+1, −1}, we can compute the maximum-margin decision surface ⟨w∗, x⟩ = b∗ by solving the convex program (P): min over w, b of ϕ(w, b) = (1/2)⟨w, w⟩, subject to ⟨w, yixi⟩ ≥ 1 + yib for every (xi, yi) ∈ D. (1) Observations: 1. The objective function does not depend on b. 2. The offset b appears only in the constraints. 3. The number of constraints equals the number of training points. 12 / 35
  9. Kernel Methods SVM Linear Support Vector Machine Linear SVM -

    Dual Problem (DP): max over α of h(α) = ∑_{i=1}^{l} αi − (1/2) ∑_{i=1}^{l} ∑_{j=1}^{l} αiαjyiyj⟨xi, xj⟩, subject to ∑_{i=1}^{l} αiyi = 0 and αi ≥ 0 for i = 1, . . . , l. The offset b is computed in terms of w∗ as follows: b+ = min {⟨w∗, x⟩ | (x, y) ∈ D, y = +1}, b− = max {⟨w∗, x⟩ | (x, y) ∈ D, y = −1}, and then b∗ = (b+ + b−)/2. The training vectors associated with αi > 0 are called support vectors. 13 / 35
  10. Kernel Methods SVM Linear Support Vector Machine Linear Support Vector

    Machine: f̂(x) = sign( ∑_{i=1}^{l} αi∗ yi ⟨xi, x⟩ − b∗ ) 14 / 35
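The decision function above can be evaluated directly once the dual solution is known. The following sketch uses a hand-solved two-point toy problem in R (my own example, not from the slides): for D = {(−1, −1), (+1, +1)} the dual optimum is α∗ = (1/2, 1/2), which gives w∗ = 1 and b∗ = 0.

```python
import numpy as np

# Toy hard-margin SVM in R, solved by hand (illustrative, not from the slides):
# D = {(-1, -1), (+1, +1)} has dual solution alpha* = (1/2, 1/2).
X = np.array([[-1.0], [1.0]])
y = np.array([-1.0, 1.0])
alpha = np.array([0.5, 0.5])

# w* = sum_i alpha_i* y_i x_i
w = (alpha * y) @ X                      # -> [1.0]

# b* from the slide's formulas: midpoint of the closest class projections.
b_plus = min(X[y == +1] @ w)             # -> 1.0
b_minus = max(X[y == -1] @ w)            # -> -1.0
b_star = (b_plus + b_minus) / 2          # -> 0.0

# Decision function f(x) = sign(sum_i alpha_i* y_i <x_i, x> - b*)
def f_hat(x):
    return np.sign((alpha * y) @ (X @ x) - b_star)

print(f_hat(np.array([2.0])))    # 1.0
print(f_hat(np.array([-0.5])))   # -1.0
```

Only the inner products ⟨xi, x⟩ appear in the prediction, which is what makes the kernel substitution later in the talk possible.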
  11. Kernel Methods The Kernel Trick Motivation ▶ How can we

    separate data that is not linearly separable? ▶ How can we reuse algorithms designed for linearly separable data that depend on the data only through inner products? 16 / 35
  12. Kernel Methods The Kernel Trick R to R2 Case How

    to separate the two classes? Figure: Separating the two classes by mapping each point with ϕ(x) = (x, x²) into a higher-dimensional space where the data is separable. 17 / 35
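The map ϕ(x) = (x, x²) from this slide can be checked on a small 1-D example (the data below is my own illustration): one class sits between the two clusters of the other class, so no threshold on x separates them, but after the map a threshold on the second coordinate x² does.

```python
import numpy as np

# The R -> R^2 map from the slide: phi(x) = (x, x^2).
def phi(x):
    return np.array([x, x**2])

# Toy 1-D data: class -1 lies between the two clusters of class +1,
# so the classes are not separable by any single threshold on x.
x_pos = np.array([-2.0, -1.5, 1.5, 2.0])   # label +1
x_neg = np.array([-0.5, 0.0, 0.5])         # label -1

# After the map, the second coordinate x^2 separates the classes linearly:
print(min(phi(x)[1] for x in x_pos))   # 2.25 (smallest x^2 among +1)
print(max(phi(x)[1] for x in x_neg))   # 0.25 (largest x^2 among -1)
```

Any threshold between 0.25 and 2.25 on the x² coordinate is a separating hyperplane in R².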
  13. Kernel Methods The Kernel Trick R2 to R3 Case

    Figure: Data which is not linearly separable. 18 / 35
  14. Kernel Methods The Kernel Trick R2 to R3 Case A

    simulation Figure: SVM with polynomial kernel visualization. 19 / 35
  15. Kernel Methods The Kernel Trick Idea

    Figure: ϕ is a non-linear mapping from the input space to the feature space. 20 / 35
  16. Kernel Methods The Kernel Trick Non Linear Support Vector Machine

    Now we can use a non-linear function ϕ to map the data from the input space to a higher-dimensional space. Non Linear Support Vector Machine: f̂(x) = sign( ∑_{i=1}^{l} αi∗ yi ⟨ϕ(xi), ϕ(x)⟩ − b∗ ) 21 / 35
  17. Kernel Methods The Kernel Trick Definition 3.1 (Kernel) Let X

    be a non-empty set. A function k : X × X → K is called a kernel on X if and only if there exist a Hilbert space H and a mapping Φ : X → H such that for all s, t it holds that k(t, s) := ⟨Φ(t), Φ(s)⟩H. (2) The function Φ is called a feature map and H a feature space of k. 22 / 35
  18. Kernel Methods The Kernel Trick Example 3.2 Consider X =

    R and the function k defined by k(s, t) = st = ⟨(s/√2, s/√2), (t/√2, t/√2)⟩, where the feature maps are Φ(s) = s and Φ̃(s) = (s/√2, s/√2), and the feature spaces are H = R and H̃ = R² respectively. In particular, neither the feature map nor the feature space of a kernel is unique. 23 / 35
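Example 3.2 can be verified numerically: both feature maps reproduce the same kernel value, even though they map into different spaces.

```python
import math

# The two feature maps from Example 3.2 for the kernel k(s, t) = s*t on R.
def phi(s):
    return s                                       # feature space H = R

def phi_tilde(s):
    return (s / math.sqrt(2), s / math.sqrt(2))    # feature space H~ = R^2

def k(s, t):
    return s * t

s, t = 3.0, 4.0
inner_H = phi(s) * phi(t)                               # inner product in R
inner_H_tilde = sum(a * b for a, b in zip(phi_tilde(s), phi_tilde(t)))

print(k(s, t), inner_H, inner_H_tilde)   # all equal 12.0 (up to rounding)
```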
  19. Kernel Methods The Kernel Trick Non Linear Support Vector Machines

    Using the kernel trick we can replace ⟨ϕ(xi), ϕ(x)⟩ by a kernel k(xi, x): f̂(x) = sign( ∑_{i=1}^{l} αi∗ yi k(xi, x) − b∗ ) 24 / 35
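The substitution is purely mechanical: the prediction code takes the kernel as a parameter, and no feature map is ever computed. The sketch below reuses the hand-solved two-point toy problem from earlier (α∗ = (1/2, 1/2), b∗ = 0, my own example); note the α shown was computed for the linear case, so swapping in a polynomial kernel here only illustrates the substitution, not an optimally trained non-linear SVM.

```python
import numpy as np

# Kernelized decision function from the slide:
# f(x) = sign(sum_i alpha_i* y_i k(x_i, x) - b*).
X = np.array([[-1.0], [1.0]])
y = np.array([-1.0, 1.0])
alpha = np.array([0.5, 0.5])    # dual solution of the linear toy problem
b_star = 0.0

def f_hat(x, kernel):
    return np.sign(sum(a * yi * kernel(xi, x)
                       for a, yi, xi in zip(alpha, y, X)) - b_star)

linear = lambda u, v: float(u @ v)               # recovers the linear SVM
poly = lambda u, v: (float(u @ v) + 1.0) ** 2    # polynomial kernel, degree 2

x = np.array([2.0])
print(f_hat(x, linear))   # 1.0
print(f_hat(x, poly))     # 1.0
```

Changing the kernel changes the implicit feature space without touching the rest of the algorithm.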
  20. Kernel Methods History of Kernel Methods Timeline Table: Timeline of

    Support Vector Machines Algorithm Development 1909 • Mercer Theorem — James Mercer. "Functions of Positive and Negative Type, and their Connection with the Theory of Integral Equations". 1950 • Moore–Aronszajn Theorem — Nachman Aronszajn. "Theory of Reproducing Kernels". 1964 • Geometrical interpretation of kernels as inner products in a feature space — Aizerman, Braverman and Rozonoer. "Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning". 1964 • Original SVM algorithm — Vladimir Vapnik and Alexey Chervonenkis. "A Note on One Class of Perceptrons". 26 / 35
  21. Kernel Methods History of Kernel Methods Timeline Table: Timeline of

    Support Vector Machines Algorithm Development 1965 • Cover’s Theorem — Thomas Cover. "Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition". 1992 • Support Vector Machines — Bernhard Boser, Isabelle Guyon and Vladimir Vapnik. "A Training Algorithm for Optimal Margin Classifiers". 1995 • Soft-margin Support Vector Machines — Corinna Cortes and Vladimir Vapnik. "Support-Vector Networks". 27 / 35
  22. Kernel Methods Software Software ▶ LibSVM — https://www.csie.ntu.edu.tw/~cjlin/libsvm/. ▶ SVMLight

    — http://svmlight.joachims.org/. ▶ Scikit Learn — http://scikit-learn.org/stable/modules/svm.html. 28 / 35
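Of the packages listed, scikit-learn's SVC (which wraps LibSVM) is perhaps the quickest way to try a kernel SVM. A minimal sketch, assuming scikit-learn is installed; the data is a toy XOR-like example of my own, which no linear SVM can fit but an RBF kernel handles:

```python
from sklearn import svm

# XOR-like toy data: not linearly separable.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [-1, 1, 1, -1]

# RBF-kernel SVM; gamma and C are illustrative choices, not tuned values.
clf = svm.SVC(kernel="rbf", gamma=2.0, C=10.0)
clf.fit(X, y)

print(clf.predict(X))   # recovers all four training labels
```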
  23. Kernel Methods How to start How to start ▶ Introduction

    to Support Vector Machines — https://beta.oreilly.com/learning/intro-to-svm ▶ Lutz H. Hamel, Knowledge Discovery with Support Vector Machines. ▶ John Shawe-Taylor and Nello Cristianini, Kernel Methods for Pattern Analysis. 29 / 35
  24. Kernels and Deep Learning Kernels and Deep Learning ▶ Julien

    Mairal et al., Convolutional Kernel Networks. ▶ Zichao Yang et al., Deep Fried Convnets. 31 / 35
  25. Kernels and Deep Learning Deep Fried Convnets Deep Fried Convnets

    ▶ Quoc Viet Le et al., Fastfood: Approximating Kernel Expansions in Loglinear Time. 33 / 35