Recent Developments in Deep Learning

Olivier Grisel
Paris Datageeks meetup, September 15, 2015

Transcript

  1. Recent developments
    in Deep Learning
    Olivier Grisel - Paris Datageeks - Sept. 2015

  2. Outline
    • Deep Learning quick recap
    • Recurrent Neural Networks
    • Attention for Machine Translation
    • Attention and differentiable memory for reasoning

  3. Deep Learning
• Neural Networks from the 90’s, rebranded in 2006+
    • “Neuron” is only a loose inspiration (the biological
    analogy is not important)
    • Stacked layers of differentiable modules (matrix
    multiplication, convolution, pooling, element-wise
    non-linear operations…)
    • Can be trained via gradient descent on large datasets
    of input/output example pairs

  4. Deep Learning in the 90’s
    sources: LeNet5 & Stanford Deep Learning Tutorial

5. x  = input vector
    h1 = f1(x, w1)  = max(conv(x, w1), 0)    (hidden activations)
    h2 = f2(h1, w2) = max(dot(h1, w2), 0)    (hidden activations)
    y  = f3(h2, w3) = softmax(dot(h2, w3))   (output vector)
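
    A minimal NumPy sketch of this forward pass (not from the slides): shapes and
    layer sizes are made up, and the convolution of f1 is replaced by a plain dot
    product for brevity.

        import numpy as np

        def relu(z):
            return np.maximum(z, 0)

        def softmax(z):
            e = np.exp(z - z.max())   # subtract the max for numerical stability
            return e / e.sum()

        rng = np.random.RandomState(0)
        x = rng.randn(64)               # input vector (e.g. a flattened image)
        w1 = rng.randn(64, 32) * 0.1    # first layer weights (conv simplified to dot)
        w2 = rng.randn(32, 16) * 0.1
        w3 = rng.randn(16, 10) * 0.1

        h1 = relu(np.dot(x, w1))        # f1: hidden activations
        h2 = relu(np.dot(h1, w2))       # f2: hidden activations
        y = softmax(np.dot(h2, w3))     # f3: output class probabilities

        # Training adjusts w1, w2, w3 by gradient descent on a loss (e.g.
        # cross-entropy) computed over many (input, output) example pairs.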

  6. Recent success
    • 2009: state of the art acoustic model for speech
    recognition
    • 2011: state of the art road sign classification
    • 2012: state of the art object classification
    • 2013/14: end-to-end speech recognition, object
    detection
• 2014/15: state of the art machine translation, getting
    closer to general Natural Language Understanding

  7. ImageNet Challenge
    ILSVRC2014
    • 1.2 million images
    • 1000 classes
• Last winner: GoogLeNet, now
    at less than 5% top-5 error rate
    • Used in Google Photos for
    indexing

  8. Image captioning
    http://cs.stanford.edu/people/karpathy/deepimagesent/

  9. Why now?
    • More labeled data
    • More compute power (optimized BLAS and GPUs)
    • Improvements to algorithms

  10. source: Alec Radford on RNNs

  11. Recurrent
    Neural Networks

  12. source: The Unreasonable Effectiveness of RNNs

  13. Applications of RNNs
    • NLP (PoS, NER, Parsing, Sentiment Analysis)
    • Generative Probabilistic Language Models
    • Machine Translation (e.g. English to French)
    • Speech recognition / Speech synthesis (newer)
    • Biological sequence modeling (DNA, Proteins)
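
    A minimal sketch (illustrative names and sizes, not from the slides) of the
    vanilla RNN recurrence behind these applications: the hidden state is updated
    from the previous state and the current input, and a softmax over the
    vocabulary predicts the next token, as in a character-level language model.

        import numpy as np

        def softmax(z):
            e = np.exp(z - z.max())
            return e / e.sum()

        rng = np.random.RandomState(0)
        vocab_size, hidden_size = 50, 32
        W_xh = rng.randn(hidden_size, vocab_size) * 0.01   # input -> hidden
        W_hh = rng.randn(hidden_size, hidden_size) * 0.01  # hidden -> hidden (recurrence)
        W_hy = rng.randn(vocab_size, hidden_size) * 0.01   # hidden -> output

        h = np.zeros(hidden_size)
        sequence = [3, 17, 8, 42]        # token (e.g. character) indices
        for token in sequence:
            x = np.zeros(vocab_size)
            x[token] = 1.0               # one-hot encoding of the current token
            h = np.tanh(np.dot(W_xh, x) + np.dot(W_hh, h))  # recurrent state update
            p_next = softmax(np.dot(W_hy, h))  # distribution over the next token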

  14. Language modeling
    source: The Unreasonable Effectiveness of RNNs

  15. Shakespeare

  16. Wikipedia markup

  17. Linux source code

  18. Attentional architectures
    for Machine Translation

  19. Neural MT
    source: From language modeling to machine translation

  20. Attentional Neural MT
    source: From language modeling to machine translation

  21. Attention == Alignment
    source: Neural MT by Jointly Learning to Align and Translate
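
    A rough sketch of the additive (Bahdanau-style) attention behind these
    alignments, assuming one encoder state per source word and the current
    decoder state; all names and dimensions are illustrative.

        import numpy as np

        def softmax(z):
            e = np.exp(z - z.max())
            return e / e.sum()

        rng = np.random.RandomState(0)
        src_len, enc_dim, dec_dim, att_dim = 7, 16, 16, 12
        H = rng.randn(src_len, enc_dim)       # encoder states, one per source word
        s = rng.randn(dec_dim)                # current decoder state
        W_a = rng.randn(att_dim, dec_dim) * 0.1
        U_a = rng.randn(att_dim, enc_dim) * 0.1
        v_a = rng.randn(att_dim) * 0.1

        # One score per source position, comparing the decoder state to each
        # encoder state; the softmax turns the scores into alignment weights.
        scores = np.array([np.dot(v_a, np.tanh(np.dot(W_a, s) + np.dot(U_a, h_j)))
                           for h_j in H])
        alpha = softmax(scores)               # soft alignment (sums to 1)
        context = np.dot(alpha, H)            # weighted sum of encoder states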

  22. source: Neural MT by Jointly Learning to Align and Translate

  23. source: Show, Attend and Tell

  24. Differentiable memory
    for reasoning

  25. Neural Turing Machines
    • Google DeepMind, October 2014
• Neural Network coupled to an external memory (tape);
    a sketch of the content-based read is shown below
    • Analogous to a Turing Machine, but differentiable
    • Can learn simple programs from example
    input/output pairs:
    • copy, repeat copy, associative recall,
    • binary n-gram counts and sorting
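
    A simplified sketch of a content-based read from the external memory, in the
    spirit of the NTM read head: the controller emits a key, similarities to each
    memory row are sharpened by a key strength and normalized with a softmax, and
    the read is a weighted sum over all rows. Sizes and names are illustrative.

        import numpy as np

        def softmax(z):
            e = np.exp(z - z.max())
            return e / e.sum()

        rng = np.random.RandomState(0)
        n_slots, slot_dim = 8, 16
        M = rng.randn(n_slots, slot_dim)   # external memory, one row per slot
        key = rng.randn(slot_dim)          # read key emitted by the controller
        beta = 2.0                         # key strength (sharpens the focus)

        # Content-based addressing: cosine similarity between the key and each slot.
        sims = np.dot(M, key) / (np.linalg.norm(M, axis=1) * np.linalg.norm(key) + 1e-8)
        w = softmax(beta * sims)           # differentiable read weights
        read_vector = np.dot(w, M)         # blended read over all memory slots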

  26. NTM Architecture
    source: Neural Turing Machines
• Turing Machine:
    controller == FSM (finite state machine)
    • Neural Turing Machine:
    controller == RNN w/ LSTM

  27. Differentiable Stack
    source: Inferring algorithmic patterns w Stack RNN
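
    A very rough sketch of one differentiable stack update in the spirit of the
    Stack RNN: the controller outputs soft probabilities for PUSH / POP / NO-OP
    and the new stack is a convex combination of the three discrete outcomes.
    The exact update rule here is a simplification for illustration.

        import numpy as np

        def stack_update(stack, actions, new_value):
            """Blend the three possible stack updates with soft action weights.

            stack:     (depth,) current stack contents, top at index 0
            actions:   (3,) probabilities for (PUSH, POP, NO-OP), summing to 1
            new_value: scalar value the controller would push
            """
            p_push, p_pop, p_noop = actions
            pushed = np.concatenate(([new_value], stack[:-1]))  # shift down, new top
            popped = np.concatenate((stack[1:], [0.0]))         # shift up, drop the top
            return p_push * pushed + p_pop * popped + p_noop * stack

        stack = np.zeros(5)
        stack = stack_update(stack, np.array([0.9, 0.05, 0.05]), new_value=1.0)
        stack = stack_update(stack, np.array([0.1, 0.8, 0.1]), new_value=0.5)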

  28. Stack RNN trained for
    binary addition
    source: Inferring algorithmic patterns w Stack RNN

  29. Continuous Stack
    source: Learning to Transduce with Unbounded Memory

  30. (figure-only slide, no text)

  31. Reasoning for QA

  32. bAbI tasks
https://research.facebook.com/researchers/1543934539189348

  33. Memory Networks
    source: End to end memory networks
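
    A condensed sketch of one memory "hop" of an end-to-end memory network: the
    embedded question attends over embedded supporting facts, and the retrieved
    summary is added back to the query before predicting an answer. Embeddings
    and sizes are illustrative placeholders.

        import numpy as np

        def softmax(z):
            e = np.exp(z - z.max())
            return e / e.sum()

        rng = np.random.RandomState(0)
        n_facts, emb_dim, vocab = 6, 20, 100
        m = rng.randn(n_facts, emb_dim)      # memories: embedded supporting facts
        c = rng.randn(n_facts, emb_dim)      # output embeddings of the same facts
        u = rng.randn(emb_dim)               # embedded question
        W = rng.randn(vocab, emb_dim) * 0.1  # final answer projection

        p = softmax(np.dot(m, u))            # attention of the question over memories
        o = np.dot(p, c)                     # retrieved memory summary
        answer = softmax(np.dot(W, u + o))   # distribution over candidate answers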

  34. source: End to end memory networks

  35. source: Dynamic Memory Networks

  36. (figure-only slide, no text)

  37. Neural Reasoner

  38. source: Towards Neural Network-based Reasoning

  39. QA on real data

  40. Paraphrases from web news

  41. source: Teaching Machines to Read and Comprehend

  42. source: Teaching Machines to Read and Comprehend

  43. source: Teaching Machines to Read and Comprehend

  44. source: Teaching Machines to Read and Comprehend

  45. Conclusion
    • Deep Learning progress is fast paced
    • Many applications already in production (e.g.
    speech, image indexing, face recognition)
• Machine Learning is now moving from pattern
    recognition to higher-level reasoning
    • Generic AI is no longer a swear word among
    machine learning practitioners

  46. Thank you!
    @ogrisel
