Tensorflow for Janitors
Craft Conference Budapest 2017
Daniel Molnar @soobrosa
door2door GmbH
Perspective
• rounded, not complete,
• slow, old, stupid and lazy and
• looking for feedback either to add or remove.
Where I'm coming from
• head of data and analytics,
• senior applied and data scientist,
• data analyst,
• head of data,
• or just data janitor.
Orientation
What this talk is about:
- deep learning,
- what a generalist can use Tensorflow for,
- what can it teach us about a good product.
What this talk is not about:
- extinction or salvation by AI,
- a coding tutorial,
- pitching a Google product.
Deciphering jargon via history
Adventures in CS
(ca. 1999)
Machine learning is a function, trained supervised or unsupervised, that generalizes well on a dataset, but hopefully not too much (overfitting), and ends up in fancy theses.
Technical and simplified
We:
• run multivariate linear regression
• with a cost/loss function to optimize for
(typically squared error)
• with batch gradient descent (sketched below).
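A minimal sketch of that recipe in plain NumPy; the function and variable names are mine, not from the talk:

import numpy as np

def train(X, y, lr=0.01, epochs=1000):
    # Fit y ~ X @ w + b by batch gradient descent on squared error.
    n_samples = X.shape[0]
    w = np.zeros(X.shape[1])              # one weight per feature
    b = 0.0
    for _ in range(epochs):
        error = X @ w + b - y             # prediction minus targets
        # cost/loss: mean squared error; gradients over the full batch
        grad_w = 2.0 * X.T @ error / n_samples
        grad_b = 2.0 * error.mean()
        w -= lr * grad_w                  # batch gradient descent step
        b -= lr * grad_b
    return w, b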
A neuron
Neural networks
Perceptron (1958)
• random start weights,
• activation function is a weighted sum exceeding a threshold (sketched below).
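A toy perceptron, assuming the classic mistake-driven update rule; illustrative only:

import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])       # random start weights
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):          # y holds 0/1 labels
            fired = 1.0 if xi @ w + b > 0 else 0.0   # threshold activation
            w += lr * (yi - fired) * xi   # nudge weights on mistakes
            b += lr * (yi - fired)
    return w, b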
Hidden layers before the AI winter ('70s)
Mostly Minsky's fault:
- non-linear failure (XOR),
- backpropagation not yet known.
Backpropagation
('70s-'80s)
• activation function is differentiable,
• derivative used to adjust the weights to minimize error,
• chain rule to blame prior layers,
• optimize with stochastic gradient descent (sketched below).
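A tiny sketch of one stochastic gradient descent step with backpropagation through a single hidden layer; the names and the sigmoid/squared-error choices are mine, for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # differentiable activation

def sgd_step(x, y, W1, W2, lr=0.1):
    # forward pass
    h = sigmoid(W1 @ x)                   # hidden layer
    y_hat = sigmoid(W2 @ h)               # output layer
    # backward pass: derivative of squared error, chain rule
    delta_out = (y_hat - y) * y_hat * (1 - y_hat)
    delta_hid = (W2.T @ delta_out) * h * (1 - h)   # blame the prior layer
    # stochastic gradient descent update
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_hid, x)
    return W1, W2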
Deep Learning
What is it good for?
• supervised:
near-human level accuracy in image classification, voice recognition, natural language processing,
• unsupervised:
use large volumes of unstructured data to learn hierarchical models which capture the complex structure in the data, and then use these models to predict properties of previously unseen data.
So is this supercharged ML?
Kinda yes:
• large-scale neural networks with many layers,
• weights can be n-dimensional arrays (tensors),
• a high-level way of defining prediction code, i.e. the forward pass,
• the framework figures out the derivatives for you (sketched below).
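A minimal example in the 2017-era TF 1.x API: you only write the forward pass and the loss; TensorFlow derives the gradients behind minimize(). Shapes and the learning rate are illustrative:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 3])       # input features
y = tf.placeholder(tf.float32, [None, 1])       # targets
W = tf.Variable(tf.zeros([3, 1]))               # weights are tensors
b = tf.Variable(tf.zeros([1]))
y_hat = tf.matmul(x, W) + b                     # prediction / forward pass
loss = tf.reduce_mean(tf.square(y_hat - y))     # squared-error loss
# the framework figures out the backward pass:
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)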
Who made it work? Blame Canada!
According to Geoffrey Hinton, in the past:
- our labeled datasets were thousands of times too small,
- our computers were millions of times too slow,
- we initialized the weights in a stupid way,
- we used the wrong type of non-linearity.
Datasets
Speed
GPUs to scale
Training is highly parallelizable linear matrix algebra.
Weights
Training jargon
• regularization to avoid overfitting
(dataset augmentation, early stopping, dropout layers, L1 and L2 weight penalties),
• a proper learning rate with decay
(both too high and too low are bad),
• batch normalization
(faster learning and higher overall accuracy).
All of the above is sketched below.
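The same jargon in tf.keras terms; the layer sizes and hyperparameters are made up for illustration:

import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(128, activation='relu',
                       kernel_regularizer=keras.regularizers.l2(1e-4)),  # L2 weight penalty
    keras.layers.BatchNormalization(),    # batch normalization
    keras.layers.Dropout(0.5),            # dropout layer
    keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer=keras.optimizers.SGD(0.01),   # learning rate
              loss='sparse_categorical_crossentropy')
early_stop = keras.callbacks.EarlyStopping(patience=3)   # early stopping
# model.fit(x_train, y_train, validation_split=0.1, callbacks=[early_stop])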
ReLU for president
ReLU
• is sparse and gives more robust representations,
• has the best performance,
• avoids the vanishing gradient problem,
• its smooth relative softplus is differentiable (sketched below).
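ReLU and its smooth, differentiable relative softplus, side by side; a NumPy sketch for illustration:

import numpy as np

def relu(z):
    return np.maximum(0.0, z)             # sparse: exact zeros below 0

def softplus(z):
    return np.logaddexp(0.0, z)           # smooth approximation of ReLU: log(1 + e^z)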
Basic architectures
Convolutional (CNN)
• traditional CV was hand-crafting features,
• mimics visual perception,
• convolution extracts features [1],
• with lots of matrix multiplication,
• subsampling/pooling to reduce size and avoid overfitting (sketched below).
[1] LeNet5, Yann LeCun, 1998
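A minimal CNN in tf.keras; the layer sizes are illustrative, not from the talk:

import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(28, 28, 1)),   # convolution extracts features
    keras.layers.MaxPooling2D((2, 2)),              # subsampling/pooling
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation='softmax'),   # classifier head
])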
TF is 18 months old
• platforms: DSP, CPU (ARM, Intel), (multiple) GPU(s), TPU,
• Linux, OSX, Windows, Android, iOS, Raspberry Pi,
• Python, Go, Rust, Java and Haskell,
• performance improvements.
Liaison with Python
• API stability,
• resemble NumPy more closely,
• pip packages are now PyPI compliant,
• high-level API includes the new tf.keras module
(almost halves the boilerplate),
• Sonnet, a new high level API from DeepMind.
TF has been open source for 18 months
• the most popular machine learning project on GitHub in 2016,
• 16,644 commits by 789 people,
• 10,031 related repos.
Tooling
• TensorBoard: visualize network topology and performance,
• Embedding Projector: high-level model understanding via visualization,
• XLA: domain-specific compiler for TF graphs (CPUs and GPUs),
• Fold: dynamic batching,
• TensorFlow Serving: serve TF models in production.
TF product choices: Tesla, not Ford
• the right language,
• multiple GPUs for training efficiency,
• compile times are great (no config fiddling),
• high level API,
• enable community,
• tooling.
Open-source models
Dozens of pretrained models, like:
• Inception (CNN),
• SyntaxNet parser (LSTM),
• Parsey McParseface for English (LSTM),
• Parsey's Cousins for 40 additional languages (LSTM).
Examples
CNN (perception, image recognition):
recycling and cucumber sorting with a RasPi,
preventing skin cancer and blindness in diabetics.
LSTM (translation, speech recognition):
language translation.
RNN (generation, time series analysis):
text, image and doodle generation in style or from text.
Reinforcement learning (control and play, autonomous driving):
OpenAI Lab.
A good product
• don't lead the pack,
• well stolen is half done,
• end-to-end,
• ecosystem (tooling),
• eat your own dogfood.
Distributed deep learning
Past: centralized, shovel all to the same pit,
do magic, command and control.
Future: pretrain models centrally, distribute models,
retrain locally, merge and manage models (SqueezeNet, ~500 kB).
Gain:
- efficiency,
- no big data pipes,
- privacy.
Federated Learning (3 weeks ago)
Phones collaboratively learn a shared prediction model:
• device downloads the current model,
• improves it by learning from local data (retraining),
• summarizes the changes as a small, focused update,
• the update, but no data, is sent to the cloud encrypted,
• it is averaged with other users' updates to improve the shared model (sketched below).
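A toy sketch of the federated-averaging idea; this is my illustration, not Google's implementation, and retrain stands in for any local training routine:

import numpy as np

def local_update(global_weights, local_data, retrain):
    # learn from local data; only the weight delta leaves the device
    new_weights = retrain(global_weights, local_data)
    return new_weights - global_weights    # small, focused update

def federated_round(global_weights, device_updates):
    # average the user updates to improve the shared model
    return global_weights + np.mean(device_updates, axis=0)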
Subject matter experts - deep learning novices
• Do you really need it?
• Prepare data (small data < transfer learning + domain adaptation, sketched below;
cover the problem space, balance classes, lower dimensionality).
• Find an analogy (CNN, RNN/LSTM/GRU, RL).
• Create a simple, small & easy baseline model, visualize & debug.
• Fine-tune (evaluation metrics - test data, loss function - training).
(Smith: Best Practices for Applying Deep Learning to Novel Applications, 2017)
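A transfer-learning sketch for the small-data case in tf.keras: reuse a pretrained CNN as a frozen feature extractor and train only a small head. The base network and class count are arbitrary choices for illustration:

import tensorflow as tf
from tensorflow import keras

base = keras.applications.MobileNet(include_top=False, pooling='avg',
                                    input_shape=(224, 224, 3))  # pretrained on ImageNet
base.trainable = False                    # keep the learned features frozen
model = keras.Sequential([
    base,
    keras.layers.Dense(5, activation='softmax'),   # new task-specific head
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')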
Training
• hosted (GCMLE, Rescale, FloydHub, Indico, Bi…
Near future?
• bots go bust,
• deep learning goes commodity,
• AI is cleantech 2.0 for VCs,
• MLaaS dies a second death,
• full-stack vertical AI startups actually work.
(Cross: Five AI Startup Predictions for 2017)
Major sources and read more
• Andrey Kurenkov: A 'Brief' History of Neural Nets and Deep
Learning, Part 1-4
• Adam Geitgey: Machine Learning is Fun!
• TensorFlow and Deep Learning – Without a PhD (1 and 3 hour
version)
• Pete Warden: Tensorflow for Poets