Tensorflow for Janitors

soobrosa

March 04, 2022

Transcript

  1. Tensorflow for Janitors
    Craft Conference Budapest 2017
    Daniel Molnar @soobrosa
    door2door GmbH

  2. Perspective
    • rounded, not complete,
    • slow, old, stupid and lazy and
    • looking for feedback either to add or remove.

  3. Where I'm coming from
    • head of data and analytics,
    • senior applied and data scientist,
    • data analyst,
    • head of data,
    • or just data janitor.

  4. Orientation
    What this talk is about:
    - deep learning,
    - what a generalist can use Tensorflow for,
    - what it can teach us about a good product.
    What this talk is not about:
    - extinction or salvation by AI,
    - coding tutorial,
    - pitching a Google product.

  5. Deciphering jargon via history

  6. Adventures in CS
    (ca. 1999)
    Machine learning is a function,
    trained supervised or unsupervised,
    that generalizes well on a dataset,
    but hopefully not too much
    (overfitting), and ends up in
    fancy theses.

  7. Technical and simplified
    We:
    • run multivariate linear regression
    • with a cost/loss function to optimize for
    (typically squared error)
    • with a batch gradient descent.
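
A minimal NumPy sketch of that loop, on made-up toy data (my own example, not from the deck): multivariate linear regression, squared-error cost, one batch gradient step per iteration.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 3)                        # 100 samples, 3 features
true_w, true_b = np.array([2.0, -1.0, 0.5]), 0.3
y = X @ true_w + true_b + 0.01 * rng.randn(100)

w, b, lr = np.zeros(3), 0.0, 0.1
for step in range(500):
    pred = X @ w + b                         # forward pass
    err = pred - y
    loss = (err ** 2).mean()                 # squared-error cost
    grad_w = 2 * X.T @ err / len(y)          # batch gradient
    grad_b = 2 * err.mean()
    w -= lr * grad_w                         # gradient descent step
    b -= lr * grad_b

print(w, b)   # should approach true_w and true_b
```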

  8. A neuron

  9. Neural networks

  10. Perceptron (1958)
    • random start weights,
    • activation function: fire when the weighted sum exceeds a threshold.
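
A toy perceptron sketch (my own illustration) learning AND, which is linearly separable: random start weights, threshold activation folded into a bias, and the classic perceptron update rule.

```python
import numpy as np

rng = np.random.RandomState(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # inputs
y = np.array([0, 0, 0, 1])                        # AND target

w = 0.1 * rng.randn(2)     # random start weights
b = 0.0                    # threshold folded into a bias
lr = 0.1

for epoch in range(25):
    for xi, target in zip(X, y):
        fired = int(xi @ w + b > 0)        # weighted sum exceeding the threshold
        update = lr * (target - fired)     # perceptron learning rule
        w += update * xi
        b += update

print([int(xi @ w + b > 0) for xi in X])   # converges to [0, 0, 0, 1]
```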

  11. Hidden layers before the AI winter ('70s)
    Mostly Minsky's fault:
    - failure on non-linear problems (XOR),
    - no backpropagation yet.

  12. Backpropagation
    ('70-'80s)
    • activation function must be differentiable,
    • derivative used to adjust the weights to
    minimize error,
    • chain rule to blame prior layers,
    • optimize with stochastic gradient
    descent.
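
A hand-written backpropagation sketch (my own toy example): a two-layer network on XOR, the problem from the Minsky slide, with differentiable sigmoid activations and the chain rule passing blame back to the first layer. Plain gradient descent is used here since the whole dataset is four rows.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.RandomState(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR needs a hidden layer

W1, b1 = rng.randn(2, 8), np.zeros(8)
W2, b2 = rng.randn(8, 1), np.zeros(1)
lr = 1.0

for step in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    loss = ((out - y) ** 2).mean()

    # backward pass: chain rule, layer by layer
    d_out = 2 * (out - y) / len(y)            # dLoss/dout
    d_z2 = d_out * out * (1 - out)            # through the output sigmoid
    d_W2, d_b2 = h.T @ d_z2, d_z2.sum(axis=0)
    d_h = d_z2 @ W2.T                         # blame the prior layer
    d_z1 = d_h * h * (1 - h)                  # through the hidden sigmoid
    d_W1, d_b1 = X.T @ d_z1, d_z1.sum(axis=0)

    # gradient descent update
    W1 -= lr * d_W1; b1 -= lr * d_b1
    W2 -= lr * d_W2; b2 -= lr * d_b2

print(np.round(out.ravel(), 2))   # should approach [0, 1, 1, 0]
```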

  13. Deep Learning

  14. What is it good for?
    • supervised
    near-human level accuracy in image
    classification, voice recognition, natural
    language processing
    • unsupervised
    use large volumes of unstructured data
    to learn hierarchical models which
    capture the complex structure in the
    data and then use these models to
    predict properties of previously unseen
    data

  15.

  16. So is this supercharged ML?
    Kinda yes:
    • large scale neural networks with many layers,
    • weights can be n-dimensional arrays (tensors),
    • high-level way of defining prediction code or the forward pass,
    • the framework figures out the derivatives.
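
A minimal sketch of what that looks like in the TensorFlow 1.x API of the time (toy linear-regression data, my own example): you write only the forward pass and the loss; the framework derives the gradients for the optimizer.

```python
import numpy as np
import tensorflow as tf   # written against the 1.x graph API

x = tf.placeholder(tf.float32, shape=[None, 3])
y = tf.placeholder(tf.float32, shape=[None, 1])

W = tf.Variable(tf.zeros([3, 1]))
b = tf.Variable(tf.zeros([1]))
pred = tf.matmul(x, W) + b                  # forward pass only
loss = tf.reduce_mean(tf.square(pred - y))  # squared-error cost

# no hand-written derivatives: the optimizer differentiates the graph
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

data_x = np.random.randn(100, 3).astype(np.float32)
data_y = (data_x @ [[2.0], [-1.0], [0.5]] + 0.3).astype(np.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(500):
        sess.run(train_op, feed_dict={x: data_x, y: data_y})
    print(sess.run(W).ravel(), sess.run(b))
```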

  17. Who made it work? Blame Canada!
    According to Geoffrey Hinton in the past:
    - our labeled datasets were thousands of times too small,
    - our computers were millions of times too slow,
    - we initialized the weights in a stupid way,
    - we used the wrong type of non-linearity.

  18. Datasets

  19. Speed

  20. GPUs to scale
    Training is highly parallelizable linear matrix algebra.

  21. Weights

  22. Training jargon
    • regularization to avoid overfitting
    (dataset augmentation, early stopping,
    dropout layer, weight penalty L1 and
    L2),
    • proper learning rate decay
    (both too high and too low rates can be bad),
    • batch normalization
    (faster learning and higher overall
    accuracy).
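
The same jargon spelled out in Keras terms, on an assumed toy setup (20 input features, binary label; not from the deck): L2 weight penalty, dropout, batch normalization, a decaying learning rate and early stopping.

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization
from keras.regularizers import l2
from keras.optimizers import SGD
from keras.callbacks import EarlyStopping

model = Sequential([
    Dense(64, activation='relu', input_shape=(20,),
          kernel_regularizer=l2(1e-4)),      # weight penalty (L2)
    BatchNormalization(),                    # faster, more stable learning
    Dropout(0.5),                            # regularization via dropout
    Dense(1, activation='sigmoid'),
])

# learning rate decay: shrink the step size a little every update
model.compile(optimizer=SGD(lr=0.1, decay=1e-4),
              loss='binary_crossentropy', metrics=['accuracy'])

# early stopping: another regularizer, halts when validation stops improving
stop = EarlyStopping(monitor='val_loss', patience=5)
# model.fit(X_train, y_train, validation_split=0.2, epochs=100, callbacks=[stop])
# (X_train, y_train are hypothetical; supply your own data)
```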

  23. Non-linearity

  24. Activation functions
    • sigmoid
    1/(1+e^-x)
    • TanH
    (2/(1+e^-2x))-1
    • ReLU (rectified linear unit)
    max(0,x)
    • softplus
    ln(1+e^x)
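
For reference, the four formulas above written out in NumPy (a quick sketch, not from the deck):

```python
import numpy as np

def sigmoid(x):  return 1 / (1 + np.exp(-x))
def tanh(x):     return 2 / (1 + np.exp(-2 * x)) - 1   # same as np.tanh(x)
def relu(x):     return np.maximum(0, x)
def softplus(x): return np.log(1 + np.exp(x))

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x), softplus(x))
```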

  25. ReLU for president
    ReLU
    • is sparse and gives more robust representations,
    • has best performance,
    • avoids vanishing gradient problem,
    • its smooth approximation, softplus, is differentiable everywhere.

  26. Basic architectures

  27. Convolutional (CNN)
    • traditional CV was hand-crafting features,
    • mimics visual perception,
    • convolution extracts features¹,
    • with lots of matrix multiplication,
    • subsampling/pooling to reduce size
    and avoid overfitting.
    ¹ LeNet-5, Yann LeCun, 1998
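
A minimal Keras sketch of the convolution + pooling idea (assumed MNIST-sized 28x28 grayscale input and 10 classes; my own example, not LeNet-5 itself):

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # feature extraction
    MaxPooling2D((2, 2)),                                            # subsampling/pooling
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(10, activation='softmax'),                                 # classifier on top
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
```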

  28. Recurrent (RNN)
    • stateful,
    • TDNN
    time delay neural networks,
    • LSTM
    long short-term memory,
    • supervised.
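
A minimal supervised LSTM sketch in Keras (sequence length 100 and 8 features per step are assumptions, not from the deck):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential([
    LSTM(64, input_shape=(100, 8)),    # long short-term memory carries state across timesteps
    Dense(1, activation='sigmoid'),    # e.g. one supervised label per sequence
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()
```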

  29. Autoencoder
    • reinforcement learning
    (deliver action on context)
    • DBN
    Deep Belief Networks - directed
    • DBM
    Deep Boltzmann Machines -
    undirected
    • unsupervised.

  30. Tensorflow

  31. Recent major contestants
    • 2002 Torch (Lua) industrial, multiple GPUs, acyclic comp. graphs
    • 2010 Caffe (Python) academic, boilerplate-heavy
    • 2010 Theano (Python) academic, high level lightweight Keras
    • 2011 DistBelief (Google)
    • 2015 Tensorflow (Python)
    • 2016 CNTK (C#)

  32. TF is 18 months old
    • platforms: DSP, CPU (ARM, Intel), (multiple) GPU(s), TPU,
    • Linux, OSX, Windows, Android, iOS, Raspberry Pi,
    • Python, Go, Rust, Java and Haskell,
    • performance improvements.

  33. Liaison with Python
    • API stability,
    • resemble NumPy more closely,
    • pip packages are now PyPI compliant,
    • high-level API includes a new *.keras module
    (almost halves the boilerplate),
    • Sonnet, a new high level API from DeepMind.

  34. TF is open source for 18 months
    • the most popular machine learning project on GitHub in 2016,
    • 16,644 commits by 789 people,
    • 10,031 related repos.

  35. Tooling
    • TensorBoard visualize network topology and performance,
    • Embedding Projector high level model understanding via
    visualization,
    • XLA domain-specific compiler for TF graphs (CPUs and GPUs),
    • Fold for dynamic batching,
    • TensorFlow Serving to serve TF models in production.

  36. TF product choices: Tesla, not Ford
    • the right language,
    • multiple GPUs for training efficiency,
    • compile times are great (no to config),
    • high level API,
    • enable community,
    • tooling.

  37. OS models
    Dozens of pretrained models like:
    • Inception (CNN),
    • SyntaxNet parser (LSTM),
    • Parsey McParseface for English (LSTM),
    • Parsey's Cousins for 40 additional languages (LSTM).

  38. Examples
    CNN (perception, image recognition)
    recycling and cucumber sorting with RasPi
    preventing skin cancer and blindness in diabetics
    LSTM (translation, speech recognition)
    language translation
    RNN (generation, time series analysis)
    text, image and doodle generation in style or from text
    Reinforcement learning (control and play, autonomous driving)
    OpenAI Lab

  39. A good product
    • don't lead the pack,
    • well stolen is half done,
    • end-to-end,
    • ecosystem (tooling),
    • eat your own dogfood.

  40. Distributed deep learning
    Past: centralized, shovel all to the same pit,
    do magic, command and control.
    Future: pretrain models centrally, distribute models,
    retrain locally, merge and manage models (SqueezeNet, 500 kB).
    Gain:
    - efficiency,
    - no big data pipes,
    - privacy.

  41. Federated Learning (3 weeks ago)
    Phones collaboratively learn a shared prediction model
    • device downloads current model,
    • improves it by learning from local data (retrain),
    • summarizes changes of model as small focused update,
    • update, but no data, is sent to the cloud encrypted,
    • averaged with other user updates to improve the shared model.
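
A rough sketch of that averaging loop (my own toy illustration with simulated devices and linear-regression data, not Google's implementation): each "device" retrains the shared weights locally and sends back only a weight delta; the server averages the deltas, and no raw data leaves the devices.

```python
import numpy as np

rng = np.random.RandomState(0)
true_w = np.array([2.0, -1.0])

def local_update(shared_w, n=50, lr=0.1, steps=10):
    """One device: private local data, a few gradient steps,
    return only the small focused update (a weight delta)."""
    X = rng.randn(n, 2)
    y = X @ true_w + 0.1 * rng.randn(n)
    w = shared_w.copy()
    for _ in range(steps):
        w -= lr * 2 * X.T @ (X @ w - y) / n
    return w - shared_w

shared_w = np.zeros(2)
for round_ in range(20):                                  # communication rounds
    deltas = [local_update(shared_w) for _ in range(5)]   # 5 devices participate
    shared_w += np.mean(deltas, axis=0)                   # average the updates

print(shared_w)   # should move toward true_w
```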

  42. Subject matter experts - deep learning novices
    • Do you really need it?
    • Prepare data (small data < transfer learning + domain adaptation,
    cover problem space, balance classes, lower dimensionality).
    • Find analogy (CNN, RNN/LSTM/GRU, RL).
    • Create a simple, small & easy baseline model, visualize & debug.
    • Fine-tune (evaluation metrics - test data, loss function - training).
    (Smith: Best Practices for Applying Deep Learning to Novel ... , 2017)
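
A sketch of the "small data, use transfer learning" advice as a simple baseline model: reuse a pretrained Inception as a frozen feature extractor and train only a small head on top. The class count (5) and input size are assumptions for illustration.

```python
from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

base = InceptionV3(weights='imagenet', include_top=False,
                   input_shape=(299, 299, 3))
for layer in base.layers:
    layer.trainable = False                   # keep the pretrained features

x = GlobalAveragePooling2D()(base.output)
out = Dense(5, activation='softmax')(x)       # e.g. 5 domain-specific classes
model = Model(inputs=base.input, outputs=out)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```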

  43. Training
    • hosted (GCMLE, Rescale, FloydHub, Indico, Bi…),
    • rented GPU (AWS -- TFAMI, AWS Deep Learning AMI for
    Ubuntu),
    • local (OSX :sigh:),
    • own GPU.

  44. Near future?
    • bots go bust,
    • deep learning goes commodity,
    • AI is cleantech 2.0 for VCs,
    • MLaaS dies a second death,
    • full stack vertical AI startups actually work.
    (Cross: Five AI Startup Predictions for 2017)

  45. Major sources and read more
    • Andrey Kurenkov: A 'Brief' History of Neural Nets and Deep
    Learning, Part 1-4
    • Adam Geitgey: Machine Learning is Fun!
    • TensorFlow and Deep Learning – Without a PhD (1 and 3 hour
    version)
    • Pete Warden: Tensorflow for Poets

  46. Thank you!
    @soobrosa