Easy Recipes to Make Apps using ML and Big Data

Lucy
September 15, 2017

Presented by Mahsa
Created by Lucy

Transcript

  1. 1 Easy Recipes to Make Apps using Machine Learning and

    Big Data Mahsa Orang Presented for Hack the North September 15, 2017
  2. 2 Slides available on Slack channel #veeva Stop by our

    booth for questions and cool swag! We are hiring!
  3. 3 About Me Mahsa Orang Associate Software Engineer at Veeva

    Machine Learning App for Similarity Search and Record Linkage Information Technology PhD BSc MSc Mathematics Computer Science
  4. 5 Agenda Today’s Recipes ML development environment Make your data

    ready for ML Use ready-to-use DL models Pre-process your big data Get results faster A bonus app Recipe 1 Recipe 2 Recipe 3 Recipe 4 Recipe 5 Recipe 6
  5. 7 ML development environment Recipe 1 1 Make your data

    ready for ML Use ready-to-use DL models Pre-process your big data Get results faster A bonus app Recipe 2 Recipe 3 Recipe 4 Recipe 5 Recipe 6
  6. 8 Set Up Your Notebook Apache Zeppelin Notebook Jupyter Notebook

    https://zeppelin.apache.org/download.html http://jupyter.org/install.html
  7. 13 ML development environment Recipe 1 Make your data ready

    for ML Use ready-to-use DL models Pre-process your big data Get results faster A bonus app Recipe 2 Recipe 3 Recipe 4 Recipe 5 Recipe 6 2
  8. 16 ML 101 Which one is the most similar to

    my pizza? My Pizza 5 9 7 4 Convert data to real numbers
  9. 17 ML 101 Which one is the most similar to

    my pizza? My Pizza 5 9 7 4 Then compare numbers 9-5=4 5-4=1 7-5=2 THIS ONE
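
The comparison on this slide amounts to picking the candidate whose number has the smallest absolute difference from the query. A minimal sketch using the slide's numbers (the function name `closest` is hypothetical):

```python
def closest(query, candidates):
    """Return the candidate with the smallest absolute difference from query."""
    return min(candidates, key=lambda c: abs(c - query))

my_pizza = 5
others = [9, 7, 4]

# Differences: |9-5| = 4, |7-5| = 2, |4-5| = 1, so 4 is the closest.
print(closest(my_pizza, others))  # prints 4
```
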
  10. 18 ML 101 Which one is the most similar to

    my pizza? Extract more real numbers <6,1,...> <5,2,...>
  11. 19  ML 101 Which one is the most similar

    to my pizza? Calculate similarity between vectors <9,3,...> <5,2,...> <5,1,...> <6,1,...>
  12. 21 ML 101 Calculate similarity between vectors 5 10 15

    10 5 A B Weight Diameter Cosine Similarity
  13. 22  ML 101 Which one is the most similar

    to my pizza? Calculate similarity between vectors <5,2,...> <9,1,...> <5,1,...> <6,1,...> 0.2 0.8 0.5 database input data
  14. 23  ML 101 Which one is the most similar

    to my pizza? Calculate similarity between vectors <5,2,...> <9,1,...> <5,1,...> <6,1,...> 0.2 0.8 0.5 database input data
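
The lookup sketched on these slides (score the input vector against every database vector, then keep the best score) could look roughly like the following. The two-feature (weight, diameter) vectors and the names `pizza_a`, `pizza_b`, `pizza_c` are made up for illustration; they are not the truncated vectors or the scores shown on the slides.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def most_similar(query, database):
    """Return (name, score) of the database entry most similar to query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in database.items()]
    return max(scored, key=lambda pair: pair[1])

# Hypothetical (weight, diameter) features; not the values from the slides.
database = {
    "pizza_a": [9.0, 1.0],
    "pizza_b": [5.0, 1.0],
    "pizza_c": [6.0, 3.0],
}
my_pizza = [5.0, 2.0]

name, score = most_similar(my_pizza, database)
print(name, round(score, 3))
```

Cosine similarity compares direction only, so `[5, 2]` and `[10, 4]` score 1.0 even though their magnitudes differ.
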
  15. 24 ML development environment Recipe 1 Make your data ready

    for ML Use ready-to-use DL models Pre-process your big data Get results faster A bonus app Recipe 2 Recipe 3 Recipe 4 Recipe 5 Recipe 6 3
  16. 26  ML 101 Calculate distance between vectors <5,2,..> <9,3,...>

    <5,1,...> <6,1,...> 0.04 0.8 0.02 database input data
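
When a distance is used instead of a similarity, the best match is the smallest value rather than the largest. A minimal sketch assuming Euclidean distance (the slide does not say which distance it uses) and hypothetical two-component vectors:

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, database):
    """Return the name of the database vector with the smallest distance to query."""
    return min(database, key=lambda name: euclidean_distance(query, database[name]))

# Hypothetical vectors for illustration only.
database = {"a": [9.0, 3.0], "b": [5.0, 1.0], "c": [6.0, 1.0]}
print(nearest([5.0, 2.0], database))  # prints b
```
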
  17. 30 ML/DL Pre-Trained Models 0 0.5 0 0.2 0.1 0

    Inception http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz
  18. 31 ML/DL Pre-Trained Models 0 0.5 0 0.2 0.1 0

    INCEPTION http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz WORD2VEC 0.9 0.5 0 0.7 0.1 1 https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing
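
A pre-trained model is used here as a function from raw input to a vector of real numbers. As a rough sketch for text, one simple option (an assumption of this example, not something the slides prescribe) is to average the word vectors of a sentence. The `toy_vectors` dictionary below is a made-up stand-in for a real model; assuming the word2vec download is a binary word2vec-format file, it could instead be loaded with gensim's `KeyedVectors.load_word2vec_format(path, binary=True)`, whose result supports the same `word in lookup` and `lookup[word]` operations.

```python
def text_vector(tokens, lookup):
    """Average the vectors of the tokens found in lookup; None if none are known."""
    known = [lookup[t] for t in tokens if t in lookup]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

# Stand-in for a pre-trained model: made-up 3-dimensional vectors.
toy_vectors = {
    "hot": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
}

# Unknown words are skipped rather than treated as zeros.
print(text_vector(["hot", "dog", "unknownword"], toy_vectors))  # [0.5, 0.5, 0.0]
```
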
  19. 33 ML development environment Recipe 1 Make your data ready

    for ML Use ready-to-use DL models Pre-process your big data Get results faster A bonus app Recipe 2 Recipe 3 Recipe 4 Recipe 5 Recipe 6 4
  20. 36  Pre-process Your Big Data big data input data

    0.9 0.5 0 0.7 0.1 1 0.3 0.1 0.3 0.2 0.1 0 0 0.4 0 0 0.1 1 1 0.2 0.5 0.1 0.1 1
  21. 37   Pre-process Your Big Data big data Loading

    Stage Vectorizing Stage    ML Worker ML Worker ML Worker
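
The two stages on this slide (load the data, then vectorize it on several workers ahead of time) can be sketched with a worker pool from the standard library. Everything named here is hypothetical: `load_records` stands in for the real data source, and `vectorize` is a toy featurizer standing in for a DL model.

```python
from concurrent.futures import ThreadPoolExecutor

def load_records():
    """Loading stage: hypothetical stand-in that returns raw records."""
    return ["margherita pizza", "hot dog", "pasta carbonara", "veggie pizza"]

def vectorize(record):
    """Vectorizing stage: toy featurizer (character count, word count)."""
    return [float(len(record)), float(len(record.split()))]

def preprocess(records, workers=3):
    """Vectorize records on a pool of workers, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        vectors = list(pool.map(vectorize, records))
    return dict(zip(records, vectors))

index = preprocess(load_records())
print(index["hot dog"])  # [7.0, 2.0]
```

Threads keep the sketch short; for a CPU-heavy model, separate processes or a cluster framework would likely be a better fit. The point is only that vectors are computed once, offline, and stored for later lookups.
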
  22. 40 ML development environment Recipe 1 Make your data ready

    for ML Use ready-to-use DL models Pre-process your big data Get results faster A bonus app Recipe 2 Recipe 3 Recipe 4 Recipe 5 Recipe 6 5
  23. 41 How to Get Results Faster?  Input vector should

    be compared with all vectors! big data input data
  24. 47 Latitude: 43.473 Longitude: -80.542 Geohash: dpwxr1 A Working Example

    Latitude: 43.472 Longitude: -80.544 Geohash: dpwxr1 UW Math Coffee and Donuts
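
Instead of comparing the input with every stored item, items can be grouped by a short key so that only one group is searched. The sketch below implements the standard geohash encoding and a dictionary keyed by geohash prefix, taking 43.472 as the latitude and -80.544 as the longitude of one of the slide's points. The names `place_a`, `place_b`, and `far_away`, and the choice of a five-character prefix for grouping, are assumptions of this example.

```python
from collections import defaultdict

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash interleaves bits, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)

def build_index(places, precision=5):
    """Group place names by the geohash cell that contains them."""
    index = defaultdict(list)
    for name, (lat, lon) in places.items():
        index[geohash_encode(lat, lon, precision)].append(name)
    return index

def nearby(index, lat, lon, precision=5):
    """Return only the places stored in the same cell as the query point."""
    return index.get(geohash_encode(lat, lon, precision), [])

places = {
    "place_a": (43.473, -80.542),
    "place_b": (43.472, -80.544),
    "far_away": (48.8566, 2.3522),
}
index = build_index(places)

print(geohash_encode(43.472, -80.544))  # dpwxr1
print(nearby(index, 43.472, -80.544))
```

A point close to a cell edge can have near neighbours in an adjacent cell, so a fuller version would also search the surrounding cells.
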
  25. 50 ML development environment Recipe 1 Make your data ready

    for ML Use ready-to-use DL models Process your Big Data offline Get results faster A bonus app Recipe 2 Recipe 3 Recipe 4 Recipe 5 Recipe 6 6
  26. 52 Deep Learning image (pixel) Input Layer Output Layer GEOFFREY

    HINTON Layers of Abstraction DEEP NEURAL NETWORK (DNN) object models edges object parts
  27. 53 Transfer Learning Input Layer Output Layer Layers of Abstraction

    PRE-TRAINED DL MODEL Replace last layers CLASSIFIER Hotdog or Not Hotdog
  28. 54 How to Build a Classifier in 5 min Not

    Hotdog App DL Model Classifier  input data output data Hotdog Not Hotdog or
  29. 55 Not Hotdog App DL Model Classifier  input data

    output data Hotdog Not Hotdog or How to Build a Classifier in 5 min
  30. 56 How to Build a Classifier in 5 min Not

    Hotdog App DL Model Classifier  input data output data Hotdog Not Hotdog or
  31. 58  0 0.4 0 0 0.1 1 How to

    Build a Classifier in 5 min 1 0.2 0.5 0.1 0.1 1 1. Find all the vectors of the images using DL model 0.3 0.1 0.3 0.2 0.1 0 data hotdog pizza pasta
  32. 59  0 0.4 0 0 0.1 1 How to

    Build a Classifier in 5 min 1 0.2 0.5 0.1 0.1 1 2. Find the vector of the input image using DL model 0.3 0.1 0.3 0.2 0.1 0 data hotdog pizza pasta input data
  33. 60  0 0.4 0 0 0.1 1 How to

    Build a Classifier in 5 min 1 0.2 0.5 0.1 0.1 1 2. Find the vector of the input image using DL model 0.3 0.1 0.3 0.2 0.1 0 data hotdog pizza pasta input data 0.8 0.7 0.2 0.2 0.1 0
  34. 61  0.82 0.1 0.02 0 0.4 0 0 0.1

    1 How to Build a Classifier in 5 min 1 0.2 0.5 0.1 0.1 1 3. Compare all the vectors using cosine similarity 0.3 0.1 0.3 0.2 0.1 0 data hotdog pizza pasta input data 0.8 0.7 0.2 0.2 0.1 0
  35. 62  0.82 0.1 0.02 0 0.4 0 0 0.1

    1 How to Build a Classifier in 5 min 1 0.2 0.5 0.1 0.1 1 4. Find the most similar class 0.3 0.1 0.3 0.2 0.1 0 data hotdog pizza pasta input data 0.8 0.7 0.2 0.2 0.1 0
  36. 63  0.82 0.1 0.02 0 0.4 0 0 0.1

    1 1 0.2 0.5 0.1 0.1 1 0.3 0.1 0.3 0.2 0.1 0 data hotdog pizza pasta input data 0.8 0.7 0.2 0.2 0.1 0 How to Build a Classifier in 5 min 4. Find the most similar class
  37. 64  0.82 0.1 0.02 0 0.4 0 0 0.1

    1 1 0.2 0.5 0.1 0.1 1 0.3 0.1 0.3 0.2 0.1 0 data hotdog pizza pasta input data 0.8 0.7 0.2 0.2 0.1 0 How to Build a Classifier in 5 min 4. Find the most similar class hotdog!
  38. 65  0 0.4 0 0 0.1 1 Let’s Try

    Another One! 1 0.2 0.5 0.1 0.1 1 1. Find all the vectors of the images using DL model 0.3 0.1 0.3 0.2 0.1 0 data hotdog pizza pasta input data
  39. 66  0 0.4 0 0 0.1 1 Let’s Try

    Another One! 1 0.2 0.5 0.1 0.1 1 2. Find the vector of the input image using DL model 0.3 0.1 0.3 0.2 0.1 0 data hotdog pizza pasta input data 1 0.7 0.2 0.2 0.1 0.9
  40. 67  0.2 0.1 0.7 0 0.4 0 0 0.1

    1 Let’s Try Another One! 1 0.2 0.5 0.1 0.1 1 3. Compare all the vectors using cosine similarity 0.3 0.1 0.3 0.2 0.1 0 data hotdog pizza pasta input data 1 0.7 0.2 0.2 0.1 0.9
  41. 68  0.2 0.1 0.7 0 0.4 0 0 0.1

    1 Let’s Try Another One! 1 0.2 0.5 0.1 0.1 1 3. Compare all the vectors using cosine similarity 0.3 0.1 0.3 0.2 0.1 0 data hotdog pizza pasta input data 1 0.7 0.2 0.2 0.1 0.9 NOT hotdog!
  42. 70 Transfer Learning Input Layer Output Layer Layers of Abstraction

    PRE-TRAINED DL MODEL Replace last layers 1NN Classifier  Hotdog Not Hotdog
  43. 71 Transfer Learning Input Layer (pixels) Prediction Layers of Abstraction

    NOT HOTDOG APP 1NN Classifier  Hotdog Not Hotdog Input Image Hotdog Not Hotdog or
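
The four steps from the classifier slides (vectorize the labelled images, vectorize the input, compare with cosine similarity, pick the most similar class) reduce to a 1-nearest-neighbour lookup. In the sketch below the feature vectors are made up and stand in for the output of a pre-trained DL model; they are not the illustrative numbers from the slides, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_label(query, examples):
    """1-nearest-neighbour: return (label, score) of the most similar example."""
    return max(
        ((label, cosine_similarity(query, vec)) for label, vec in examples),
        key=lambda pair: pair[1],
    )

def hotdog_or_not(query, examples):
    """Map the nearest class to the app's two possible answers."""
    label, _ = nearest_label(query, examples)
    return "Hotdog" if label == "hotdog" else "Not Hotdog"

# Made-up feature vectors; in the real app these would come from the DL model.
examples = [
    ("hotdog", [0.9, 0.8, 0.1, 0.0]),
    ("pizza", [0.1, 0.2, 0.9, 0.3]),
    ("pasta", [0.0, 0.3, 0.2, 0.9]),
]

print(hotdog_or_not([0.8, 0.7, 0.2, 0.1], examples))  # Hotdog
print(hotdog_or_not([0.1, 0.1, 0.8, 0.4], examples))  # Not Hotdog
```

No training step is needed here: the pre-trained model supplies the features, and the classifier only stores labelled vectors and compares against them.
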
  44. 72 Summary Recipe 1: Set up your notebook Apache Zeppelin

    Notebook Jupyter Notebook https://zeppelin.apache.org/download.html http://jupyter.org/install.html
  45. 73 Summary Recipe 2: Make your data ready for ML

    <6,1,...> <5,2,...> Weight 5 10 15 10 5 A B Weight Diameter Cosine Similarity
  46. 74 Summary Recipe 3: Use ready-to-use DL models 0 0.5

    0 0.2 0.1 0 INCEPTION WORD2VEC 0.9 0.5 0 0.7 0.1 1
  47. 75 Summary Recipe 4: Process your Big Data offline 

     big data Loading Stage Vectorizing Stage    ML Worker ML Worker ML Worker
  48. 76 Summary Recipe 5: Get results faster dpwxr1 Loading nearby

    restaurants ... Search Input Geohash: dpwxr1
  49. 78 Slides available on Slack channel #veeva Stop by our

    booth for questions and cool swag! We are hiring!
  50. 79 Our Team Nicole Employee Success Lucy Interaction Designer Iman

    Software Engineer Caleb Director of PM Amanda Director of UX Mahsa Software Engineer Mark Director of Eng