GCP for Data Scientists

Giulia
February 08, 2019

40-minute presentation given at DevFest Paris 2019

An overview of the GCP services that can be used for an ML task, shown through a use case.


Transcript

  1. GCP for data scientists. Giulia Bianchi, 08 Feb 2019

  2. @Giuliabianchl, giulbia, Data Scientist, dataxday.fr

  3. Google announcement (GCP podcast episode 117)

  4. Democratising AI: to everyone and every business

  5. Democratising the cloud: to those who know about AI

  6. The cloud in 3 slides

  7. Twitter

  8. Shift from on-premises to *aaS
    • IaaS
    • PaaS
    • SaaS
    (SaaS vs PaaS vs IaaS: What’s The Difference and How To Choose)

  9. Who says "cloud" says "managed services"
    • Remote physical machines
    • Managed by the provider → no need to care about the infrastructure and architecture
    • Pay for usage → not for rent

  10. Why: why are we even talking about it?

  11. Why a data scientist should care about the cloud
    • Access to theoretically unlimited resources
    • Fast provisioning
    • Reliability
    → PROCESS DATA FASTER!
    (How reliable is Google Cloud)

  12. What: Google Cloud Platform, the cloud by Google

  13. What are the services offered by GCP for AI/ML?
    Cloud AutoML Vision (from conceptdraw)

  14. Cloud ML Engine (ML Engine docs)

  15. How: a glimpse of GCP

  16. Concretely
    • Set up a GCP account (Gmail ID needed)
    • Sign in to Google Cloud Console and set up a project ($$$)
    • Install Cloud SDK

  17. Resources
    • MOOC Coursera - Data Engineering on Google Cloud Platform
    • Online GCP documentation
    • GCP podcast

  18. Machine Learning in 3 words
    • TRAINING → Automatically learn a set of rules from a known dataset
    • PREDICTION → Assess new data
    • FEATURE ENGINEERING → Data have to be processed before training and prediction

  19. Parenting 3.0: hands on!

  20. Back to "Parenting 2.0"
    • Train a model to detect a baby crying and start a lullaby if need be
    • Feature engineering and model training locally (PC)
    • Recording, feature engineering and prediction on Raspberry Pi
    giulbia/baby_cry_detection https://www.youtube.com/watch?v=N-LXrheCIKM

  21. Back to "Parenting 2.0" (continued): it takes about 45 seconds...

  22. The intuition
    • Recording on Raspberry Pi
    • Use GCP for feature engineering and prediction
    Diagram labels: Recording, Storage

  23. The intuition (continued)
    Diagram labels: Recording, Storage, trained model, ML Engine, Prediction

  24. The intuition (continued)
    Diagram labels: (*), Storage, ML Engine, Recording, Storage, trained model, ML Engine

  25. (*) Data pre-processing needs to be done BEFORE ML Engine...

  26. Diagram: 5 × FE, 5 × P, majority vote, final prediction
    1. FE → feature engineering
    2. P → prediction
    3. Maj. vote & 4. Final pred. → the final prediction is positive iff at least 3 subsequences have a positive prediction

  27. From Pi to Storage (giulbia/baby_cry_rpi, giulbia/gcp-rpi)

  28. From Storage to ?? (diagram: 5 × FE, 5 × P, majority vote, final prediction)
    QUESTIONS
    1. Dependencies?
    2. How to trigger each step?
    3. How to send the answer back to Raspberry Pi?

  29. From Storage to Functions (diagram step 1: FE)
    ANSWERS BY GOOGLE CLOUD FUNCTIONS
    • Background function
    • Trigger type → Cloud Storage
      ◦ Bucket → parenting-3-recording
      ◦ Event type → Finalize/Create
    • Dependencies → requirements.txt
    • Easy access to Cloud ML Engine API

  30. From Functions to ML Engine (diagram steps 1: FE, 2: P)

  31. Back to Functions, Storage and Pi! (diagram steps 1: FE, 2: P, 3: MV & 4: FP)

  32. Results: it takes about 5 seconds!

  33. Some code: Function

  34. Some code: Function

  35. Some code: Function

  36. Some code: Predict

  37. Let's talk money (ML Engine)
    • Pricing system based on quotas
      ◦ number of requests over time
      ◦ training vs. prediction
      ◦ it depends on regions
      ◦ it depends on the machine used [scale tiers]
    • Simulation [price calculator]
      ◦ Europe, training job takes 15 minutes, 4 CPU, 15 GB
      ◦ batch prediction takes 36 seconds
      ◦ it costs $0.08 per month
    • Until now I paid nothing

  38. Towards "Parenting 4.0"
    • Training in GCP
      ◦ Upload training data
    • Push it further with TensorFlow
    • Other options for exploration and preprocessing
      ◦ Cloud Datalab
      ◦ BigQuery
      ◦ Cloud Dataproc
      ◦ Cloud Dataflow
      ◦ Cloud Dataprep

  39. Conclusion

  40. • Huge potential for data science
    • A data scientist can help to exploit it to the fullest
    • Now make it data scientist friendly!

  41. THX. Icons from kisspng or made by Freepik from flaticon
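
The code shown on slides 33 to 36 is not captured in this transcript. As a rough illustration of the flow described on slides 26 and 29, here is a minimal Python sketch of a Cloud Storage-triggered background function that runs feature engineering and prediction on each subsequence and then applies the majority vote. It is not the code from giulbia/gcp-rpi: the names `majority_vote`, `make_handler`, `featurize` and `predict` are hypothetical, and the two helpers are passed in as plain callables so that the sketch runs without any GCP dependency. Only the `(event, context)` signature and the `bucket` and `name` event fields follow the Python background-function convention for Cloud Storage triggers.

```python
from typing import Any, Callable, Optional, Sequence

BUCKET = "parenting-3-recording"  # bucket name taken from slide 29
MIN_POSITIVE = 3  # slide 26: positive iff at least 3 subsequences are positive


def majority_vote(predictions: Sequence[int], min_positive: int = MIN_POSITIVE) -> int:
    """Return 1 if at least `min_positive` predictions equal 1, else 0."""
    positives = sum(1 for p in predictions if p == 1)
    return 1 if positives >= min_positive else 0


def make_handler(
    featurize: Callable[[str, str], Sequence[Any]],
    predict: Callable[[Any], int],
) -> Callable[[dict, Any], Optional[int]]:
    """Build a background-function entry point from two hypothetical helpers.

    featurize(bucket, name) is assumed to return one feature vector per
    subsequence of the recording; predict(features) is assumed to return
    0 or 1 (in the talk this step is delegated to Cloud ML Engine).
    """

    def on_recording_finalized(event: dict, context: Any) -> Optional[int]:
        # A Cloud Storage finalize event carries the bucket and object name.
        if event.get("bucket") != BUCKET:
            return None  # ignore objects from other buckets
        features = featurize(event["bucket"], event["name"])
        predictions = [predict(f) for f in features]
        return majority_vote(predictions)

    return on_recording_finalized
```

In a deployed function the entry point would have to be a module-level name, for example `on_recording_finalized = make_handler(my_featurize, my_predict)`, with the third-party packages it needs listed in requirements.txt as slide 29 notes. Sending the result back to the Raspberry Pi (slide 31) is left out of this sketch.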