GCP for Data Scientists

Giulia
February 08, 2019

40 minute presentation given at DevFest Paris 2019

An overview of GCP services that can be used to do a ML task shown through a use case.

https://youtu.be/UhkSBchk8DM

Transcript

  1. Shift from on-premises to *aaS

    • IaaS • PaaS • SaaS (source: "SaaS vs PaaS vs IaaS: What's The Difference and How To Choose")
  2. "Cloud" means "managed services"

    • Remote physical machines • Managed by the provider → no need to care about the infrastructure and architecture • Pay for usage → not for rent
  3. Why a data scientist should care about the cloud

    • Access to theoretically unlimited resources • Fast provisioning • Reliability → PROCESS DATA FASTER! (source: "How reliable is Google Cloud")
  4. What are the services offered by GCP for AI/ML?

    • Cloud AutoML Vision (image from conceptdraw)
  5. Concretely

    • Set up a GCP account (Gmail ID needed) • Sign in to Google Cloud Console and set up a project ($$$) • Install Cloud SDK
  6. Resources

    • MOOC Coursera - Data Engineering on Google Cloud Platform • Online GCP documentation • GCP podcast
  7. Machine Learning in 3 words

    • TRAINING → Automatically learn a set of rules from a known dataset • PREDICTION → Assess new data • FEATURE ENGINEERING → Data have to be processed before training and prediction
  8. Back to "Parenting 2.0"

    • Train a model to detect a baby crying and start a lullaby if need be • Feature engineering and model training locally (PC) • Recording, feature engineering and prediction on Raspberry Pi • giulbia/baby_cry_detection https://www.youtube.com/watch?v=N-LXrheCIKM
  9. Back to "Parenting 2.0"

    • Train a model to detect a baby crying and start a lullaby if need be • Feature engineering and model training locally (PC) • Recording, feature engineering and prediction on Raspberry Pi • giulbia/baby_cry_detection https://www.youtube.com/watch?v=N-LXrheCIKM • It takes about 45 seconds...
  10. The intuition

    • Recording on Raspberry Pi • Use GCP for feature engineering and prediction • Diagram labels: Recording, Storage
  11. The intuition

    • Recording on Raspberry Pi • Use GCP for feature engineering and prediction • Diagram labels: Recording, Storage, trained model, ML Engine, Prediction
  12. The intuition

    • Recording on Raspberry Pi • Use GCP for feature engineering and prediction • Diagram labels: (*) Storage, ML Engine, Recording, Storage, trained model, ML Engine
  13. Diagram: FE ×5, P ×5, Majority vote, Final prediction

    1. FE → feature engineering 2. P → prediction 3. Majority vote & 4. Final prediction → the final prediction is positive iff at least 3 subsequences have a positive prediction
  14. From Storage to??

    Diagram: FE ×5, P ×5, Majority vote, Final prediction • QUESTIONS 1. Dependencies? 2. How to trigger each step? 3. How to send the answer back to Raspberry Pi?
  15. From Storage to Functions

    Diagram: FE ×5, P ×5, Majority vote, Final prediction (step 1. FE) • ANSWERS BY GOOGLE CLOUD FUNCTIONS • Background function • Trigger type → Cloud Storage ◦ Bucket → parenting-3-recording ◦ Event type → Finalize/Create • Dependencies → requirements.txt • Easy access to Cloud ML Engine API
  16. From Functions to ML Engine

    Diagram: FE ×5, P ×5, Majority vote, Final prediction (steps 1. FE, 2. P)
  17. Back to Functions, Storage and Pi!

    Diagram: FE ×5, P ×5, Majority vote, Final prediction (steps 1. FE, 2. P, 3. MV & 4. FP)
  18. Let's talk money (ML Engine)

    • Pricing system based on quotas ◦ number of requests over time ◦ training vs. prediction ◦ it depends on the region ◦ it depends on the machine used [scale tiers] • Simulation [price calculator] ◦ Europe, training job takes 15 minutes, 4 CPUs, 15 GB ◦ batch prediction takes 36 seconds ◦ it costs $0.08 per month • So far I have paid nothing
  19. Towards "Parenting 4.0"

    • Training in GCP ◦ Upload training data • Push it further with TensorFlow • Other options for exploration and preprocessing ◦ Cloud Datalab ◦ BigQuery ◦ Cloud Dataproc ◦ Cloud Dataflow ◦ Cloud Dataprep
  20. • Huge potential for data science • A data scientist can help make the most of it

    • Now make it data scientist friendly!
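
The decision rule in transcript item 13 (the final prediction is positive iff at least 3 subsequences have a positive prediction) can be sketched in a few lines of Python. This is a minimal illustration, not the code from the talk: the function name `final_prediction`, the parameter `min_positive`, and the 0/1 label encoding are assumptions made for the example.

```python
from typing import Sequence


def final_prediction(subsequence_predictions: Sequence[int], min_positive: int = 3) -> int:
    """Combine per-subsequence predictions into one final prediction.

    Assumes each prediction is 1 (crying detected) or 0 (no crying);
    this encoding is hypothetical and not stated in the slides.
    """
    # Count how many subsequences were predicted positive.
    positives = sum(1 for p in subsequence_predictions if p == 1)
    # Positive iff at least `min_positive` subsequences are positive.
    return 1 if positives >= min_positive else 0
```

With the five subsequences shown in the diagram, `final_prediction([1, 1, 1, 0, 0])` is positive, while `final_prediction([1, 1, 0, 0, 0])` is not.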
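
Transcript item 15 describes a background Cloud Function triggered by a Finalize/Create event on the bucket `parenting-3-recording`. A Python background function receives the event payload and a context object; for Cloud Storage triggers the payload carries the bucket and object name. The sketch below only shows that entry point. The names `on_recording_finalized` and `gcs_uri` are hypothetical, and the feature engineering and prediction steps are left out because the slides do not show them.

```python
def gcs_uri(event: dict) -> str:
    """Build the gs:// URI of the object that triggered the function."""
    return f"gs://{event['bucket']}/{event['name']}"


def on_recording_finalized(event: dict, context) -> str:
    """Entry point for a Cloud Storage finalize trigger (background function).

    `event` is the Cloud Storage object payload and `context` holds event
    metadata. The platform ignores the return value of a background
    function; it is returned here only to make local testing easier.
    """
    uri = gcs_uri(event)
    # Placeholder for the real work: feature engineering on the recording,
    # then a prediction request to Cloud ML Engine.
    print(f"New recording uploaded: {uri}")
    return uri
```

Third-party packages that such a function imports would be listed in `requirements.txt`, as the slide notes.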
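
For the step in transcript item 16 (from Functions to ML Engine), the slides do not say whether online or batch prediction is called from the function. Assuming online prediction, the Cloud ML Engine v1 REST API expects a model resource name of the form `projects/<project>/models/<model>` and a JSON body with an `instances` list. The helpers below only build those two pieces; their names, and the project and model identifiers in the usage note, are placeholders, and the actual HTTP call (for example through the Google API client library) is not shown.

```python
from typing import Optional, Sequence


def model_resource_name(project: str, model: str, version: Optional[str] = None) -> str:
    """Build the resource name used to address a deployed model."""
    name = f"projects/{project}/models/{model}"
    if version:
        # Target a specific version instead of the model's default one.
        name += f"/versions/{version}"
    return name


def predict_request_body(feature_vectors: Sequence[Sequence[float]]) -> dict:
    """Wrap one feature vector per subsequence into an online-prediction body."""
    return {"instances": [[float(x) for x in vector] for vector in feature_vectors]}
```

For example, `model_resource_name("my-project", "baby_cry")` yields `projects/my-project/models/baby_cry`, and five feature vectors (one per subsequence) would be sent as five entries in `instances`.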