Slide 1

Slide 1 text

GCP for data scientists
Giulia Bianchi
08 Feb 2019

Slide 2

Slide 2 text

@Giuliabianchl, giulbia, Data Scientist, dataxday.fr

Slide 3

Slide 3 text

Google announcement, GCP podcast episode 117

Slide 4

Slide 4 text

Democratising AI: to everyone and every business

Slide 5

Slide 5 text

Democratising the cloud: to those who know about AI

Slide 6

Slide 6 text

The cloud in 3 slides

Slide 7

Slide 7 text

Twitter 7

Slide 8

Slide 8 text

Shift from on-premises to *aaS
● IaaS
● PaaS
● SaaS
Source: SaaS vs PaaS vs IaaS: What’s The Difference and How To Choose

Slide 9

Slide 9 text

Saying "cloud" means saying "managed services"
● Remote physical machines
● Managed by the provider → no need to care about the infrastructure and architecture
● Pay for usage, not for rent

Slide 10

Slide 10 text

Why? Why are we even talking about it?

Slide 11

Slide 11 text

Why a data scientist should care about the cloud
● Access to theoretically unlimited resources
● Fast provisioning
● Reliability
→ PROCESS DATA FASTER!
See: How reliable is Google Cloud

Slide 12

Slide 12 text

What: Google Cloud Platform, the cloud by Google

Slide 13

Slide 13 text

What are the services offered by GCP for AI/ML?
Cloud AutoML Vision (image from conceptdraw)

Slide 14

Slide 14 text

Cloud ML Engine (see ML Engine docs)

Slide 15

Slide 15 text

How: a glimpse of GCP

Slide 16

Slide 16 text

Concretely
● Set up a GCP account (Gmail ID needed)
● Sign in to Google Cloud Console and set up a project ($$$)
● Install Cloud SDK

Slide 17

Slide 17 text

Resources
● MOOC Coursera - Data Engineering on Google Cloud Platform
● Online GCP documentation
● GCP podcast

Slide 18

Slide 18 text

Machine Learning in 3 words
● TRAINING → Automatically learn a set of rules from a known dataset
● PREDICTION → Assess new data
● FEATURE ENGINEERING → Data have to be processed before training and prediction
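
The three words above can be illustrated with a toy sketch in pure Python. Everything in it is hypothetical and chosen only for illustration: the feature (mean absolute amplitude), the threshold "model" and the sample signals are assumptions, not the baby cry model used in the talk.

```python
from statistics import mean

def engineer_feature(signal):
    """Feature engineering: reduce a raw signal to one number (mean absolute amplitude)."""
    return mean(abs(x) for x in signal)

def train(signals, labels):
    """Training: learn a rule (here a single threshold) from a labelled dataset."""
    features = [engineer_feature(s) for s in signals]
    positives = [f for f, y in zip(features, labels) if y == 1]
    negatives = [f for f, y in zip(features, labels) if y == 0]
    # Place the threshold halfway between the two class means.
    return (mean(positives) + mean(negatives)) / 2

def predict(threshold, signal):
    """Prediction: apply the same feature engineering, then the learned rule."""
    return 1 if engineer_feature(signal) > threshold else 0

# Hypothetical toy data: loud signals are labelled 1, quiet ones 0.
train_signals = [[0.9, -0.8, 0.7], [0.8, -0.9, 1.0], [0.1, -0.1, 0.0], [0.2, 0.0, -0.1]]
train_labels = [1, 1, 0, 0]
threshold = train(train_signals, train_labels)
```

Note that `predict` calls `engineer_feature` too: new data has to go through the same processing as the training data.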

Slide 19

Slide 19 text

Parenting 3.0: hands on!

Slide 20

Slide 20 text

Back to "Parenting 2.0"
● Train a model to detect a baby crying and start a lullaby if need be
● Feature engineering and model training locally (PC)
● Recording, feature engineering and prediction on Raspberry Pi
giulbia/baby_cry_detection
https://www.youtube.com/watch?v=N-LXrheCIKM

Slide 21

Slide 21 text

Back to "Parenting 2.0"
● Train a model to detect a baby crying and start a lullaby if need be
● Feature engineering and model training locally (PC)
● Recording, feature engineering and prediction on Raspberry Pi
giulbia/baby_cry_detection
https://www.youtube.com/watch?v=N-LXrheCIKM
It takes about 45 seconds...

Slide 22

Slide 22 text

The intuition
● Recording on Raspberry Pi
● Use GCP for feature engineering and prediction
Diagram labels: Recording, Storage

Slide 23

Slide 23 text

The intuition
● Recording on Raspberry Pi
● Use GCP for feature engineering and prediction
Diagram labels: Recording, Storage, trained model, ML Engine, Prediction

Slide 24

Slide 24 text

The intuition
● Recording on Raspberry Pi
● Use GCP for feature engineering and prediction
Diagram labels: (*), Storage, ML Engine, Recording, Storage, trained model, ML Engine

Slide 25

Slide 25 text

(*) Data pre-processing needs to be done BEFORE ML Engine...

Slide 26

Slide 26 text

Diagram: five FE blocks, five P blocks, Majority vote, Final prediction
1. FE → feature engineering
2. P → prediction
3. Majority vote & 4. Final prediction → the final prediction is positive iff at least 3 subsequences have a positive prediction
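
The majority vote rule above can be sketched in a few lines of Python. The function name and the 0/1 encoding of the per-subsequence predictions are assumptions made for this illustration; only the "at least 3 positive" rule comes from the slide.

```python
def majority_vote(predictions, min_positive=3):
    """Return 1 if at least `min_positive` subsequence predictions are positive, else 0.

    `predictions` holds one 0/1 prediction per subsequence (five on the slide).
    """
    positives = sum(1 for p in predictions if p == 1)
    return 1 if positives >= min_positive else 0
```

For example, `majority_vote([1, 1, 1, 0, 0])` gives a positive final prediction, while `majority_vote([1, 1, 0, 0, 0])` does not.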

Slide 27

Slide 27 text

From Pi to Storage
giulbia/baby_cry_rpi, giulbia/gcp-rpi
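
A minimal sketch of the Pi-to-Storage step, assuming the `google-cloud-storage` Python client is installed on the Raspberry Pi and credentials are configured (for example through `GOOGLE_APPLICATION_CREDENTIALS`). The helper names, the file path and the choice of this client library are assumptions; the code in the repositories above may do this differently.

```python
import os

def blob_name_for(local_path):
    """Name the uploaded object after the recording's file name (an assumption)."""
    return os.path.basename(local_path)

def upload_recording(local_path, bucket_name):
    """Upload one recording from the Pi to a Cloud Storage bucket."""
    # Imported here so the rest of the module loads without the client library.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name_for(local_path))
    blob.upload_from_filename(local_path)
    return f"gs://{bucket_name}/{blob.name}"
```

A call such as `upload_recording("/home/pi/recording.wav", "my-bucket")` would return the `gs://` URI of the uploaded file; both arguments here are placeholders.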

Slide 28

Slide 28 text

From Storage to??
Diagram: five FE blocks, five P blocks, Majority vote, Final prediction
QUESTIONS
1. Dependencies?
2. How to trigger each step?
3. How to send the answer back to Raspberry Pi?

Slide 29

Slide 29 text

From Storage to Functions
Diagram: five FE blocks, five P blocks, Majority vote, Final prediction; step shown: 1. FE
ANSWERS BY GOOGLE CLOUD FUNCTIONS
● Background function
● Trigger type → Cloud Storage
  ○ Bucket → parenting-3-recording
  ○ Event type → Finalize/Create
● Dependencies → requirements.txt
● Easy access to Cloud ML Engine API
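
A background function triggered by Cloud Storage receives the event payload and a context object. The sketch below shows only that entry point: the function name and the `.wav` filter are assumptions, and the feature engineering itself is left out. The `bucket` and `name` keys are part of the Cloud Storage event payload.

```python
def on_new_recording(event, context):
    """Background function called when an object is finalized in the bucket.

    `event` is the Cloud Storage object payload (a dict); `context` carries
    event metadata and is not used here.
    """
    bucket = event["bucket"]
    name = event["name"]

    # Ignore anything that is not a recording (assumed to be .wav files).
    if not name.endswith(".wav"):
        return None

    uri = f"gs://{bucket}/{name}"
    print(f"New recording to process: {uri}")
    # Feature engineering and the call to ML Engine would follow here.
    return uri
```

Third-party packages used by the function would be listed in `requirements.txt`, as noted on the slide.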

Slide 30

Slide 30 text

From Functions to ML Engine
Diagram: five FE blocks, five P blocks, Majority vote, Final prediction; steps shown: 1. FE, 2. P
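
One way to call ML Engine from a function is an online prediction request through the Google API Python client (`google-api-python-client`). This is a sketch under assumptions: the helper names are hypothetical, and the talk may use batch prediction or a different client instead.

```python
def model_path(project, model, version=None):
    """Build the resource name expected by the ML Engine predict call."""
    name = f"projects/{project}/models/{model}"
    if version is not None:
        name += f"/versions/{version}"
    return name

def predict_online(project, model, instances, version=None):
    """Send feature vectors to a deployed model and return its predictions."""
    # Imported here so the helper above works without the client library.
    from googleapiclient import discovery

    service = discovery.build("ml", "v1")
    response = service.projects().predict(
        name=model_path(project, model, version),
        body={"instances": instances},
    ).execute()

    if "error" in response:
        raise RuntimeError(response["error"])
    return response["predictions"]
```

Here `instances` would hold one feature vector per subsequence, so a single request can return all five predictions for the majority vote.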

Slide 31

Slide 31 text

Back to Functions, Storage and Pi!
Diagram: five FE blocks, five P blocks, Majority vote, Final prediction; steps shown: 1. FE, 2. P, 3. MV & 4. FP

Slide 32

Slide 32 text

Results: it takes about 5 seconds!

Slide 33

Slide 33 text

Some code: Function

Slide 34

Slide 34 text

Some code: Function

Slide 35

Slide 35 text

Some code: Function

Slide 36

Slide 36 text

Some code: Predict

Slide 37

Slide 37 text

Let's talk money (ML Engine)
● Pricing system based on quotas
  ○ number of requests over time
  ○ training vs. prediction
  ○ it depends on regions
  ○ it depends on the machine used [scale tiers]
● Simulation [price calculator]
  ○ Europe, training job takes 15 minutes, 4 CPUs, 15 GB
  ○ batch prediction takes 36 seconds
  ○ it costs $0.08 per month
● So far I have paid nothing

Slide 38

Slide 38 text

Towards "Parenting 4.0"
● Training in GCP
  ○ Upload training data
● Push it further with TensorFlow
● Other options for exploration and preprocessing
  ○ Cloud Datalab
  ○ BigQuery
  ○ Cloud Dataproc
  ○ Cloud Dataflow
  ○ Cloud Dataprep

Slide 39

Slide 39 text

Conclusion

Slide 40

Slide 40 text

● Huge potential for data science
● A data scientist can help exploit it to the fullest
● Now make it data scientist friendly!

Slide 41

Slide 41 text

THX
Icons from kisspng or made by Freepik from flaticon