GCP for data scientists

Giulia
November 08, 2018

15-minute presentation given at DevFest Toulouse 2018.

An overview of the GCP services that can be used for an ML task, illustrated through a use case.

Transcript

  1. 3

  2. What I know about the cloud
    • Remote physical machines
    • Managed by someone else → no need to care about the infrastructure and architecture
    • Pay for usage → not for rent
  3. Why
    • Access to theoretically unlimited resources
    • Fast provisioning
    • Reliability
    → PROCESS DATA FASTER!
    How reliable is Google Cloud
  4. Machine Learning in one slide
    • Automatically learn a set of rules from a known dataset → TRAINING
    • Assess new data → PREDICTION
    • Data have to be processed before training and prediction → FEATURE ENGINEERING
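The three steps on this slide can be sketched in a few lines of plain Python. This is only an illustration, not the model used in the talk: the features (signal energy and zero-crossing rate), the nearest-centroid classifier, the function names, and the toy data are all assumptions chosen to keep the example self-contained.

```python
from statistics import mean


def engineer_features(signal):
    """FEATURE ENGINEERING: reduce raw samples to two numbers (assumed features)."""
    energy = mean(x * x for x in signal)
    # Count sign changes between consecutive samples, normalised by length.
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    return (energy, crossings / len(signal))


def train(labelled_signals):
    """TRAINING: learn one centroid per label from a known dataset."""
    features_by_label = {}
    for signal, label in labelled_signals:
        features_by_label.setdefault(label, []).append(engineer_features(signal))
    return {
        label: tuple(mean(column) for column in zip(*features))
        for label, features in features_by_label.items()
    }


def predict(model, signal):
    """PREDICTION: assign new data to the label with the closest centroid."""
    features = engineer_features(signal)

    def squared_distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical toy dataset: loud oscillating signals vs. quiet flat ones.
TOY_DATA = [
    ([0.9, -0.8, 0.9, -0.9], "cry"),
    ([0.1, 0.1, 0.05, 0.1], "quiet"),
]
model = train(TOY_DATA)
```

Note that the same `engineer_features` function runs before both training and prediction, which is the point of the last bullet.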
  5. Baby cry project
    • Train a model to detect a baby crying and start a lullaby if needed
    • Feature engineering and model training locally
    • Recording, feature engineering and prediction on Raspberry Pi
    giulbia/baby_cry_detection
    https://www.youtube.com/watch?v=N-LXrheCIKM
  6. Baby cry project
    • Train a model to detect a baby crying and start a lullaby if needed
    • Feature engineering and model training locally
    • Recording, feature engineering and prediction on Raspberry Pi
    giulbia/baby_cry_detection
    https://www.youtube.com/watch?v=N-LXrheCIKM
    It takes about 45 seconds...
  7. 19

  8. FE → feature engineering
    P → prediction
    Maj. vote & Final pred. → final prediction is positive iff at least 3 subsequences have a positive prediction
    [Diagram: five FE steps, five P steps, Majority vote, Final prediction]
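The majority-vote rule stated on this slide can be written as a short function. This is a minimal sketch: the threshold of 3 comes from the slide, while the function and parameter names are hypothetical, and the diagram appears to show five subsequences per recording.

```python
def final_prediction(subsequence_predictions, min_positive=3):
    """Return True iff at least `min_positive` subsequences were predicted positive.

    `subsequence_predictions` is an iterable of truthy/falsy values,
    one per subsequence of the recording.
    """
    positives = sum(1 for prediction in subsequence_predictions if prediction)
    return positives >= min_positive
```

For example, `final_prediction([1, 1, 1, 0, 0])` is positive, whereas `final_prediction([1, 1, 0, 0, 0])` is not.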
  9. [Diagram: five FE steps, five P steps, Majority vote, Final prediction]
    QUESTIONS
    - Dependencies?
    - How to trigger each step?
    - How to send the answer back to the Raspberry Pi?
  10. [Diagram: FE and P steps, Majority vote, Final prediction, MV + FP]
  11. • Background function
    • Trigger type → Cloud Storage
      ◦ Bucket → parenting-3-recording
      ◦ Event type → Finalize/Create
    • Dependencies → requirements.txt
    • Runtime → Python 3.7
    • Easy access to Cloud ML Engine API
    The code that works: giulbia/baby_cry_rpi, giulbia/gcp-rpi
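A background function with the configuration listed on this slide could look roughly like the sketch below. It is not the code from the repositories named above: the function name and the returned URI are hypothetical, and the call to Cloud ML Engine is deliberately left as a comment. What the sketch does rely on is the `(event, context)` signature of Python background functions and the `bucket` and `name` fields of a Cloud Storage event payload.

```python
EXPECTED_BUCKET = "parenting-3-recording"  # bucket name taken from the slide


def on_recording_uploaded(event, context):
    """Handle a Cloud Storage Finalize/Create event (hypothetical handler name).

    event:   dict describing the uploaded object, including "bucket" and "name"
    context: metadata about the event, unused in this sketch
    """
    bucket = event["bucket"]
    name = event["name"]

    # Ignore objects from any other bucket (defensive check, an assumption).
    if bucket != EXPECTED_BUCKET:
        return None

    gcs_uri = "gs://{}/{}".format(bucket, name)
    print("New recording uploaded: {}".format(gcs_uri))

    # The feature engineering and the Cloud ML Engine prediction request
    # would go here; they are omitted because the slide gives no details.

    # Returned only so the sketch can be checked locally; the platform
    # ignores the return value of a background function.
    return gcs_uri
```

Calling the handler locally with a hand-built dictionary, such as `on_recording_uploaded({"bucket": "parenting-3-recording", "name": "rec.wav"}, None)`, is a convenient way to check the logic before deploying.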
  12. • Huge potential for data science
    • A data scientist can help make the most of it
    • It’s not data-scientist-proof yet
  13. Questions will be taken in the hallway, thank you
    Icons from kisspng or made by Freepik from flaticon
    @Giuliabianchl
  14. Annex
    • Online resources
    • Coursera MOOC: Data Engineering on Google Cloud Platform
    • GCP documentation
    • Google Cloud Platform Podcast