Reducing Pager Fatigue with a Serverless ML Bot

Being woken up at 3 am by the pager is never fun, but seeing an incident resolve before you’ve even left your bed is maddening. The next day you sleepily tune the alert in hope of a better night’s sleep, yet more untuned alerts sing to you that night. After a few rounds of alert-tuning whack-a-mole you wonder: could I predict whether an incident will resolve itself?

This is the story of how a weary engineer used a Cloud ML model with Cloud Functions to reduce pager noise. Recounting some of the challenges faced, we’ll explore training a model with a limited data set and continual training in a serverless environment. We’ll also explore the implications of using a bot as a first responder to a pager.

Mike Fowler

October 01, 2019

Transcript

  1. October 1st 2019 @mlfowler_ @Claranet Reducing Pager Fatigue using a Serverless ML Bot

    Mike Fowler - Senior Site Reliability Engineer - Public Cloud Practice
  2. I Like to Think I Know Data

    Source:
    https://peakcare.wordpress.com/2011/10/05/heads-in-the-sand/
    https://i.pinimg.com/originals/cb/32/5f/cb325f9c268bf2135125f512d95
  3. Scene: Our Engineer Rests Peacefully

    Source:
    https://peakcare.wordpress.com/2011/10/05/heads-in-the-sand/
    https://i.pinimg.com/originals/cb/32/5f/cb325f9c268bf2135125f512d95
  4. Scene: Red Alert!

    Source:
    https://peakcare.wordpress.com/2011/10/05/heads-in-the-sand/
    https://vignette.wikia.nocookie.net/memoryalpha/images/6/6b/RedAlert.jpg/revision/latest?cb=20100117050244&path-prefix=en
  5. The Problem

    Many PagerDuty incidents resolve before I respond, disrupting my sleep needlessly
  6. The Shape of Data

    23 features: 2 timestamps, 6 numeric, 15 text
    20678 samples
  7. The Shape of Usable Data

    5 features: 1 timestamp, 1 numeric, 3 text
    19354 samples
  8. Feature Engineering

    • Worked examples starting from simple number manipulations to complex processes such as principal component analysis (PCA)
    • Lots of Python code and decent explanations
    • Primarily scikit-learn
    • Decent bibliography per chapter
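
The transcript moves from 5 usable features (1 timestamp, 1 numeric, 3 text) to 47 features without showing the steps in between. One plausible route, sketched here purely as an assumption, is to split the timestamp into numeric parts and one-hot encode the text columns. The field names `created_at`, `urgency_score` and `service` are hypothetical and do not come from the deck.

```python
from datetime import datetime


def build_vocab(values):
    """Map each distinct text value to a stable column index."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


def one_hot(value, vocab):
    """Return a 0/1 list with a single 1 at the value's column (all zeros if unseen)."""
    row = [0] * len(vocab)
    if value in vocab:
        row[vocab[value]] = 1
    return row


def encode_incident(incident, service_vocab):
    """Turn one incident dict (hypothetical fields) into a flat numeric feature list."""
    created = datetime.fromisoformat(incident["created_at"])
    # Hour of day and day of week (Monday = 0) replace the raw timestamp.
    features = [created.hour, created.weekday(), incident["urgency_score"]]
    features.extend(one_hot(incident["service"], service_vocab))
    return features


incidents = [
    {"created_at": "2019-10-01T03:04:00", "urgency_score": 2, "service": "db"},
    {"created_at": "2019-10-01T14:30:00", "urgency_score": 1, "service": "web"},
]
service_vocab = build_vocab(i["service"] for i in incidents)
encoded = [encode_incident(i, service_vocab) for i in incidents]
```

Each distinct text value adds one column, which is how a handful of raw fields can grow into dozens of model features.
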
  9. The Shape of Usable Data

    47 features
    4370 samples: 2185 positive class, 2185 negative class
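
The sample count drops from 19354 to 4370 with exactly 2185 samples per class. That is consistent with randomly undersampling the larger class, although the transcript does not say which balancing method was used. A minimal sketch of undersampling, with the function name and toy data as illustrative assumptions:

```python
import random


def undersample(samples, labels, seed=0):
    """Randomly drop samples from the larger class until both classes are equal in size."""
    rng = random.Random(seed)
    positives = [s for s, y in zip(samples, labels) if y == 1]
    negatives = [s for s, y in zip(samples, labels) if y == 0]
    size = min(len(positives), len(negatives))
    balanced = [(s, 1) for s in rng.sample(positives, size)]
    balanced += [(s, 0) for s in rng.sample(negatives, size)]
    rng.shuffle(balanced)  # avoid leaving the classes in two contiguous runs
    return balanced


# Toy data: 3 positive and 7 negative samples.
samples = list(range(10))
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
balanced = undersample(samples, labels)
```

Undersampling throws data away, which hurts when the data set is already small, but it keeps the classifier from simply learning to predict the majority class.
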
  10. Choosing a Model: Random Forest

    https://miro.medium.com/max/2612/0*f_qQPFpdofWGLQqc.png
  11. Validating the Model: Cross Validation

    https://towardsdatascience.com/cross-validation-explained-evaluating-estimator-performance-e51e5430ff85
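
Cross validation splits the data into k folds and lets each fold take one turn as the held-out test set while the rest is used for training. The sketch below shows only the index bookkeeping, in plain Python. It is not the code from the talk, and in practice scikit-learn's `cross_val_score` does this and the scoring in one call.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k roughly equal folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
```

Every sample lands in a test fold exactly once, so the averaged score uses all of a limited data set for both training and validation. The sketch assumes the data has already been shuffled.
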
  12. The Model is only a Keystone

    https://io9.gizmodo.com/
    https://miro.medium.com/max/3036/1*SQg9Buf5w-rR2T8vCIVy3g.jpeg
  13. AI Platform

    • Hosted Jupyter notebooks
    • Distributable training with automatic resource provisioning
    • Supports CPUs, GPUs and TPUs
    • Run across many nodes and multiple experiments
    • Automated hyperparameter tuning with HyperTune
    • Exportable models
    • Model hosting for online prediction
  14. Exporting a scikit-learn Model

    from sklearn.ensemble import RandomForestClassifier
    # sklearn.externals.joblib was removed in scikit-learn 0.23; newer versions need "import joblib"
    from sklearn.externals import joblib

    model = RandomForestClassifier(n_estimators=n)
    ...
    # Make the hosted model return class probabilities instead of hard labels
    model.predict = model.predict_proba
    joblib.dump(model, 'model.joblib')
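
Because `predict` is swapped for `predict_proba`, each prediction comes back as a row of class probabilities, not a single label. Later in the deck the `Prediction` struct stores a boolean and a confidence, so something has to convert one into the other. The sketch below shows one way to do that. The column order `[negative, positive]` and the 0.5 threshold are assumptions, not details from the talk.

```python
def interpret(probabilities, threshold=0.5):
    """Convert one [p_negative, p_positive] row into (prediction, confidence)."""
    p_negative, p_positive = probabilities
    prediction = p_positive >= threshold
    # Confidence is the probability of whichever class was chosen.
    confidence = p_positive if prediction else p_negative
    return prediction, confidence


prediction, confidence = interpret([0.2322751, 0.7677249])
```

Raising the threshold makes the bot more conservative, which matters when a wrong prediction could leave a real incident unattended.
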
  15. Deploying a Model: Upload

    $ gsutil cp ./model.joblib gs://your-bucket/model.joblib

    NB: The directory containing the model must be 250MB or less
  16. Deploying a Model: Create a Model

    $ gcloud ai-platform models create mrdata
  17. Deploying a Model: Create a Version

    $ gcloud ai-platform versions create v1 \
        --model mrdata \
        --origin gs://your-bucket/ \
        --runtime-version=1.14 \
        --framework SCIKIT_LEARN \
        --python-version=3.5

    NB: --origin is the directory containing model.joblib, not the file itself
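
Once a version exists, the model can be queried through AI Platform's online prediction REST method, which takes a JSON body with an `instances` list. The sketch below only builds the URL and body and sends nothing. The project name `myproject` and the all-zero feature vector are placeholders, and a real call would also need an OAuth bearer token, which is not shown.

```python
import json


def build_predict_request(project, model, version, instances):
    """Return the URL and JSON body for an AI Platform online prediction call."""
    url = (
        "https://ml.googleapis.com/v1/"
        f"projects/{project}/models/{model}/versions/{version}:predict"
    )
    body = json.dumps({"instances": instances})
    return url, body


# One instance with 47 numeric features, matching the feature count in the deck.
url, body = build_predict_request("myproject", "mrdata", "v1", [[0.0] * 47])
```

The response carries one entry per instance under `predictions`. With `predict` swapped for `predict_proba`, each entry would be a row of class probabilities.
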
  18. Cloud Functions

    • Function-as-a-Service supporting:
      - Node.js 6, 8 & 10
      - Python 3.7.1
      - Go 1.11.6
    • Triggerable from:
      - HTTP
      - Cloud Storage
      - Pub/Sub
      - Cloud Scheduler
  19. A Go Cloud Function

    import (
        "net/http"
    )

    func PagerDuty(w http.ResponseWriter, r *http.Request) {
        // awesome code
    }
  20. Function Deployment Preliminaries

    $ gcloud iam service-accounts create mrdata \
        --display-name "Mr Data's Service Account"
    $ gcloud beta projects add-iam-policy-binding myproject \
        --member serviceAccount:mrdata@myproject.iam.gserviceaccount.com \
        --role roles/ml.developer
  21. Function Deployment

    $ gcloud beta functions deploy mrdata \
        --entry-point PagerDuty \
        --runtime go111 \
        --trigger-http \
        --service-account mrdata@myproject.iam.gserviceaccount.com
  22. Cloud Pub/Sub

    • Publish/Subscribe messaging service
    • At-least-once delivery
    • Seek & Replay
      - A subscription only sees messages published after it was created
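
At-least-once delivery means a subscriber can receive the same message more than once, so whatever consumes the topic should tolerate duplicates. The sketch below wraps a handler so that a repeated message ID is skipped. The class and the message IDs are illustrative and not taken from the talk.

```python
class IdempotentHandler:
    """Wrap a handler so redelivered messages (same ID) are processed only once."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()

    def handle(self, message_id, payload):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._handler(payload)
        # Record the ID only after the handler succeeds, so failures can be retried.
        self._seen.add(message_id)
        return True


processed = []
handler = IdempotentHandler(processed.append)
results = [
    handler.handle("m1", "incident PRX7NJU triggered"),
    handler.handle("m1", "incident PRX7NJU triggered"),  # simulated redelivery
    handler.handle("m2", "incident PRX7NJU resolved"),
]
```

An in-memory set does not survive across function instances. In a serverless setup the same effect likely needs a durable key, for example writing each result to a document keyed by incident ID so that a repeat write simply overwrites the first.
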
  23. Topic & Subscription Creation

    $ gcloud pubsub topics create pd-notify
    $ gcloud pubsub subscriptions create --topic pd-notify pd-notify-model
    $ gcloud pubsub subscriptions create --topic pd-notify pd-notify-firestore
  24. Cloud Firestore

    • Serverless NoSQL document database
    • ACID transactions
    • Automatic scaling & indexing
    • Multi-region replication
    • Client libraries provide live and offline synchronization
  25. A Simple struct for Recording Inferences

    type Prediction struct {
        Incident   string  `firestore:"incident"`
        Prediction bool    `firestore:"prediction"`
        Confidence float64 `firestore:"confidence"`
    }
  26. Adding a Document to a Collection

    pred := Prediction{
        Incident:   "PRX7NJU",
        Prediction: true,
        Confidence: 0.7677249,
    }
    // The incident ID is the document ID, so a repeated write overwrites the same document
    _, err := client.Collection("predictions").
        Doc("PRX7NJU").
        Set(ctx, pred)