Event Driven Machine Learning

Loïc DIVAD

November 28, 2019
Transcript

  1. A real-time ML pipeline proposal

  2. 2

  3. Giulia Bianchi, Data Scientist @Xebia France, @Giuliabianchl · Loïc Divad,
    Software Engineer @Xebia France, @Loicmdivad 3
  4. G.Lo Taxi & Co. 4

  5. G.Lo Taxi & Co. 5

  6. Data Lover & Community Contributor Giulia Bianchi 6

  7. G.Lo App 7: given the current location and destination, estimate the trip
    duration
  8. Data Science 101 8

  9. Batch inference 9: (1) collect historical data about taxi trips, (2) train
    a model on it, (3) use the trained model to make batch predictions
  10. Trip duration estimation at G.Lo Taxi & Co. 10
  11. Continuous inference 11: use the trained model to make one prediction per
    incoming event, each time a trip request arrives
  12. Streaming is the new batch 12

  13. > println(sommaire) 13

  14. ML powered by Event Stream Apps 14

  15. Apache Kafka Lover Loïc Divad 15

  16. The rise of event stream applications: a Centralized Event Log 16

  17. The rise of event stream applications: a Centralized Event Log 17

  18. The rise of event stream applications: a Centralized Event Log 18
  19. DISCLAIMER 19

  20. The Challenge: Constraints 20
  21. What if your model was an event stream app? Kafka Streams application +
    TensorFlow MODEL + Kafka TOPICS (sketched below) 22
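
    A minimal sketch of this idea, not the speakers' actual code: a Kafka
    Streams topology (Scala) that scores every incoming trip event with a
    TensorFlow model loaded once at startup. The topic names, the model path
    and the tensor names of the exported signature are assumptions.

        import java.util.Properties
        import org.apache.kafka.streams.scala.ImplicitConversions._
        import org.apache.kafka.streams.scala.Serdes._
        import org.apache.kafka.streams.scala.StreamsBuilder
        import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
        import org.tensorflow.{SavedModelBundle, Tensor}

        object ScoringApp extends App {
          // Load the exported SavedModel once; it is reused for every record
          val model = SavedModelBundle.load("/models/trip-duration", "serve")

          // One forward pass; "input" and "prediction" are placeholder tensor names
          def score(features: Array[Float]): Float = {
            val input = Tensor.create(Array(features)) // shape [1, n]
            try {
              val result = model.session().runner()
                .feed("input", input)
                .fetch("prediction")
                .run().get(0)
              val out = Array.ofDim[Float](1, 1)
              result.copyTo(out)
              out(0)(0)
            } finally input.close()
          }

          val builder = new StreamsBuilder()
          builder
            .stream[String, String]("taxi-trip-pickups")    // CSV-encoded features
            .mapValues(csv => score(csv.split(',').map(_.toFloat)).toString)
            .to("trip-duration-predictions")

          val props = new Properties()
          props.put(StreamsConfig.APPLICATION_ID_CONFIG, "edml-scoring")
          props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
          new KafkaStreams(builder.build(), props).start()
        }
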
  22. Working Environment 23

  23. Project structure 24

      .
      ├── build.gradle
      ├── edml-schema
      │   ├── build.gradle
      │   └── src
      ├── edml-scoring
      │   ├── build.gradle
      │   └── src
      ├── edml-serving
      │   ├── build
      │   ├── build.gradle
      │   └── src
      ├── edml-trainer
      │   ├── build.gradle
      │   └── setup.py
      └── terraform
          ├── ...
          └── ...
  24. Working Environment 25: GKE Kafka Streams Apps, Kafka as a Service
  25. Replay, an integration data stream 26: PICKUPS-2018-11-28 → PICKUPS-REPLAY
    (see the sketch below)
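
    One way to build that replay stream, sketched here with the plain Kafka
    clients (only the two topic names come from the slide): continuously copy
    the frozen historical topic into a live one, re-stamping each record with
    the current wall-clock time so downstream apps see fresh events.

        import java.time.Duration
        import java.util.{Collections, Properties}
        import scala.jdk.CollectionConverters._
        import org.apache.kafka.clients.consumer.KafkaConsumer
        import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

        object Replay extends App {
          val consumerProps = new Properties()
          consumerProps.put("bootstrap.servers", "localhost:9092")
          consumerProps.put("group.id", "edml-replay")
          consumerProps.put("auto.offset.reset", "earliest")
          consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
          consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

          val producerProps = new Properties()
          producerProps.put("bootstrap.servers", "localhost:9092")
          producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
          producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

          val consumer = new KafkaConsumer[String, String](consumerProps)
          val producer = new KafkaProducer[String, String](producerProps)
          consumer.subscribe(Collections.singletonList("PICKUPS-2018-11-28"))

          while (true) {
            for (rec <- consumer.poll(Duration.ofMillis(500)).asScala) {
              // same key and value, but a fresh timestamp
              producer.send(new ProducerRecord("PICKUPS-REPLAY", null,
                System.currentTimeMillis(), rec.key, rec.value))
            }
          }
        }
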

  26. Working Environment 31: AI Platform, BigQuery, Gitlab CI, GKE Kafka
    Streams Apps, GCE Kafka Connect Instances, GCE KSQL Servers, Kafka as a
    Service, Control Center
  27. The model 32

  28. Available data 33: Pick-up Location, Pick-up Datetime, Drop-off Location,
    Drop-off Datetime, Trip Duration, Passenger Count, Trip Distance, Total
    Amount, Tips
  29. New York City zones 34

  30. Wide features 35: day of week, hour of day, pick-up zone, drop-off zone
  31. Deep features 36: day of year, hour of day, pick-up zone, drop-off zone,
    passenger count
  32. Wide and Deep Learning 37
  33. Code organisation to run in GCP 38

      $ edml-trainer/
      .
      ├── setup.py
      └── trainer
          ├── __init__.py
          ├── model.py
          └── task.py
  34. Code organisation to run in GCP 39

      # task.py [page 1]
      import argparse

      from . import model

      if __name__ == '__main__':
          parser = argparse.ArgumentParser()
          # Input arguments for ai-platform
          parser.add_argument(
              '--bucket',
              help='GCS path to project bucket',
              required=True
          )
          ....
          # Input arguments for modeling
          parser.add_argument(
              '--batch-size',
              type=int,
              default=512
          )

          # task.py [page 2]
          parser.add_argument(
              '--output-dir',
              help='GCS location to write checkpoints and export models',
              required=True
          )
          ....
          # assign arguments to model variables
          arguments = parser.parse_args().__dict__
          output_dir = arguments.pop('output_dir')
          model.BUCKET = arguments.pop('bucket')
          model.BATCH_SIZE = arguments.pop('batch_size')
          ....
          # Run the training job
          model.train_and_evaluate(output_dir)
  35. Code organisation to run in GCP 40

      # model.py [page 1]
      import tensorflow as tf

      BATCH_SIZE = 512
      ...
      CSV_COLUMNS = [...]
      LABEL_COLUMN = "trip_duration"
      KEY_COLUMN = "uuid"

      def read_dataset(...):
          def _input_fn():
              ...
          return _input_fn()

      # Feature engineering
      def get_wide_deep():
          ...
          return wide, deep

      # model.py [page 2]
      # Serving input receiver function
      def serving_input_receiver_fn():
          receiver_tensors = { ... }
          return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)

      # Model training and evaluation
      def train_and_evaluate(output_dir):
          ...
          estimator = tf.estimator.DNNLinearCombinedRegressor(...)
          ...
          tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  36. Code organisation to run in GCP 41: gcloud

      #!/usr/bin/env bash
      BUCKET=edml
      TRAINER_PACKAGE_PATH=gs://$BUCKET/data/taxi-trips/sources
      MAIN_TRAINER_MODULE="trainer.task"
      ...
      OUTDIR=gs://$BUCKET/ai-platform/models/$VERSION

      gcloud ai-platform jobs submit training $JOB_NAME \
          --job-dir $JOB_DIR \
          --package-path $TRAINER_PACKAGE_PATH \
          --module-name $MAIN_TRAINER_MODULE \
          --region $REGION \
          -- \
          --batch-size=$BATCH_SIZE \
          --output-dir=$OUTDIR \
          --pattern="*" \
          --train-examples=174000 \
          --eval-steps=1
  37. Development Workflow 42

  38. Streaming apps deployment 44: KAFKA STREAMS APPS PODS, KUBE MASTER

      // build.gradle
      compile group: 'org.tensorflow', name: 'proto', version: '1.15.0'
      compile group: 'org.tensorflow', name: 'tensorflow', version: '1.15.0'

      // Processor.scala
      import org.tensorflow._
      import org.tensorflow.framework.GraphDef // GraphDef comes from the proto artifact

      val graphDef: GraphDef = GraphDef.parseFrom(Array.empty[Byte]) // placeholder bytes
      val graph = new Graph()
      graph.importGraphDef(graphDef.toByteArray)
      val session = new Session(graph)
  39. The SavedModel Format from TF 45: importing the Graph alone fails at
    runtime with "Not found: Resource … variable was uninitialized"; the
    variables must be shipped and restored as well, hence a Serde for the whole
    model (and the fix sketched below)

      $ tree my_model/
      .
      ├── saved_model.pb
      └── variables
          ├── variables.data-00000-of-00002
          ├── variables.data-00001-of-00002
          └── variables.index
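
    A short sketch of the corresponding fix, assuming the model directory
    above: load the complete SavedModel (graph + variables) instead of the
    bare GraphDef, so the session starts with its variables restored.

        import org.tensorflow.SavedModelBundle

        // "serve" is the standard tag for models exported for serving
        val bundle = SavedModelBundle.load("/path/to/my_model", "serve")
        val session = bundle.session() // variables already initialized: no "uninitialized" error
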
  40. A model producer… for automation! 46

      // ModelPublisher.scala
      val topic: String = "<model.topic>"
      val version: String = "<model.version>"
      val model: String = "gs://.../<model.version>"
      //…
      val producer = new KafkaProducer[ModelKey, TFSavedModel](/* … */)
      val key = ModelKey("<app.name>")
      val value = TFSavedModel(/* … */)
      //…
      producer.send(new ProducerRecord(topic, key, value))
      producer.flush()
  41. 2 input streams 47: the app CI deploy stage publishes to a compacted
    MODEL TOPIC, while NEW RECORDS stream in and PREDICTIONS stream out (see
    the topic sketch below)
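
    Keeping the model topic compacted means a (re)starting streams app can
    always read back the latest model per key. A sketch of the topic creation
    with the AdminClient; the topic name, partition count and replication
    factor are assumptions.

        import java.util.{Collections, Properties}
        import org.apache.kafka.clients.admin.{AdminClient, NewTopic}
        import org.apache.kafka.common.config.TopicConfig

        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        val admin = AdminClient.create(props)

        // cleanup.policy=compact keeps only the latest record per key
        val modelTopic = new NewTopic("edml-models", 1, 3.toShort)
          .configs(Collections.singletonMap(
            TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT))

        admin.createTopics(Collections.singletonList(modelTopic)).all().get()
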
  42. 48

  43. Data Source + Model Source → Stream Processor, with Model Storage in a
    RocksDB Key-Value Store holding the Current Model, Processing each record
    into a Prediction (sketched below) 49
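
    A sketch of that topology with the Processor API; topic, store and class
    names are assumptions, and predict() stands in for the real TensorFlow
    call. One processor keeps the latest model bytes in a persistent
    (RocksDB-backed) key-value store, a second one reads it to score every
    record.

        import org.apache.kafka.common.serialization.{ByteArrayDeserializer, Serdes, StringDeserializer, StringSerializer}
        import org.apache.kafka.streams.Topology
        import org.apache.kafka.streams.processor.{AbstractProcessor, ProcessorContext, ProcessorSupplier}
        import org.apache.kafka.streams.state.{KeyValueStore, Stores}

        val STORE = "current-model"

        // Overwrites the single "latest" entry each time a new model is published
        class ModelUpdater extends AbstractProcessor[String, Array[Byte]] {
          private var store: KeyValueStore[String, Array[Byte]] = _
          override def init(ctx: ProcessorContext): Unit = {
            super.init(ctx)
            store = ctx.getStateStore(STORE).asInstanceOf[KeyValueStore[String, Array[Byte]]]
          }
          override def process(key: String, model: Array[Byte]): Unit = store.put("latest", model)
        }

        // Scores each record with whatever model is currently in the store
        class Scorer extends AbstractProcessor[String, String] {
          private var store: KeyValueStore[String, Array[Byte]] = _
          override def init(ctx: ProcessorContext): Unit = {
            super.init(ctx)
            store = ctx.getStateStore(STORE).asInstanceOf[KeyValueStore[String, Array[Byte]]]
          }
          override def process(key: String, record: String): Unit = {
            val model = store.get("latest")
            if (model != null) context().forward(key, predict(model, record))
          }
          // Placeholder: the real implementation restores the model and runs the TF session
          private def predict(model: Array[Byte], record: String): String =
            s"<prediction for $record with a ${model.length}-byte model>"
        }

        val updaterSupplier: ProcessorSupplier[String, Array[Byte]] = () => new ModelUpdater
        val scorerSupplier: ProcessorSupplier[String, String] = () => new Scorer

        val topology = new Topology()
          .addSource("Models", new StringDeserializer, new ByteArrayDeserializer, "edml-models")
          .addSource("Records", new StringDeserializer, new StringDeserializer, "PICKUPS-REPLAY")
          .addProcessor("ModelUpdater", updaterSupplier, "Models")
          .addProcessor("Scorer", scorerSupplier, "Records")
          .addStateStore(
            Stores.keyValueStoreBuilder(
              Stores.persistentKeyValueStore(STORE), // RocksDB under the hood
              Serdes.String(), Serdes.ByteArray()),
            "ModelUpdater", "Scorer")
          .addSink("Predictions", "predictions", new StringSerializer, new StringSerializer, "Scorer")
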
  44. Continuous integration 50: TEST ► PACKAGE ► TRAIN ► DEPLOY MODEL ►
    DEPLOY KAFKA STREAMS APP, with artifact versions 0.1.0-<dt>-<sha1> and
    0.1.0-<dt>-<sha1>-<N>, and {"metadata":"..."} alongside the model
  45. Scoring and Post Prediction 51

  46. TensorBoard 53

  47. 54

  48. 55

  49. Conclusion: how to face a drop in performance over time? 56
  50. THANK YOU 57

  51. PICTURES 58
    Photo by Daniel Jensen on Unsplash
    Photo by Dimon Blr on Unsplash
    Photo by Lerone Pieters on Unsplash
    Photo by Miryam León on Unsplash
    Photo by Matthew Hamilton on Unsplash
    Photo by Luke Stackpoole on Unsplash
    Photo by Gustavo on Unsplash
    Photo by Negative Space from Pexels
    Photo by Gerrie van der Walt on Unsplash
    Photo by Eepeng Cheong on Unsplash
    Photo by Rock'n Roll Monkey on Unsplash
    Photo by chuttersnap on Unsplash
    Photo by Denys Nevozhai on Unsplash
    Photo by Mike Tsitas on Unsplash
  52. ANNEX 59

  53. Is this really a 1-month challenge? Total cost: 302.82€ 60
  54. Team Work! 61

  55. Training Job Manual Submission 62

  56. 63

  57. A world of containers 64

  58. Cloud is just someone else's computer 65

  59. Data prep before TensorFlow, thank you BigQuery 66

  60. Schemas Are Service APIs For Event Streaming 67