Event Driven Machine Learning

Loïc DIVAD

November 28, 2019
Transcript

  1. A real-time ML pipeline proposal

  2. 2

  3. Giulia Bianchi, Data Scientist @Xebia France, @Giuliabianchl · Loïc Divad,
    Software Engineer @Xebia France, @Loicmdivad 3
  4. G.Lo Taxi & Co. 4

  5. G.Lo Taxi & Co. 5

  6. Data Lover & Community Contributor Giulia Bianchi 6

  7. G.Lo App 7: given the current location and destination, estimate the trip
    duration
  8. Data Science 101 8

  9. Batch inference 9: (1) collect historical data about taxi trips, (2) train
    a model on it, (3) use the trained model to make batch predictions
  10. Trip duration estimation at G.Lo Taxi & Co. 10
  11. Continuous inference 11: use the trained model to make one prediction per
    incoming event, each time a trip request arrives
  12. Streaming is the new batch 12

  13. > println(sommaire) 13

  14. ML powered by Event Stream Apps 14

  15. Apache Kafka Lover Loïc Divad 15

  16. The rise of event stream applications: a Centralized Event Log 16

  17. The rise of event stream applications: a Centralized Event Log 17

  18. The rise of event stream applications: a Centralized Event Log 18
  19. DISCLAIMER 19

  20. The Challenge: Constraints 20
  21. What if your model was an event stream app? Kafka Streams application +
    TensorFlow MODEL + Kafka TOPICS (sketched below) 22
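
    A minimal sketch of this idea, not the speakers' actual code: a Kafka
    Streams topology (Scala) that scores every incoming trip event with a
    TensorFlow model loaded once at startup. The topic names, the model path
    and the tensor names of the exported signature are assumptions.

        import java.util.Properties
        import org.apache.kafka.streams.scala.ImplicitConversions._
        import org.apache.kafka.streams.scala.Serdes._
        import org.apache.kafka.streams.scala.StreamsBuilder
        import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}
        import org.tensorflow.{SavedModelBundle, Tensor}

        object ScoringApp extends App {
          // Load the exported SavedModel once; it is reused for every record
          val model = SavedModelBundle.load("/models/trip-duration", "serve")

          // One forward pass; "input" and "prediction" are placeholder tensor names
          def score(features: Array[Float]): Float = {
            val input = Tensor.create(Array(features)) // shape [1, n]
            try {
              val result = model.session().runner()
                .feed("input", input)
                .fetch("prediction")
                .run().get(0)
              val out = Array.ofDim[Float](1, 1)
              result.copyTo(out)
              out(0)(0)
            } finally input.close()
          }

          val builder = new StreamsBuilder()
          builder
            .stream[String, String]("taxi-trip-pickups")    // CSV-encoded features
            .mapValues(csv => score(csv.split(',').map(_.toFloat)).toString)
            .to("trip-duration-predictions")

          val props = new Properties()
          props.put(StreamsConfig.APPLICATION_ID_CONFIG, "edml-scoring")
          props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
          new KafkaStreams(builder.build(), props).start()
        }
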
  22. Working Environment 23

  23. Project structure 24

      .
      ├── build.gradle
      ├── edml-schema
      │   ├── build.gradle
      │   └── src
      ├── edml-scoring
      │   ├── build.gradle
      │   └── src
      ├── edml-serving
      │   ├── build
      │   ├── build.gradle
      │   └── src
      ├── edml-trainer
      │   ├── build.gradle
      │   └── setup.py
      └── terraform
          ├── ...
          └── ...
  24. Working Environment 25: GKE Kafka Streams Apps, Kafka as a Service
  25. Replay, an integration data stream 26: PICKUPS-2018-11-28 → PICKUPS-REPLAY
    (see the sketch below)
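
    One way to build that replay stream, sketched here with the plain Kafka
    clients (only the two topic names come from the slide): continuously copy
    the frozen historical topic into a live one, re-stamping each record with
    the current wall-clock time so downstream apps see fresh events.

        import java.time.Duration
        import java.util.{Collections, Properties}
        import scala.jdk.CollectionConverters._
        import org.apache.kafka.clients.consumer.KafkaConsumer
        import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

        object Replay extends App {
          val consumerProps = new Properties()
          consumerProps.put("bootstrap.servers", "localhost:9092")
          consumerProps.put("group.id", "edml-replay")
          consumerProps.put("auto.offset.reset", "earliest")
          consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
          consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

          val producerProps = new Properties()
          producerProps.put("bootstrap.servers", "localhost:9092")
          producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
          producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

          val consumer = new KafkaConsumer[String, String](consumerProps)
          val producer = new KafkaProducer[String, String](producerProps)
          consumer.subscribe(Collections.singletonList("PICKUPS-2018-11-28"))

          while (true) {
            for (rec <- consumer.poll(Duration.ofMillis(500)).asScala) {
              // same key and value, but a fresh timestamp
              producer.send(new ProducerRecord("PICKUPS-REPLAY", null,
                System.currentTimeMillis(), rec.key, rec.value))
            }
          }
        }
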

  26. Working Environment 31: AI Platform, BigQuery, Gitlab CI, GKE Kafka
    Streams Apps, GCE Kafka Connect Instances, GCE KSQL Servers, Kafka as a
    Service, Control Center
  27. The model 32

  28. Available data 33: Pick-up Location, Pick-up Datetime, Drop-off Location,
    Drop-off Datetime, Trip Duration, Passenger Count, Trip Distance, Total
    Amount, Tips
  29. New York City zones 34

  30. Wide features 35: day of week, hour of day, pick-up zone, drop-off zone
  31. Deep features 36: day of year, hour of day, pick-up zone, drop-off zone,
    passenger count
  32. Wide and Deep Learning 37
  33. Code organisation to run in GCP 38

      $ edml-trainer/
      .
      ├── setup.py
      └── trainer
          ├── __init__.py
          ├── model.py
          └── task.py
  34. Code organisation to run in GCP 39

      # task.py [page 1]
      import argparse

      from . import model

      if __name__ == '__main__':
          parser = argparse.ArgumentParser()
          # Input arguments for ai-platform
          parser.add_argument(
              '--bucket',
              help='GCS path to project bucket',
              required=True
          )
          ....
          # Input arguments for modeling
          parser.add_argument(
              '--batch-size',
              type=int,
              default=512
          )

          # task.py [page 2]
          parser.add_argument(
              '--output-dir',
              help='GCS location to write checkpoints and export models',
              required=True
          )
          ....
          # assign arguments to model variables
          arguments = parser.parse_args().__dict__
          output_dir = arguments.pop('output_dir')
          model.BUCKET = arguments.pop('bucket')
          model.BATCH_SIZE = arguments.pop('batch_size')
          ....
          # Run the training job
          model.train_and_evaluate(output_dir)
  35. Code organisation to run in GCP 40

      # model.py [page 1]
      import tensorflow as tf

      BATCH_SIZE = 512
      ...
      CSV_COLUMNS = [...]
      LABEL_COLUMN = "trip_duration"
      KEY_COLUMN = "uuid"

      def read_dataset(...):
          def _input_fn():
              ...
          return _input_fn()

      # Feature engineering
      def get_wide_deep():
          ...
          return wide, deep

      # model.py [page 2]
      # Serving input receiver function
      def serving_input_receiver_fn():
          receiver_tensors = { ... }
          return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)

      # Model training and evaluation
      def train_and_evaluate(output_dir):
          ...
          estimator = tf.estimator.DNNLinearCombinedRegressor(...)
          ...
          tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  36. Code organisation to run in GCP 41: gcloud

      #!/usr/bin/env bash
      BUCKET=edml
      TRAINER_PACKAGE_PATH=gs://$BUCKET/data/taxi-trips/sources
      MAIN_TRAINER_MODULE="trainer.task"
      ...
      OUTDIR=gs://$BUCKET/ai-platform/models/$VERSION

      gcloud ai-platform jobs submit training $JOB_NAME \
          --job-dir $JOB_DIR \
          --package-path $TRAINER_PACKAGE_PATH \
          --module-name $MAIN_TRAINER_MODULE \
          --region $REGION \
          -- \
          --batch-size=$BATCH_SIZE \
          --output-dir=$OUTDIR \
          --pattern="*" \
          --train-examples=174000 \
          --eval-steps=1
  37. Development Workflow 42

  38. Streaming apps deployment 44: KAFKA STREAMS APPS PODS, KUBE MASTER

      // build.gradle
      compile group: 'org.tensorflow', name: 'proto', version: '1.15.0'
      compile group: 'org.tensorflow', name: 'tensorflow', version: '1.15.0'

      // Processor.scala
      import org.tensorflow._
      import org.tensorflow.framework.GraphDef // GraphDef comes from the proto artifact

      val graphDef: GraphDef = GraphDef.parseFrom(Array.empty[Byte]) // placeholder bytes
      val graph = new Graph()
      graph.importGraphDef(graphDef.toByteArray)
      val session = new Session(graph)
  39. The SavedModel Format from TF 45: importing the Graph alone fails at
    runtime with "Not found: Resource … variable was uninitialized"; the
    variables must be shipped and restored as well, hence a Serde for the whole
    model (and the fix sketched below)

      $ tree my_model/
      .
      ├── saved_model.pb
      └── variables
          ├── variables.data-00000-of-00002
          ├── variables.data-00001-of-00002
          └── variables.index
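
    A short sketch of the corresponding fix, assuming the model directory
    above: load the complete SavedModel (graph + variables) instead of the
    bare GraphDef, so the session starts with its variables restored.

        import org.tensorflow.SavedModelBundle

        // "serve" is the standard tag for models exported for serving
        val bundle = SavedModelBundle.load("/path/to/my_model", "serve")
        val session = bundle.session() // variables already initialized: no "uninitialized" error
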
  40. A model producer… for automation! 46

      // ModelPublisher.scala
      val topic: String = "<model.topic>"
      val version: String = "<model.version>"
      val model: String = "gs://.../<model.version>"
      //…
      val producer = new KafkaProducer[ModelKey, TFSavedModel](/* … */)
      val key = ModelKey("<app.name>")
      val value = TFSavedModel(/* … */)
      //…
      producer.send(new ProducerRecord(topic, key, value))
      producer.flush()
  41. 2 input streams 47: the app CI deploy stage publishes to a compacted
    MODEL TOPIC, while NEW RECORDS stream in and PREDICTIONS stream out (see
    the topic sketch below)
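
    Keeping the model topic compacted means a (re)starting streams app can
    always read back the latest model per key. A sketch of the topic creation
    with the AdminClient; the topic name, partition count and replication
    factor are assumptions.

        import java.util.{Collections, Properties}
        import org.apache.kafka.clients.admin.{AdminClient, NewTopic}
        import org.apache.kafka.common.config.TopicConfig

        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        val admin = AdminClient.create(props)

        // cleanup.policy=compact keeps only the latest record per key
        val modelTopic = new NewTopic("edml-models", 1, 3.toShort)
          .configs(Collections.singletonMap(
            TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT))

        admin.createTopics(Collections.singletonList(modelTopic)).all().get()
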
  42. 48

  43. Data Source + Model Source → Stream Processor, with Model Storage in a
    RocksDB Key-Value Store holding the Current Model, Processing each record
    into a Prediction (sketched below) 49
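
    A sketch of that topology with the Processor API; topic, store and class
    names are assumptions, and predict() stands in for the real TensorFlow
    call. One processor keeps the latest model bytes in a persistent
    (RocksDB-backed) key-value store, a second one reads it to score every
    record.

        import org.apache.kafka.common.serialization.{ByteArrayDeserializer, Serdes, StringDeserializer, StringSerializer}
        import org.apache.kafka.streams.Topology
        import org.apache.kafka.streams.processor.{AbstractProcessor, ProcessorContext, ProcessorSupplier}
        import org.apache.kafka.streams.state.{KeyValueStore, Stores}

        val STORE = "current-model"

        // Overwrites the single "latest" entry each time a new model is published
        class ModelUpdater extends AbstractProcessor[String, Array[Byte]] {
          private var store: KeyValueStore[String, Array[Byte]] = _
          override def init(ctx: ProcessorContext): Unit = {
            super.init(ctx)
            store = ctx.getStateStore(STORE).asInstanceOf[KeyValueStore[String, Array[Byte]]]
          }
          override def process(key: String, model: Array[Byte]): Unit = store.put("latest", model)
        }

        // Scores each record with whatever model is currently in the store
        class Scorer extends AbstractProcessor[String, String] {
          private var store: KeyValueStore[String, Array[Byte]] = _
          override def init(ctx: ProcessorContext): Unit = {
            super.init(ctx)
            store = ctx.getStateStore(STORE).asInstanceOf[KeyValueStore[String, Array[Byte]]]
          }
          override def process(key: String, record: String): Unit = {
            val model = store.get("latest")
            if (model != null) context().forward(key, predict(model, record))
          }
          // Placeholder: the real implementation restores the model and runs the TF session
          private def predict(model: Array[Byte], record: String): String =
            s"<prediction for $record with a ${model.length}-byte model>"
        }

        val updaterSupplier: ProcessorSupplier[String, Array[Byte]] = () => new ModelUpdater
        val scorerSupplier: ProcessorSupplier[String, String] = () => new Scorer

        val topology = new Topology()
          .addSource("Models", new StringDeserializer, new ByteArrayDeserializer, "edml-models")
          .addSource("Records", new StringDeserializer, new StringDeserializer, "PICKUPS-REPLAY")
          .addProcessor("ModelUpdater", updaterSupplier, "Models")
          .addProcessor("Scorer", scorerSupplier, "Records")
          .addStateStore(
            Stores.keyValueStoreBuilder(
              Stores.persistentKeyValueStore(STORE), // RocksDB under the hood
              Serdes.String(), Serdes.ByteArray()),
            "ModelUpdater", "Scorer")
          .addSink("Predictions", "predictions", new StringSerializer, new StringSerializer, "Scorer")
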
  44. Continuous integration 50: TEST ► PACKAGE ► TRAIN ► DEPLOY MODEL ►
    DEPLOY KAFKA STREAMS APP, with artifact versions 0.1.0-<dt>-<sha1> and
    0.1.0-<dt>-<sha1>-<N>, and {"metadata":"..."} alongside the model
  45. Scoring and Post Prediction 51

  46. TensorBoard 53

  47. 54

  48. 55

  49. Conclusion: how to face a drop in performance over time? 56
  50. THANK YOU 57

  51. PICTURES 58
    Photo by Daniel Jensen on Unsplash
    Photo by Dimon Blr on Unsplash
    Photo by Lerone Pieters on Unsplash
    Photo by Miryam León on Unsplash
    Photo by Matthew Hamilton on Unsplash
    Photo by Luke Stackpoole on Unsplash
    Photo by Gustavo on Unsplash
    Photo by Negative Space from Pexels
    Photo by Gerrie van der Walt on Unsplash
    Photo by Eepeng Cheong on Unsplash
    Photo by Rock'n Roll Monkey on Unsplash
    Photo by chuttersnap on Unsplash
    Photo by Denys Nevozhai on Unsplash
    Photo by Mike Tsitas on Unsplash
  52. ANNEX 59

  53. Is this really a 1-month challenge? Total cost: 302.82€ 60
  54. Team Work! 61

  55. Training Job Manual Submission 62

  56. 63

  57. A world of containers 64

  58. Cloud is just someone else's computer 65

  59. Data prep before TensorFlow, thank you BigQuery 66

  60. Schemas Are Service APIs For Event Streaming 67