
M.L. tangle teaser - The TensorFlow Estimator chapter

Giulia
October 06, 2020


One of the knots to untangle when industrializing ML pipelines is the collaboration between Data Science and Data Engineering. I will talk about TensorFlow Estimator to show how it can be used as an intermediary between the two and serve as a key piece for building an end-to-end pipeline on Google Cloud Platform.

Meetup Paris Data Ladies - 20 minutes

https://youtu.be/4qs128CAJgU?t=2265

Ref:
"Event Driven Machine Learning", XebiCon19, Oct. 2019: https://www.youtube.com/watch?v=g646cjDvg84&ab_channel=PublicisSapientEngineering
"Event Driven Machine Learning", Confluent webinar, Apr. 2020: https://www.confluent.io/online-talks/event-driven-machine-learning-avec-publicis-sapient/


Transcript

1. @Giuliabianchl CHALLENGES: almost real time, batch inference, working pipeline, model performance, production environment, exploration environment, data engineering tasks, data science tasks, monitoring, re-training, … iterative approach

2. @Giuliabianchl COLLABORATION: almost real time, batch inference, working pipeline, model performance, production environment, exploration environment, data engineering tasks, data science tasks, monitoring, re-training, iterative approach

3. @Giuliabianchl AUGMENTED NYC TAXI TRIP CHALLENGE USE CASE. Kaggle challenge: given current location and destination, estimate TRIP DURATION; a classical machine learning task with batch inference. Our challenge: new data comes in each time someone orders a taxi, not in batches; CONTINUOUS INFERENCE. Photo by Myke Simon on Unsplash

4. @Giuliabianchl TRAINING DATA: drop-off location, drop-off time, pick-up location, pick-up time, trip distance, total amount, passenger count, …, trip duration (target variable). Photo by Negative Space from Pexels

5. @Giuliabianchl USE CASE. "Hi, I need a taxi to Times Square." "Sure, your taxi is coming soon, you'll be at your destination in 20 minutes."

6. @Giuliabianchl GLOBAL IDEA, USE CASE. "Hi, I need a taxi to Times Square." "Sure, your taxi is coming soon, you'll be at your destination in 20 minutes."

7. @Giuliabianchl GLOBAL ARCHITECTURE: the use-case dialogue above mapped onto GCP components: data stream simulation; storage and preprocessing: Google BigQuery; ML; Infrastructure as code; CI; Orchestration; Object storage

  8. @Giuliabianchl - "Event driven machine learning" - XebiCon19 - Nov

    2019 - Confluent webinar - Apr 2020 REFERENCE @LoicMDivad
9. @Giuliabianchl TENSORFLOW ESTIMATOR: framework for specifying, training, evaluating and deploying ML models; model-level abstraction; pre-made estimators for classification and regression: linear model, boosted trees, deep neural networks, combined linear & DNN; custom models can be converted to estimators with tf.keras.estimator.model_to_estimator (sketched below); same code for local host vs. distributed multi-server environment

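As a quick illustration of the conversion path mentioned above, here is a minimal sketch of wrapping a compiled Keras model as an Estimator; the toy architecture and input shape are made up for the example, not taken from the talk.

    import tensorflow as tf

    # Toy regression model; the architecture is illustrative only
    keras_model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(8,)),
        tf.keras.layers.Dense(1)
    ])
    keras_model.compile(optimizer='adam', loss='mse')

    # Wrap the compiled Keras model as a tf.estimator.Estimator, which can
    # then be trained and exported like a pre-made estimator
    estimator = tf.keras.estimator.model_to_estimator(keras_model=keras_model)
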
10. @Giuliabianchl TENSORFLOW ESTIMATOR: input function (executed in a tf.Graph, returns a tf.data.Dataset); feature columns (tf.feature_column, feature processing); model function (pre-made or custom model, allows easy iterative dev.); train, evaluate, predict; serving_input_receiver function to build the part of a tf.Graph that parses the raw data received by the SavedModel

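To make those pieces concrete, a minimal sketch of an input function, feature columns, and a serving_input_receiver function; the feature names (trip_distance, passenger_count) and values are placeholders standing in for the real schema.

    import tensorflow as tf

    def input_fn():
        # Executed inside the Estimator's tf.Graph; returns a tf.data.Dataset
        features = {'trip_distance': [1.2, 3.4], 'passenger_count': [1, 2]}
        labels = [10.0, 25.0]  # trip duration
        return tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

    # Feature columns describe how raw inputs are fed to the model
    feature_columns = [
        tf.feature_column.numeric_column('trip_distance'),
        tf.feature_column.numeric_column('passenger_count'),
    ]

    def serving_input_receiver_fn():
        # Builds the part of the graph that parses raw data sent to the SavedModel
        receiver_tensors = {
            'trip_distance': tf.compat.v1.placeholder(tf.float32, [None]),
            'passenger_count': tf.compat.v1.placeholder(tf.int64, [None]),
        }
        return tf.estimator.export.ServingInputReceiver(receiver_tensors,
                                                        receiver_tensors)
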
11. @Giuliabianchl CODE ORGANISATION TO RUN IN GCP: 217M data points; ai-platform: notebooks for exploring, building and testing locally, remote training and prediction, hyperparameter tuning, model deployment; code must be organised and packaged properly (a minimal setup.py sketch follows the tree)

    $ tree edml-trainer/
    .
    ├── setup.py
    └── trainer
        ├── __init__.py
        ├── model.py
        ├── task.py
        └── util.py

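The setup.py at the root is what lets ai-platform install the trainer package on its workers; a minimal sketch, with an assumed dependency list:

    from setuptools import find_packages, setup

    setup(
        name='edml-trainer',       # package name matching the tree above
        version='0.1',
        packages=find_packages(),  # picks up the trainer/ package
        install_requires=[
            'tensorflow-io',       # assumption: needed for the BigQuery reader
        ],
    )
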
12. @Giuliabianchl TASK.PY

    import argparse
    import tensorflow as tf
    from . import model

    def parse_arguments():
        parser = argparse.ArgumentParser()
        # Input arguments for ai-platform
        parser.add_argument(
            '--bucket',
            help='GCS path to project bucket',
            required=True
        )
        ...
        # Input arguments for modeling
        parser.add_argument(
            '--batch-size',
            type=int,
            default=128
        )
        ...
        return parser.parse_args()

    def train_and_evaluate(args):
        estimator, train_spec, eval_spec = model.my_estimator(...)
        tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

    if __name__ == '__main__':
        args = parse_arguments()
        train_and_evaluate(args)

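Before submitting a remote job, the entry point can be smoke-tested locally with something like python -m trainer.task --bucket <bucket> --batch-size 128 (flag names as defined by the parser above; the bucket value is a placeholder).
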
13. @Giuliabianchl MODEL.PY

    import tensorflow as tf
    from . import util

    def my_estimator(...):
        ...
        # Feature engineering
        wide, deep = util.get_wide_deep(...)
        # Estimator definition
        estimator = tf.estimator.DNNLinearCombinedRegressor(
            model_dir=output_dir,
            linear_feature_columns=wide,
            dnn_feature_columns=deep,
            dnn_hidden_units=nnsize,
            batch_norm=True,
            dnn_dropout=0.1,
            config=run_config)
        train_spec = tf.estimator.TrainSpec(
            input_fn=util.read_dataset(...),
            ...)
        exporter = tf.estimator.BestExporter(
            'exporter',
            serving_input_receiver_fn=util.serving_input_receiver_fn)
        eval_spec = tf.estimator.EvalSpec(
            input_fn=util.read_dataset(...),
            ...,
            exporters=exporter)
        return estimator, train_spec, eval_spec

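A note on the exporter choice: tf.estimator.BestExporter only writes a new SavedModel when the latest evaluation beats the best result seen so far (by default it compares the evaluation loss), so the export directory always holds the best model rather than merely the most recent checkpoint.
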
14. @Giuliabianchl UTIL.PY

    import tensorflow as tf
    from tensorflow_io.bigquery import BigQueryClient

    # Read input data
    def read_dataset(...):
        def _input_fn():
            client = BigQueryClient()
            read_session = client.read_session(...)
            dataset = read_session.parallel_read_rows(sloppy=True) \
                .map(lambda records: ...)
            ...
            return dataset
        return _input_fn

    # Feature engineering
    def get_wide_deep(...):
        # Sparse columns
        wide = [
            tf.feature_column.categorical_column_with_identity(...),
            ...
        ]
        # Dense columns
        deep = [
            tf.feature_column.embedding_column(...),
            ...
        ]
        return wide, deep

    # Serving input receiver function
    def serving_input_receiver_fn():
        receiver_tensors = { … }
        return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)

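Two details worth noticing: read_dataset returns the input function itself rather than a dataset, because the Estimator calls input_fn later, inside its own tf.Graph; and sloppy=True lets parallel_read_rows interleave the BigQuery row streams in non-deterministic order, trading reproducibility for read throughput.
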
15. @Giuliabianchl TRAINING IN GCP

    #!/usr/bin/env bash
    # variable definition
    BUCKET=edml
    TRAINER_PACKAGE_PATH=gs://$BUCKET/data/taxi-trips/sources
    MAIN_TRAINER_MODULE="trainer.task"
    ...
    OUTDIR=gs://$BUCKET/ai-platform/models/$VERSION

    # gcloud-specific flags first; user arguments for the application after "--"
    gcloud ai-platform jobs submit training $JOB_NAME \
        --job-dir $JOB_DIR \
        --package-path $TRAINER_PACKAGE_PATH \
        --module-name $MAIN_TRAINER_MODULE \
        --region $REGION \
        -- \
        --batch-size=$BATCH_SIZE \
        --output-dir=$OUTDIR \
        --train-steps=2800000 \
        --eval-steps=3

16. @Giuliabianchl PREDICTION: ai-platform, TensorFlow Serving (TFX), Kubeflow serving, …; synchronous API call

    $ tree my_model/
    .
    ├── saved_model.pb
    └── variables
        ├── variables.data-00000-of-00002
        ├── variables.data-00001-of-00002
        └── variables.index

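For the synchronous API call against a model deployed on ai-platform, a minimal sketch using the Google API Python client; the project, model name and instance fields are placeholders, not values from the talk.

    from googleapiclient import discovery

    # Build a client for the AI Platform (ml.googleapis.com) v1 API
    service = discovery.build('ml', 'v1')
    name = 'projects/my-project/models/taxi_duration'  # placeholder resource name

    # Synchronous online prediction: the call blocks until the response arrives
    response = service.projects().predict(
        name=name,
        body={'instances': [{'trip_distance': 2.5, 'passenger_count': 1}]},
    ).execute()

    print(response['predictions'])
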
17. @Giuliabianchl ASYNCHRONOUS PREDICTION: use TensorFlow Java to load the model, or another open source wrapper (e.g. Zoltar) that loads predictive machine learning models in a JVM; the tf.Graph acts as the interface between DE & DS. [Diagram: JVM hosting the tf.Graph; raw data in, prediction out, applying the same transform as at training time]