Slide 1

Tools for using TensorFlow
eRum, 2018-05-15
Andrie de Vries, Solutions Engineer, RStudio
@RevoAndrie

Slide 2

Overview
• Cloud GPU
• R packages
  • tfruns
  • cloudml
• Resources

StackOverflow: andrie | Twitter: @RevoAndrie | GitHub: andrie

Slide 3

Motivating example
• Chest accelerometer data
• UCI Machine Learning Repository: Activity Recognition from Single Chest-Mounted Accelerometer Data Set
• Example inspired by work from Mango Solutions

Slide 4

Motivating example
• Training data:
  • Samples of people walking
  • Acceleration in the x, y and z directions
• Task:
  • From the data, predict the person
  • 15 different people

Slide 5

Starting model

keras_model_sequential() %>%
  layer_conv_1d(
    filters = 40, kernel_size = 30, strides = 2,
    activation = "relu", input_shape = c(260, 3)
  ) %>%
  layer_max_pooling_1d(pool_size = 2) %>%
  layer_conv_1d(filters = 40, kernel_size = 10, activation = "relu") %>%
  layer_max_pooling_1d(pool_size = 2) %>%
  layer_flatten() %>%
  layer_dense(units = 100, activation = "sigmoid") %>%
  layer_dense(units = 15, activation = "softmax")

1-d convolution is useful for time series data and other sequences, including text. This model reaches ~95% accuracy on the validation set.
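As an aside on the architecture, the sequence length flowing through this model can be checked by hand: a "valid" 1-d convolution (or pooling) with kernel size k and stride s over n time steps produces floor((n - k) / s) + 1 steps. A base-R sketch, independent of keras, tracing the shapes of the layers above:

```r
# Output length of a "valid" 1-d convolution or pooling layer
conv_out <- function(n, kernel, stride = 1) (n - kernel) %/% stride + 1

n <- 260                 # input time steps (3 channels: x, y, z)
n <- conv_out(n, 30, 2)  # conv, kernel_size = 30, strides = 2  -> 116
n <- conv_out(n, 2, 2)   # max pooling, pool_size = 2           -> 58
n <- conv_out(n, 10)     # conv, kernel_size = 10               -> 49
n <- conv_out(n, 2, 2)   # max pooling, pool_size = 2           -> 24
flat <- n * 40           # flatten: 24 steps x 40 filters = 960 units
```

The flatten layer thus feeds 960 units into the 100-unit dense layer, and the final 15-unit softmax matches the 15 people to classify.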

Slide 6

TensorFlow using R

Slide 7

Why should R users care about TensorFlow?
• A new general-purpose numerical computing library
• Not all data has to be in RAM
• Highly general optimization, e.g. SGD, Adam
• TensorFlow models can be deployed with a C++ runtime
• R has a lot to offer as an interface language

Slide 8

TensorFlow APIs
• Distinct interfaces for various tasks and levels of abstraction

Slide 9

GPUs

Slide 10

GPUs
• Deep neural networks perform best on complex perceptual problems:
  • Very high capacity models
  • Trained on large amounts of data
• This means you need:
  • Large computational capacity (GPU / TPU)
  • Time
• Some applications are excruciatingly slow on CPU
  • In particular, image processing (convolutional networks) and sequence processing (recurrent neural networks)

Slide 11

How much faster are GPU machines?
• Many factors determine GPU performance:
  • Complexity of the model
  • Mini-batch size
  • Convolution
  • Speed of getting data into the GPU (input/output)
• For recurrent tasks, a GPU may not offer any advantage over a CPU
• For convolution, GPUs and TPUs can make a big difference

Slide 12

GPUs
• Options: Local GPU | Cloud Desktop | Cloud Server | Cloud ML
• https://tensorflow.rstudio.com/tools/gpu/

Slide 13

Paperspace
• One of the cloud GPU options (Local GPU | Cloud Desktop | Cloud Server | Cloud ML): Paperspace
• https://tensorflow.rstudio.com/tools/gpu/

Slide 14

Paperspace
• Sign up at https://www.paperspace.com/account/signup

Slide 15

Google Cloud Machine Learning
• One of the cloud GPU options (Local GPU | Cloud Desktop | Cloud Server | Cloud ML): Cloud ML
• https://tensorflow.rstudio.com/tools/gpu/

Slide 16

Google Cloud Machine Learning

Slide 17

Packages

Slide 18

TensorFlow tasks
• Two tasks: training and hyperparameter tuning
• Two platforms: Paperspace / AWS, and Cloud ML

Slide 19

R packages

                    Training and hyperparameter tuning
Paperspace / AWS    tfruns
Cloud ML            cloudml

Slide 20

TensorFlow tasks

                    Training                 Hyperparameter tuning
Paperspace / AWS    tfruns::training_run()   tfruns::tuning_run()
Cloud ML            cloudml                  cloudml

Slide 21

The tfruns package

                    Training
Paperspace / AWS    tfruns::training_run()
Cloud ML            cloudml

Slide 22

tfruns
• https://tensorflow.rstudio.com/tools/tfruns/
• Successful deep learning requires a huge amount of experimentation.
• This requires a systematic approach to conducting and tracking the results of experiments.
• The training_run() function is like the source() function, but it automatically tracks and records output and metadata for the execution of the script.

Slide 23

tfruns::training_run()

library(tfruns)
training_run("walking_experiments.R")
view_run()

Slide 24

tfruns::ls_runs()

ls_runs() %>% as_tibble()

# A tibble: 131 x 38
   run_dir            eval_loss eval_acc metric_loss metric_acc metric_val_loss metric_val_acc
 1 runs/2018-05-15T0…     0.189    0.951      0.0125      0.998           0.181          0.951
 2 runs/2018-05-15T0…     0.195    0.938      0.0245      0.992           0.175          0.948
 3 runs/2018-05-14T2…     0.192    0.941      0.0965      0.967           0.143          0.950
 4 runs/2018-05-14T2…     0.189    0.938      0.231       0.921           0.159          0.945
 5 runs/2018-05-14T2…     0.194    0.936      0.250       0.904           0.168          0.940
 6 runs/2018-05-14T2…     0.286    0.882      0.482       0.806           0.263          0.891
 7 runs/2018-05-14T2…     0.402    0.853      0.377       0.851           0.351          0.867
 8 runs/2018-05-14T2…     0.265    0.893      0.291       0.884           0.230          0.911
 9 runs/2018-05-14T2…     0.139    0.960      0.0308      0.989           0.108          0.970
10 runs/2018-05-14T2…     0.403    0.873      0.670       0.749           0.372          0.886
# ... with 121 more rows, and 31 more variables: flag_conv_1_filters,
#   flag_conv_1_kernel, flag_conv_1_pooling, flag_conv_1_dropout,
#   flag_conv_2_filters, flag_conv_2_kernel, flag_conv_2_pooling,
#   flag_conv_2_dropout, flag_dense_1_nodes, flag_dense_1_dropout,
#   flag_dense_2_nodes, flag_dense_2_dropout, flag_mini_batch_size, ...

Slide 25

tfruns::ls_runs()
• ls_runs() also accepts a subset expression and an order argument, for example to keep only the most accurate runs:

ls_runs(eval_acc > 0.985, order = eval_acc) %>% as_tibble()
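Because ls_runs() returns an ordinary data frame of runs (one row per run; flags and metrics as columns), the subset-and-order call behaves like standard data-frame filtering. A toy base-R illustration with hypothetical values, not real runs:

```r
# Hypothetical stand-in for the data frame returned by ls_runs()
runs <- data.frame(
  run_dir  = c("runs/a", "runs/b", "runs/c"),
  eval_acc = c(0.951, 0.987, 0.990),
  stringsAsFactors = FALSE
)

# Roughly what ls_runs(eval_acc > 0.985, order = eval_acc) does:
best <- runs[runs$eval_acc > 0.985, ]
best <- best[order(best$eval_acc, decreasing = TRUE), ]
```

The most accurate run ends up in the first row, which is what makes the head()-based workflows on the following slides convenient.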

Slide 26

Tuning

                    Hyperparameter tuning
Paperspace / AWS    tfruns::tuning_run()
Cloud ML            cloudml

Slide 27

tfruns::flags()

FLAGS <- tfruns::flags(
  flag_integer("conv_1_filters", 16),
  flag_integer("conv_1_kernel", 32),
  flag_integer("conv_1_pooling", 2),
  flag_numeric("conv_1_dropout", 0.25)
)

model <- keras_model_sequential() %>%
  layer_conv_1d(
    input_shape = c(260, 3),
    filters = FLAGS$conv_1_filters,
    kernel_size = FLAGS$conv_1_kernel,
    activation = "relu"
  ) %>%
  …

Slide 28

training_run() using FLAGS

tfruns::training_run(
  "walking_flags.R",
  flags = list(
    conv_1_filters = 32,
    conv_1_kernel = 8
  )
)

tfruns::training_run(
  "walking_flags.R",
  flags = list(
    conv_1_filters = 64,
    conv_1_kernel = 8
  )
)
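These calls override only the flags they name; every other flag keeps the default declared in flags(). A base-R sketch of that merge behaviour (illustrative only; tfruns handles this internally):

```r
# Defaults as declared in tfruns::flags()
defaults  <- list(conv_1_filters = 16, conv_1_kernel = 32,
                  conv_1_pooling = 2,  conv_1_dropout = 0.25)

# Overrides passed via training_run(..., flags = list(...))
overrides <- list(conv_1_filters = 64, conv_1_kernel = 8)

# Named overrides win; unnamed flags fall back to their defaults
FLAGS <- modifyList(defaults, overrides)
```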

Slide 29

Setting up a tuning run using flags

tfruns::tuning_run(
  "walking_flags.R",
  sample = 128 / 1259712,
  flags = list(
    conv_1_filters  = c(16, 32, 64),
    conv_1_kernel   = c(8, 16, 32),
    conv_1_pooling  = c(2, 4),
    conv_1_dropout  = c(0.1, 0.2, 0.5),
    conv_2_filters  = c(16, 32, 64),
    conv_2_kernel   = c(8, 16, 32),
    conv_2_pooling  = c(2, 4),
    conv_2_dropout  = c(0.1, 0.2, 0.5),
    dense_1_nodes   = c(32, 64, 128, 256),
    dense_1_dropout = c(0.1, 0.2, 0.5),
    dense_2_nodes   = c(32, 64, 128, 256),
    dense_2_dropout = c(0.1, 0.2, 0.5)
  )
)

1.2M total combinations of flags (sampled down to 128 combinations)
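The size of this grid is easy to verify: the flags shown give 54 combinations per conv block and 12 per dense block, i.e. 419,904 in total. The 1,259,712 in the sample denominator corresponds to one additional 3-level flag; the assumption here is that this was mini-batch size, since a flag_mini_batch_size column appears in the ls_runs() output earlier:

```r
conv_block  <- 3 * 3 * 2 * 3   # filters, kernel, pooling, dropout
dense_block <- 4 * 3           # nodes, dropout

combos <- conv_block^2 * dense_block^2   # 419904 for the flags listed above
total  <- combos * 3                     # assuming 3 mini_batch_size values
frac   <- 128 / total                    # the sample argument: 128 / 1259712
```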

Slide 30


Slide 31

Extract the best performing models

ls_runs(order = "eval_acc") %>% head()

   run_dir   eval_acc eval_loss metric_loss metric_acc metric_val_loss metric_val_acc
 1 runs/20…     0.976    0.0857      0.0970      0.965          0.0633          0.981
 2 runs/20…     0.965    0.112       0.132       0.957          0.0902          0.972
 3 runs/20…     0.964    0.115       0.0829      0.973          0.101           0.969
 4 runs/20…     0.963    0.125       0.140       0.954          0.101           0.968
 5 runs/20…     0.960    0.139       0.0308      0.989          0.108           0.970
 6 runs/20…     0.960    0.127       0.0376      0.991          0.0859          0.972
 7 runs/20…     0.960    0.137       0.0444      0.984          0.114           0.969
 8 runs/20…     0.959    0.160       0.0573      0.985          0.130           0.958
 9 runs/20…     0.959    0.141       0.0621      0.979          0.0842          0.972
10 runs/20…     0.958    0.116       0.141       0.951          0.0931          0.966

Slide 32

Compare the top two models

ls_runs(order = "eval_acc") %>%
  head(2) %>%
  tfruns::compare_runs()

Slide 33

The cloudml package

                    Training and hyperparameter tuning
Paperspace / AWS    tfruns
Cloud ML            cloudml

Slide 34

cloudml
• https://tensorflow.rstudio.com/tools/cloudml/
• Scalable training of models built with the keras, tfestimators, and tensorflow R packages.
• On-demand access to training on GPUs, including Tesla P100 GPUs from NVIDIA®.
• Hyperparameter tuning to optimize key attributes of model architectures in order to maximize predictive accuracy.

Slide 35

cloudml install and configure

devtools::install_github("rstudio/cloudml")
cloudml::gcloud_install()
cloudml::gcloud_init()

Welcome! This command will take you through the configuration of gcloud.

Settings from your current configuration [default] are:
core:
  account: [email protected]
  disable_usage_reporting: 'True'
  project: tensorflow-cloud-demo

Pick configuration to use:
 [1] Re-initialize this configuration [default] with new settings
 [2] Create a new configuration
Please enter your numeric choice:

Slide 36

cloudml::cloudml_train()
• Automatically uploads the contents of the working directory along with the script
• Automatically installs all required R packages on the Cloud ML servers

library(cloudml)

# Train on the default CPU instance
cloudml_train("mnist_mlp.R")

# Train on a GPU instance
cloudml_train("walking_cloudml.R", master_type = "standard_gpu")

# Train on an NVIDIA Tesla P100 GPU
cloudml_train("walking_cloudml.R", master_type = "standard_p100")

Slide 37


Slide 38

Using the Cloud ML console
• https://console.cloud.google.com/mlengine/

Slide 39


Slide 40

cloudml hyperparameter tuning

cloudml_train("mnist_cnn_cloudml.R", config = "tuning.yml")

Submitting training job to CloudML...
Job 'cloudml_2018_04_25_084916940' successfully submitted.

View job in the Cloud Console at:
https://console.cloud.google.com/ml/jobs/cloudml_2018_04_25_084916940?project=tensorflow-cloud-demo

View logs at:
https://console.cloud.google.com/logs?resource=ml.googleapis.com%2Fjob_id%2Fcloudml_2018_04_25_084916940&project=tensorflow-cloud-demo

Check job status with: job_status("cloudml_2018_04_25_084916940")
Collect job output with: job_collect("cloudml_2018_04_25_084916940")
After collect, view with: view_run("runs/cloudml_2018_04_25_084916940")

Slide 41

Inspect the tuning trial results (and clean up)

job_trials() %>%
  as_tibble() %>%
  select(-one_of("hyperparameters.data_dir")) %>%
  mutate_at(vars(starts_with("hyperParameters")), round, 2) %>%
  rename_all(~sub("finalMetric.", "", .)) %>%
  rename_all(~sub("hyperparameters.", "", .)) %>%
  select(-trainingStep) %>%
  arrange(desc(objectiveValue))

# A tibble: 23 x 14
   objectiveValue conv_1_dropout conv_1_filters conv_1_kernel conv_1_pooling conv_2_dropout conv_2_filters
 1          0.971          0.100           128.           24.             2.          0.200           128.
 2          0.970          0.100           128.           24.             2.          0.              128.
 3          0.969          0.              128.           24.             2.          0.500           128.
 4          0.964          0.100           128.           24.             4.          0.200           128.
 5          0.964          0.200           128.           24.             4.          0.100           128.
 6          0.960          0.100           128.           24.             2.          0.              128.
 7          0.960          0.200           128.           24.             2.          0.100           128.
 8          0.958          0.               16.           24.             2.          0.200            64.
 9          0.957          0.               16.           32.             2.          0.100           128.
10          0.957          0.200           128.           24.             4.          0.100           128.

Slide 42

Inspect the tuning trial results

library(ggplot2)
trials_clean %>%
  ggplot(aes(x = trialId, y = objectiveValue)) +
  geom_point() +
  stat_smooth(method = "lm") +
  theme_bw(20)

Slide 43

Collect the best performing trial

job_collect(trials = "best")

http://rpubs.com/Andrie/cloudml_tuning_run_MNIST_best_trial

Slide 44

Resources

Slide 45

Recommended reading on neural nets and TensorFlow
• Deep Learning with R, by Chollet and Allaire
• Deep Learning, by Goodfellow, Bengio & Courville

Slide 46

Resources
• https://tensorflow.rstudio.com/tools/gpu.html
• https://tensorflow.rstudio.com/tools/tfruns/articles/overview.html
• https://tensorflow.rstudio.com/tools/cloudml/articles/getting_started.html

Slide 47

Summary
• For some neural network training tasks, consider GPU machines
• You have options for using GPUs in the cloud:
  • Paperspace / AWS machines
  • Google Cloud ML
• You have tools to do this:
  • tfruns
  • cloudml