Slide 1

Slide 1 text

Scalable ML with Apache Spark
Saket Bhushan, @5aket

Slide 2

Slide 2 text

Contents
• Basic LR with Theano
• Spark 101
• ML with Spark
• ML Model in Production

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Basic Maths
[Diagram: inputs x1, x2, x3 feed output y1 through weights w12, w21, w31]
$Y = \mathrm{Activation}\left(\sum_i W_i X_i + b\right)$
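The formula on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the speaker's code: the sigmoid activation, the helper names `sigmoid` and `neuron`, and the input, weight, and bias values are all assumptions chosen for the example.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b, activation=sigmoid):
    """Compute Activation(sum_i w_i * x_i + b) for a single output unit."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return activation(z)

# Illustrative values only (assumed, not taken from the slides).
x = [1.0, 2.0, 3.0]     # inputs x1, x2, x3
w = [0.5, -0.25, 0.0]   # one weight per input
b = 0.0                 # bias

y = neuron(x, w, b)     # weighted sum is 0.0, so the sigmoid returns 0.5
print(y)
```

Swapping `activation` for another function (for example `math.tanh`) changes the output range without touching the weighted sum.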

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Theano: Symbolic Variables
[Diagram: symbolic variables X and Y feed a subtraction node, substr(_, _), that yields Z; after compilation, inputs 20 and 5 give 15]
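The idea on this slide, building an expression from placeholders first and only computing once it is compiled into a function, can be imitated in plain Python. The sketch below is a toy and does not use Theano; the names `Var`, `Op`, and `compile_function` are hypothetical. In Theano itself the corresponding steps are declaring variables such as `theano.tensor.dscalar` and compiling with `theano.function`.

```python
import operator

class Var:
    """A named placeholder that has no value until the function is called."""
    def __init__(self, name):
        self.name = name

    def __sub__(self, other):
        return Op(operator.sub, self, other)

    def __add__(self, other):
        return Op(operator.add, self, other)

    def evaluate(self, env):
        return env[self.name]

class Op(Var):
    """A graph node that applies a binary function to two sub-expressions."""
    def __init__(self, fn, left, right):
        self.fn, self.left, self.right = fn, left, right

    def evaluate(self, env):
        return self.fn(self.left.evaluate(env), self.right.evaluate(env))

def compile_function(inputs, output):
    """Turn an expression graph into an ordinary Python callable."""
    names = [v.name for v in inputs]

    def compiled(*args):
        return output.evaluate(dict(zip(names, args)))

    return compiled

x, y = Var("x"), Var("y")
z = x - y                               # builds a graph; nothing is computed yet
subtract = compile_function([x, y], z)  # the "compilation" step
print(subtract(20, 5))                  # 15, matching the numbers on the slide
```

A real symbolic library can also optimise the graph and differentiate it before compiling, which this toy does not attempt.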

Slide 9

Slide 9 text

Theano: Shared Variables
[Diagram: X, x1 and x2 combined through - and + nodes, with values 1 and 2]

Slide 10

Slide 10 text

GPU vs CPU
http://deeplearning.net/software/theano/tutorial/using_gpu.html

Slide 11

Slide 11 text

Show me the Code!

Slide 12

Slide 12 text

Apache Spark

Slide 13

Slide 13 text

Spark Ecosystem

Slide 14

Slide 14 text

Spark Performance
Source: https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html

Slide 15

Slide 15 text

Spark Architecture

Slide 16

Slide 16 text

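Spark Pipelines
• DataFrame – the dataset abstraction from Spark SQL
• Transformer – turns one DataFrame into another, e.g. a fitted model that maps a DataFrame of features to a DataFrame of predictions
• Estimator – is fit on a DataFrame to produce a Transformer; a learning algorithm is an example of an Estimator
• Pipeline – a sequence of Transformers and Estimators chained into one workflow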
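The contract between these four pieces can be sketched without Spark. The code below is a plain-Python toy, not the `pyspark.ml` API: a list of dicts stands in for a DataFrame, and `Scale`, `ThresholdClassifier`, `ThresholdModel`, and `PipelineModel` are hypothetical classes written only for this illustration.

```python
class Transformer:
    """Maps a dataset to a new dataset (here: a list of dict rows)."""
    def transform(self, rows):
        raise NotImplementedError

class Estimator:
    """Learns from a dataset and returns a fitted Transformer."""
    def fit(self, rows):
        raise NotImplementedError

class Scale(Transformer):
    """Stateless transformer: adds a rescaled copy of column 'x'."""
    def __init__(self, factor):
        self.factor = factor

    def transform(self, rows):
        return [dict(r, x_scaled=r["x"] * self.factor) for r in rows]

class ThresholdModel(Transformer):
    """Fitted model: predicts 1 when x_scaled exceeds the learned threshold."""
    def __init__(self, threshold):
        self.threshold = threshold

    def transform(self, rows):
        return [dict(r, prediction=int(r["x_scaled"] > self.threshold)) for r in rows]

class ThresholdClassifier(Estimator):
    """Toy learning algorithm: threshold is the midpoint of the two class means."""
    def fit(self, rows):
        def mean(label):
            vals = [r["x_scaled"] for r in rows if r["label"] == label]
            return sum(vals) / len(vals)
        return ThresholdModel((mean(0) + mean(1)) / 2)

class PipelineModel(Transformer):
    """A fitted pipeline: applies each fitted stage in order."""
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows

class Pipeline(Estimator):
    """Fits stages in order, feeding each stage the output of the previous one."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            model = stage.fit(rows) if isinstance(stage, Estimator) else stage
            rows = model.transform(rows)
            fitted.append(model)
        return PipelineModel(fitted)

# Made-up training rows for illustration.
train = [{"x": 1, "label": 0}, {"x": 2, "label": 0},
         {"x": 8, "label": 1}, {"x": 9, "label": 1}]
model = Pipeline([Scale(0.1), ThresholdClassifier()]).fit(train)
predictions = [r["prediction"] for r in model.transform([{"x": 3}, {"x": 7}])]
print(predictions)
```

Note that the Pipeline is itself an Estimator and the fitted result is itself a Transformer, which is what lets a whole workflow be trained and applied as one unit.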

Slide 17

Slide 17 text

Show me some Code!

Slide 18

Slide 18 text

ML Pipeline
Data → Train Classifier → Predict ?

Slide 19

Slide 19 text

Actual Pipeline
• Data
  o Ingestion
  o Cleaning
  o Storing
  o Shuffling
  o Feature Extraction
• Model in Development
  o Choose a model
  o Train on the Data
  o Tune Parameters, Validate
  o Deploy
• Model in Production
  o Evaluation
  o Monitoring
  o Logging
  o Versioning
  o Reports, Charts, Dashboards
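Three of the steps listed on this slide (shuffling, training, and validating) can be sketched in plain Python. This is only an illustration under stated assumptions: the synthetic data, the candidate thresholds, and the helpers `shuffle_and_split` and `accuracy` are hypothetical, and "training" here simply means picking the candidate that scores best on the training split.

```python
import random

def shuffle_and_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows, then split into train and validation sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def accuracy(threshold, rows):
    """Fraction of rows where (x > threshold) matches the label."""
    hits = sum(int(x > threshold) == label for x, label in rows)
    return hits / len(rows)

# Synthetic data (assumed for illustration): the label is 1 when x >= 10.
data = [(x, int(x >= 10)) for x in range(20)]
train, valid = shuffle_and_split(data)

# "Train": choose the candidate threshold that fits the training split best.
candidates = [4.5, 9.5, 14.5]
best = max(candidates, key=lambda t: accuracy(t, train))

# "Validate": score the chosen threshold on rows it was not selected on.
print(best, accuracy(best, valid))
```

Fixing the seed keeps the split reproducible, which helps when comparing models across runs.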

Slide 20

Slide 20 text

Questions?