Scalable Machine Learning with Apache Spark

Saket Bhushan
September 24, 2015

The talk demonstrates an end-to-end design (architecture, implementation, and deployment) of Downpour-like stochastic gradient descent using Apache Spark. Spark is the next-generation cluster-computing framework from the UC Berkeley and Databricks teams.
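Downpour SGD keeps one shared set of model parameters that many workers update asynchronously, each computing gradients on its own shard of the training data. A minimal single-machine sketch of that idea in plain Python — the parameter server, the toy linear-regression objective, the shard layout, and the learning rate are all illustrative assumptions, not the talk's Spark implementation:

```python
import threading

class ParameterServer:
    """Shared parameters that workers pull from and push gradients to."""
    def __init__(self, dim):
        self.w = [0.0] * dim
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return list(self.w)

    def push(self, grad, lr):
        # apply a (possibly stale) gradient: w <- w - lr * grad
        with self.lock:
            for i, g in enumerate(grad):
                self.w[i] -= lr * g

def worker(ps, shard, steps, lr=0.05):
    # minimize squared error of y = w . x on this worker's shard
    for _ in range(steps):
        w = ps.pull()  # parameters may be stale by the time we push
        for x, y in shard:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            grad = [2 * (pred - y) * xi for xi in x]
            ps.push(grad, lr)

# two shards of a consistent linear problem with solution w = [2, 1]
shards = [
    [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0)],
    [([1.0, 1.0], 3.0), ([2.0, 1.0], 5.0)],
]
ps = ParameterServer(2)
threads = [threading.Thread(target=worker, args=(ps, s, 400)) for s in shards]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every worker pushes gradients without waiting for the others, updates are applied against slightly stale parameters — the trade-off Downpour accepts in exchange for throughput.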

Transcript

  1. Basic Maths

     [Diagram: inputs x1, x2, x3 feed output y1 through weights w12, w21, w31]

     Y = Activation(Σ WiXi + b)
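The slide's formula can be checked numerically. A small plain-Python sketch of a single neuron, with tanh as an assumed activation and illustrative weight values:

```python
import math

def neuron(x, w, b, activation=math.tanh):
    # y = activation(sum_i w_i * x_i + b)
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)

# inputs x1..x3 and weights chosen purely for illustration
y = neuron([1.0, 0.5, -0.5], [0.2, -0.1, 0.4], 0.1)
```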
  2. Spark Pipelines

     • DataFrame – the data frame from Spark SQL
     • Transformer – creates a prediction DataFrame from a feature DataFrame
     • Estimator – a learning algorithm is an example of an Estimator
     • Pipeline – a collection of Transformers & Estimators
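The contract behind those four terms can be sketched in a few lines of plain Python — these classes mirror the shape of Spark ML's interfaces but are not the pyspark.ml API; the scaling and mean-centering stages are invented examples:

```python
class Transformer:
    """Maps one dataset to another (here: a list of floats)."""
    def transform(self, rows):
        raise NotImplementedError

class Estimator:
    """A learning step: fit() consumes data and returns a Transformer."""
    def fit(self, rows):
        raise NotImplementedError

class Scale(Transformer):
    def __init__(self, factor):
        self.factor = factor
    def transform(self, rows):
        return [x * self.factor for x in rows]

class Shift(Transformer):
    def __init__(self, offset):
        self.offset = offset
    def transform(self, rows):
        return [x + self.offset for x in rows]

class MeanCenterer(Estimator):
    def fit(self, rows):
        mean = sum(rows) / len(rows)
        return Shift(-mean)  # the fitted "model" is itself a Transformer

class Pipeline:
    """Runs stages in order; fitting Estimators, passing Transformers through."""
    def __init__(self, stages):
        self.stages = stages
    def fit(self, rows):
        fitted, data = [], rows
        for stage in self.stages:
            model = stage.fit(data) if isinstance(stage, Estimator) else stage
            data = model.transform(data)
            fitted.append(model)
        return fitted  # a list of Transformers: the fitted pipeline

model = Pipeline([Scale(2.0), MeanCenterer()]).fit([1.0, 2.0, 3.0])
out = [1.0, 2.0, 3.0]
for stage in model:
    out = stage.transform(out)
```

The key design point carries over to Spark ML: fitting a pipeline turns every Estimator into a Transformer, so the fitted result can be applied uniformly to new data.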
  3. Actual Pipeline

     • Data
       o Ingestion
       o Cleaning
       o Storing
       o Shuffling
       o Feature Extraction
     • Model in Development
       o Choose a model
       o Train on the data
       o Tune parameters, validate
       o Deploy
     • Model in Production
       o Evaluation
       o Monitoring
       o Logging
       o Versioning
       o Reports, charts, dashboards
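The data stages above can be sketched as composable functions over a list of records — a toy illustration (the records, the cleaning rule, and the squared-value feature are all invented for the example, not from the talk):

```python
import random

def ingest():
    # stand-in for reading raw records from a source
    return ["3.0", "1.0", "bad", "2.0"]

def clean(records):
    # drop malformed rows, parse the rest
    out = []
    for r in records:
        try:
            out.append(float(r))
        except ValueError:
            pass
    return out

def shuffle(records, seed=0):
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled

def extract_features(records):
    # toy feature vector: (value, value squared)
    return [(x, x * x) for x in records]

features = extract_features(shuffle(clean(ingest())))
```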