Large-scale Experimentation with Spark & Productionizing Native Spark ML Models - Masood Krohy

Copyright © 2019, PatternedScience Inc. www.patterned.science
Spark: Large-scale Experimentation + Productionizing Native Spark ML Models
Presenter: Masood Krohy, Ph.D.
May 21, 2019

Presentation Layout
01 Intros
• Presenter bio
• Why have several modes for doing ML
• Quick intro to Spark and a tour of Web UIs
02 Large-scale Experimentation with Spark
• Distributed ML model optimization with Spark
• Parallel coordinates visualization
03 ML on Big Data with Spark
• Training an ML model with Spark's own ML lib
• Productionizing the model with MLeap

Presenter Bio
Masood Krohy, Data Science Platform Architect & Advisor
Timeline marks on the slide: 2013, 2014, 2016, 2017, 2018, 2019
• Ph.D. in Computer Engineering: Analytical modeling of botnets. Validated by data collected in industry. 3 top publications.
• Senior Analyst, Rogers: Managing the analytics reporting/statistical analyses of the national benchmarking program.
• Data Scientist, Intact: First Data Scientist of the company. Led the Big Data mining project for the UBI program.
• Lead Data Scientist, CN: Implemented an object-within-object detection system to detect cracks in railway equipment.
• Sr Data Science Advisor, B.Yond: Implemented a pattern detection system for the stream of alarms coming from telecom devices.
• Chief Architect, UniAnalytica (advanced data science platform): Platform contains Apache Spark, MLeap, and Anaconda, among many others.

Machine Learning Stack, UniAnalytica Platform
1. Horovod & TensorFlow: Distributed Deep Learning with TensorFlow and Horovod (large datasets, data parallelism)
2. Spark & TensorFlow/scikit-learn: Distributed grid search with Spark and TensorFlow/scikit-learn (small datasets, perfectly parallel)
3. Ray Tune & TensorFlow/scikit-learn: Intelligent, distributed hyperparam search with Asynchronous Hyperband, Ray Tune, and TensorFlow/scikit-learn (small datasets, perfectly parallel)
4. ML on images: Images - TensorFlow Object Detection API (intro)
5. Interpretable AI: Images - Classification with visual explanation for classifications using Class Activation Maps
Additional pointers
• Standard use of Spark for ML on Big Data is of course supported
• Legacy (2016): TensorSpark (contributed to run it in production in yarn-cluster mode)
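The "perfectly parallel" grid search with Spark mentioned in the stack can be sketched roughly as below. This is a minimal illustration, not the code from the talk: every grid point is trained independently, so each one can become its own Spark task. The toy dataset, the gradient-descent model inside `evaluate`, and all function names are assumptions made for the example; in practice `evaluate` would wrap a scikit-learn or TensorFlow fit. `run_on_spark` expects an already created SparkContext (`sc`) and uses only `parallelize`, `map`, and `collect`.

```python
from itertools import product

# Toy dataset (y = 2x + 1), small enough to be shipped to every worker.
DATA = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(20)]


def build_grid(space):
    """Expand {"name": [values, ...]} into a list of parameter dicts."""
    names = sorted(space)
    return [dict(zip(names, combo)) for combo in product(*(space[n] for n in names))]


def evaluate(params):
    """Train one model for one grid point and return (params, mse).

    Stand-in for a scikit-learn/TensorFlow fit: linear regression
    trained by plain gradient descent.
    """
    w, b = 0.0, 0.0
    lr, epochs = params["learning_rate"], params["epochs"]
    n = len(DATA)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in DATA) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in DATA) / n
        w -= lr * grad_w
        b -= lr * grad_b
    mse = sum((w * x + b - y) ** 2 for x, y in DATA) / n
    return params, mse


def run_local(grid):
    """Sequential baseline, useful for checking `evaluate` without a cluster."""
    return [evaluate(p) for p in grid]


def run_on_spark(sc, grid):
    """Same search with one grid point per Spark task; `sc` is a SparkContext."""
    return sc.parallelize(grid, len(grid)).map(evaluate).collect()


def best(results):
    """Pick the (params, mse) pair with the lowest error."""
    return min(results, key=lambda r: r[1])
```

Because the tasks share nothing, the only change between the local run and the cluster run is which function maps `evaluate` over the grid.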

Code Walkthrough & Live Demo
Notebooks/Scripts
• Zeppelin note: ML Distributed GridSearch with Spark
• Jupyter notebook: Grid search results analysis using multidimensional visualization (Parallel Coordinates plot)
• Jupyter notebook: ARIMA model with daily data (retraining the best model on each new bar and making prediction for the following bar)
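The retrain-on-each-new-bar loop described for the ARIMA notebook can be sketched as a walk-forward procedure. The example below is only an illustration of that loop: it swaps in a least-squares AR(1) fit as a dependency-free stand-in for ARIMA, and the names `fit_ar1`, `walk_forward`, and `min_train` are hypothetical.

```python
def fit_ar1(history):
    """Least-squares fit of y[t] = c + phi * y[t-1] (stand-in for an ARIMA fit)."""
    xs, ys = history[:-1], history[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return mean_y, 0.0  # flat history: fall back to predicting the mean
    phi = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return mean_y - phi * mean_x, phi


def walk_forward(series, min_train=5):
    """Refit on all bars seen so far, then predict the following bar.

    Returns a list of (bar_index, prediction); the model used for bar t
    has only seen series[0:t].
    """
    predictions = []
    for t in range(min_train, len(series)):
        c, phi = fit_ar1(series[:t])
        predictions.append((t, c + phi * series[t - 1]))
    return predictions
```

Each prediction can then be compared with the actual value of the bar once it arrives, which gives an out-of-sample error for the chosen hyperparameters.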

Code Walkthrough & Live Demo
Notebooks/Scripts
• Zeppelin note: Spark ML model training on Big Data & exporting the trained model with MLeap
• Jupyter notebook: serving the trained model with MLeap and the client code (only shell commands; notebook is used for documentation)
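A rough sketch of the two steps named above, training a native Spark ML pipeline and exporting it as an MLeap bundle, followed by building a request payload for a served model. It is written under several assumptions: the MLeap Python package and matching Spark jars are installed, the pipeline is a `VectorAssembler` plus `LogisticRegression` (the talk does not say which model was used), and the client sends a JSON LeapFrame made of a `schema` with `fields` and a list of `rows`. The helper names `bundle_uri`, `train_and_export`, and `leap_frame` are hypothetical.

```python
import json


def bundle_uri(zip_path):
    """MLeap addresses a zip bundle as a jar:file: URI built from an absolute path."""
    if not zip_path.startswith("/"):
        raise ValueError("bundle path must be absolute")
    return "jar:file:" + zip_path


def train_and_export(df, feature_cols, label_col, zip_path):
    """Fit a Spark ML pipeline on the DataFrame `df` and write an MLeap bundle."""
    # Imported lazily: these need a Spark runtime with the MLeap jars available.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler
    import mleap.pyspark  # noqa: F401
    # Importing the serializer adds serializeToBundle to Spark transformers.
    from mleap.pyspark.spark_support import SimpleSparkSerializer  # noqa: F401

    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    classifier = LogisticRegression(featuresCol="features", labelCol=label_col)
    model = Pipeline(stages=[assembler, classifier]).fit(df)
    # The transformed DataFrame lets MLeap record the pipeline's schema.
    model.serializeToBundle(bundle_uri(zip_path), model.transform(df))
    return model


def leap_frame(field_types, rows):
    """Build a JSON LeapFrame: named, typed fields plus the data rows."""
    fields = [{"name": name, "type": dtype} for name, dtype in field_types]
    return json.dumps({"schema": {"fields": fields}, "rows": rows})
```

The exported zip is what the serving side loads; the client only needs to send rows whose fields match the input columns of the pipeline.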

[Graph; source: Databricks]

[Figure: Parallel Coordinates plot]
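A parallel coordinates plot of grid search results draws one vertical axis per hyperparameter (plus one for the score) and one line per trial. A minimal sketch follows, assuming results are available as a list of dicts; the function names are hypothetical, and matplotlib is imported inside the plotting function so the data preparation can be used without it.

```python
def normalize_columns(rows, columns):
    """Rescale every column to [0, 1] so all axes share one vertical scale."""
    scaled = [dict() for _ in rows]
    for col in columns:
        values = [float(r[col]) for r in rows]
        lo, hi = min(values), max(values)
        span = hi - lo
        for out, v in zip(scaled, values):
            # A constant column is drawn in the middle of its axis.
            out[col] = 0.5 if span == 0 else (v - lo) / span
    return scaled


def plot_parallel_coordinates(rows, columns, score_col, path):
    """Draw one line per trial, colored by its rescaled score, and save to `path`.

    `score_col` must be one of `columns`.
    """
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    scaled = normalize_columns(rows, columns)
    positions = list(range(len(columns)))
    cmap = plt.get_cmap("viridis")
    fig, ax = plt.subplots(figsize=(8, 4))
    for row in scaled:
        ax.plot(positions, [row[c] for c in columns],
                color=cmap(row[score_col]), alpha=0.7)
    ax.set_xticks(positions)
    ax.set_xticklabels(columns)
    ax.set_ylabel("value rescaled to [0, 1]")
    fig.savefig(path)
    plt.close(fig)
```

Lines that end low on an error axis can be traced back across the other axes to see which hyperparameter ranges they pass through.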

Q&A