Workflow in a team of data scientists (tech talk for colleagues)

I spoke about the CRISP-DM way of organizing the development of ML models and the modifications we implemented. I also described all the stages of model development.

Marianna Diachuk

July 27, 2018

Transcript

  1. Business understanding
    • Define the problem in business terms: define the business question for the future model. Example: detect and prevent intrusion by fraudsters.
    • Define the data science problem. Example: who is considered a fraudster, and how to detect fraud.
    • Define what we need to solve the problem: what data to gather and analyze.
  2. Data Understanding
    • EDA (Exploratory Data Analysis)
    Objectives:
      1. Discover patterns
      2. Spot anomalies
      3. Frame hypotheses
      4. Check assumptions
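As a rough illustration of the EDA objectives on slide 2, here is a minimal pandas sketch. The tiny transaction table, its column names (`amount`, `country`, `is_fraud`) and the 1.5 × IQR outlier rule are all assumptions made for this example; they do not come from the talk.

```python
import pandas as pd

# Hypothetical toy data: not from the talk, used only to show the EDA steps.
df = pd.DataFrame({
    "amount": [12.0, 15.5, 14.0, 13.2, 980.0, 16.1, None],
    "country": ["UA", "UA", "PL", "UA", "PL", "UA", "PL"],
    "is_fraud": [0, 0, 0, 0, 1, 0, 0],
})

# Check assumptions: summary statistics and missing values per column.
summary = df["amount"].describe()
missing_per_column = df.isna().sum()

# Spot anomalies: flag amounts outside 1.5 * IQR of the middle 50%.
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)]

# Discover patterns: compare the fraud rate between groups.
# A visible difference here is a candidate hypothesis to test later.
fraud_rate_by_country = df.groupby("country")["is_fraud"].mean()

print(summary)
print(missing_per_column)
print(outliers)
print(fraud_rate_by_country)
```

On real data the same steps usually come with plots (histograms, box plots, correlation heatmaps), but the questions asked of the data stay the same.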
  3. Data Preparation (50-70% of project time)
    • Data preprocessing (handle missing values, wrong data types, etc.)
    • Dataset labeling: one-class problem
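The preprocessing bullet on slide 3 (missing values, wrong data types) could look roughly like the sketch below. The raw table, the column names and the chosen imputation strategy (median for numbers, an `"unknown"` category for strings) are assumptions for illustration, not the approach described in the talk.

```python
import pandas as pd

# Hypothetical raw extract where every column arrived as text.
raw = pd.DataFrame({
    "amount": ["12.5", "n/a", "40", "7.25"],
    "signup_date": ["2018-01-03", "2018-02-14", "not a date", "2018-03-01"],
    "channel": ["web", None, "app", "web"],
})

clean = raw.copy()

# Fix wrong data types; values that cannot be parsed become NaN / NaT.
clean["amount"] = pd.to_numeric(clean["amount"], errors="coerce")
clean["signup_date"] = pd.to_datetime(clean["signup_date"], errors="coerce")

# Handle missing values: median for the numeric column,
# an explicit placeholder category for the categorical one.
clean["amount"] = clean["amount"].fillna(clean["amount"].median())
clean["channel"] = clean["channel"].fillna("unknown")

print(clean.dtypes)
print(clean)
```

Whether to impute, drop or flag missing values depends on the model and the business question, so the choices above are only one reasonable default.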
  4. Modeling
    • Tune model hyperparameters
      - to achieve higher accuracy
      - to improve model performance
    sklearn.ensemble.GradientBoostingClassifier(loss='deviance', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, min_impurity_split=None, init=None, random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, presort='auto')
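One common way to tune the hyperparameters listed on slide 4 is a cross-validated grid search. The sketch below is an assumption about how that could be done, not the procedure used in the talk: the data is synthetic, the grid values are arbitrary, and ROC AUC is picked as the selection metric only because it also appears on the evaluation slide.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification data standing in for a real dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Small, arbitrary grid over three of the constructor parameters.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # metric used to rank the candidates
    cv=3,               # 3-fold cross-validation on the training split
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_auc = search.score(X_test, y_test)  # ROC AUC of the refitted best model

print(best_params)
print(round(test_auc, 3))
```

The held-out test split is touched only once, after the search, so the reported score is not inflated by the tuning itself.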
  5. Evaluation
    - Model performance metrics (AUC, Gini, F1, confusion matrix, etc.)
    - Business metrics (profits, approval rate, default rate, etc.)
    - Evaluate whether the business objectives are achieved
    Some models may not reach the deployment stage after evaluation.
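The model performance metrics named on slide 5 can be computed with scikit-learn as sketched below. The four labels and scores are made-up values, the 0.5 decision threshold is an assumption, and Gini is taken here as `2 * AUC - 1`, the usual convention in credit scoring.

```python
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

# Made-up ground truth and predicted probabilities for illustration.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# AUC works on the raw scores; Gini is a linear rescaling of AUC.
auc = roc_auc_score(y_true, y_score)
gini = 2 * auc - 1

# F1 and the confusion matrix need hard class predictions,
# so the scores are cut at an assumed threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]
f1 = f1_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted

print(auc, gini, f1)
print(cm)
```

Business metrics such as profit or approval rate depend on the chosen threshold as well, which is one reason a model with a good AUC can still fail the business evaluation.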
  6. Timeline for each stage
    Business understanding - 1 week
    Data understanding - 3 weeks
    Data preparation - 5 weeks
    Modeling - 2 weeks
    Evaluation - 1 week
    Deployment - 1 week
    Full model development process - ~13 weeks
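The total on slide 6 follows directly from the per-stage estimates; a short check, using only the numbers from the slide:

```python
# Per-stage estimates in weeks, copied from the slide.
stage_weeks = {
    "Business understanding": 1,
    "Data understanding": 3,
    "Data preparation": 5,
    "Modeling": 2,
    "Evaluation": 1,
    "Deployment": 1,
}

total_weeks = sum(stage_weeks.values())
longest_stage = max(stage_weeks, key=stage_weeks.get)

print(total_weeks)
print(longest_stage)
```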
  7. [Diagram of team roles: data scientists, data engineers, business side, development team, QA team]