

Workflow in a team of data scientists (tech talk for colleagues)

In this talk I spoke about the CRISP-DM approach to organizing the ML model development process and the modifications we made to it. I also described all the stages of model development.


Marianna Diachuk

July 27, 2018

Transcript

  1. Business understanding
     • Define the problem in business terms: formulate the business question the future model should answer. Example: detect and prevent fraud.
     • Define the data science problem. Example: who is considered a fraudster, and how can fraud be detected?
     • Define what is needed to solve the problem: which data to gather and analyze.
  2. Data understanding
     • EDA (exploratory data analysis). Objectives:
       1. Discover patterns
       2. Spot anomalies
       3. Frame hypotheses
       4. Check assumptions
  3. Data preparation (50-70% of project time)
     • Data preprocessing (handling missing values, wrong data types, etc.)
     • Dataset labeling (one-class problem)
  4. Modeling
     • Tune model hyperparameters to achieve higher accuracy and improve model performance
     • Example: the hyperparameters of scikit-learn's GradientBoostingClassifier with their default values at the time of the talk:
       sklearn.ensemble.GradientBoostingClassifier(loss='deviance', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, min_impurity_split=None, init=None, random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, presort='auto')
  5. Evaluation
     • Model performance metrics (AUC, Gini, F1, confusion matrix, etc.)
     • Business metrics (profit, approval rate, default rate, etc.)
     • Evaluate whether the business goals have been achieved
     Some models may not reach the deployment stage after evaluation.
  6. Timeline for each stage
     • Business understanding: 1 week
     • Data understanding: 3 weeks
     • Data preparation: 5 weeks
     • Modeling: 2 weeks
     • Evaluation: 1 week
     • Deployment: 1 week
     Full model development process: ~13 weeks
  7. Roles involved in the process (diagram): data scientists, data engineers, the business side, the development team, and the QA team
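The EDA objectives from slide 2 can be made concrete with a quick first pass in pandas. This is a minimal sketch, not the team's actual tooling: the function name `eda_summary`, the 1.5 * IQR outlier rule, and the toy columns `amount` and `n_transactions` are illustrative assumptions. It covers discovering patterns (summary statistics, correlations) and spotting anomalies; framing hypotheses and checking assumptions remain analyst work informed by this output.

```python
import numpy as np
import pandas as pd


def eda_summary(df: pd.DataFrame) -> dict:
    """First EDA pass: missing values, summary statistics, correlations, IQR outliers."""
    numeric = df.select_dtypes(include="number")
    # Discover patterns: per-column statistics and pairwise correlations
    stats = numeric.describe()
    correlations = numeric.corr()
    # Spot anomalies: flag values lying more than 1.5 * IQR outside the quartiles
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outliers = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
    return {
        "missing": df.isna().sum(),
        "stats": stats,
        "correlations": correlations,
        "outlier_counts": outliers.sum(),
    }


# Toy data (hypothetical columns): one missing value and one extreme amount
df = pd.DataFrame({
    "amount": [10.0, 12.0, 11.0, 13.0, 500.0, np.nan],
    "n_transactions": [1, 2, 2, 3, 2, 1],
})
summary = eda_summary(df)
print(summary["missing"]["amount"])         # 1
print(summary["outlier_counts"]["amount"])  # 1
```

The IQR rule is only one simple anomaly heuristic; domain-specific thresholds may be more appropriate for real data.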
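For the preprocessing step on slide 3 (missing values, wrong data types), a minimal pandas sketch might look as follows. The helper `preprocess`, median imputation, and the `"unknown"` placeholder are hypothetical choices made for illustration; dataset labeling is not covered here.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, numeric_cols: list) -> pd.DataFrame:
    """Hypothetical minimal cleanup: fix numeric dtypes, then impute missing values."""
    out = df.copy()
    for col in numeric_cols:
        # Wrong data types: numbers stored as strings become floats;
        # values that cannot be parsed become NaN and are imputed below
        out[col] = pd.to_numeric(out[col], errors="coerce")
        # Missing values: fill with the column median
        out[col] = out[col].fillna(out[col].median())
    # Non-numeric columns: mark gaps with an explicit placeholder category
    other_cols = [c for c in out.columns if c not in numeric_cols]
    if other_cols:
        out[other_cols] = out[other_cols].fillna("unknown")
    return out


raw = pd.DataFrame({
    "amount": ["10", "x", None, "30"],  # numbers stored as text, one junk value
    "country": ["UA", None, "PL", "UA"],
})
clean = preprocess(raw, ["amount"])
print(clean)
```

Coercing unparsable values to NaN and then imputing keeps the pipeline simple, but it silently hides bad records; in practice it is worth counting the coerced values before filling them.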
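The hyperparameter tuning on slide 4 can be illustrated with a grid search over a few `GradientBoostingClassifier` parameters. This sketch assumes a synthetic imbalanced dataset from `make_classification` in place of real data; the grid values and ROC AUC scoring are arbitrary illustrative choices, not the team's settings. The signature on the slide reflects an older scikit-learn release: later versions removed `presort` and `min_impurity_split` and renamed `loss='deviance'` to `loss='log_loss'`, so the code below leaves `loss` at its default.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a real (e.g. fraud) dataset: about 10% positives
X, y = make_classification(n_samples=500, n_features=10, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Candidate values for a few hyperparameters; all others keep their defaults
param_grid = {
    "learning_rate": [0.05, 0.1],
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",  # rank candidates by AUC rather than raw accuracy on imbalanced data
    cv=3,
)
search.fit(X_train, y_train)
print(search.best_params_)
print(f"held-out AUC: {search.score(X_test, y_test):.3f}")
```

Grid search is exhaustive: here 2 * 2 * 2 = 8 candidates are each fitted on 3 cross-validation folds, so the cost grows quickly with the grid. `RandomizedSearchCV` is a common alternative for larger search spaces.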
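The model performance metrics on slide 5 can be computed with `sklearn.metrics`; the Gini coefficient follows from AUC as Gini = 2 * AUC - 1. The `business_metrics` helper is a hypothetical credit-scoring illustration: it assumes label 1 means default and that every client predicted as 0 is approved. Real business metrics depend on the product, so treat it as an assumption rather than a definition.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score


def model_metrics(y_true, y_score, threshold=0.5):
    """Performance metrics from true labels and predicted probabilities of class 1."""
    auc = roc_auc_score(y_true, y_score)
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    return {
        "auc": auc,
        "gini": 2 * auc - 1,  # Gini coefficient derived from AUC
        "f1": f1_score(y_true, y_pred),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }


def business_metrics(y_true, y_pred):
    """Hypothetical credit-scoring view: label 1 = default, clients predicted 0 are approved."""
    y_true = np.asarray(y_true)
    approved = np.asarray(y_pred) == 0
    approval_rate = approved.mean()
    # Default rate among approved clients (0.0 if nobody is approved)
    default_rate = y_true[approved].mean() if approved.any() else 0.0
    return {"approval_rate": approval_rate, "default_rate": default_rate}


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.6, 0.4, 0.8]
m = model_metrics(y_true, y_score)
print(m["auc"], m["gini"], m["f1"])  # 0.75 0.5 0.5
print(business_metrics(y_true, [0, 1, 0, 1]))
```

AUC and Gini do not depend on the decision threshold, while F1, the confusion matrix, and the business metrics do; choosing the threshold is therefore itself a business decision.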
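The stage durations on slide 6 add up to the stated total: 1 + 3 + 5 + 2 + 1 + 1 = 13 weeks. The sketch below turns them into a week-by-week schedule, assuming (the slide does not say so) that the stages run strictly one after another; `STAGES` and `schedule` are illustrative names.

```python
# Planned duration of each stage in weeks, as listed on the slide
STAGES = [
    ("Business understanding", 1),
    ("Data understanding", 3),
    ("Data preparation", 5),
    ("Modeling", 2),
    ("Evaluation", 1),
    ("Deployment", 1),
]


def schedule(stages):
    """Turn sequential stage durations into (stage, first_week, last_week) rows."""
    rows, week = [], 1
    for name, weeks in stages:
        rows.append((name, week, week + weeks - 1))
        week += weeks
    return rows


for name, start, end in schedule(STAGES):
    print(f"{name:<24} weeks {start}-{end}")
print("Total:", sum(weeks for _, weeks in STAGES), "weeks")  # Total: 13 weeks
```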