Data Preparation and the Importance of How Machines Learn

Rebecca Vickery

February 05, 2020

Transcript

  1. Simple ML workflow: Get data >> baseline model >> model selection >> model tuning >> predict
  2. Simple ML workflow: Hyperparameter optimisation >> Best score = 1.0, Best Params = {'max_depth': 5, 'min_samples_leaf': 1, 'min_samples_split': 10, 'n_estimators': 500}
  3. Actual ML workflow: Get data >> data preparation >> feature engineering >> baseline model >> model selection >> model tuning >> predict
  4. Problem: "4 is bigger than 1 so there must be a relationship between these rows." Category codes: 1 = neutered male, 2 = spayed female, 3 = intact male. (Image source: flaticon.com)
  5. Solution: Weight of evidence. For each colour (e.g. Tan): WOE = ln( (p_i / p) / (n_i / n) ), where p_i = number of times Tan appears in the positive class (1), p = total number of rows in the positive class (1), n_i = number of times Tan appears in the negative class (0), and n = total number of rows in the negative class (0)
  6. “There are only two Machine Learning approaches that win competitions: Handcrafted & Neural Networks.” Anthony Goldbloom, CEO & Founder, Kaggle
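
The weight of evidence formula on slide 5 can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's code: the function name `weight_of_evidence`, the toy `colours` and `outcomes` lists, and the choice to return `None` when a category never appears in one of the classes (where the logarithm is undefined) are all assumptions made for the example.

```python
import math
from collections import Counter


def weight_of_evidence(categories, targets):
    """Return {category: WOE} for a binary target (1 = positive, 0 = negative).

    WOE = ln((p_i / p) / (n_i / n)), using the symbols from slide 5.
    """
    # p_i and n_i: per-category counts in the positive and negative class
    pos_counts = Counter(c for c, t in zip(categories, targets) if t == 1)
    neg_counts = Counter(c for c, t in zip(categories, targets) if t == 0)
    p = sum(pos_counts.values())  # total rows in the positive class
    n = sum(neg_counts.values())  # total rows in the negative class

    woe = {}
    for category in set(categories):
        p_i = pos_counts[category]
        n_i = neg_counts[category]
        if p_i == 0 or n_i == 0:
            # Assumption: leave WOE undefined rather than smoothing the counts
            woe[category] = None
            continue
        woe[category] = math.log((p_i / p) / (n_i / n))
    return woe


# Hypothetical toy data: colour of each animal and a binary outcome
colours = ["Tan", "Tan", "Tan", "Black", "Black", "Black", "Black", "Tan"]
outcomes = [1, 1, 0, 0, 0, 1, 0, 1]

woe = weight_of_evidence(colours, outcomes)
for colour, value in sorted(woe.items()):
    print(colour, value)
```

Here Tan appears three times in the positive class and once in the negative class, with four rows in each class, so its WOE is ln(3); Black is the mirror image at ln(1/3). Unlike arbitrary integer codes, these values carry information about how each category relates to the target.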