The Creative Art of Engineering Bespoke Features - Wale Akinfaderin - ML4ALL 2019
YouTube: https://www.youtube.com/watch?v=ZQ5wF7z01I0&t=565s&frags=pl%2Cwn

Wale Akinfaderin

April 29, 2019

Transcript

  1. The Creative Art of Engineering Bespoke Features. 2019 ML4ALL Conference, Portland, OR. Wale Akinfaderin {@waleakinfaderin}. April 29, 2019.
  2. Talk Outline: 1 About Me; 2 Introduction; 3 Categorical Features; 4 Numerical Features; 5 Datetime and Coordinates; 6 Other Techniques.
  4. About Me: Data Scientist for a giant retail company. Physicist with PhD research in Magnetic Resonance Spectroscopy. Ex-Insight Data Science Fellow. Ex-researcher at IBM Research doing ML for development. Interested in democratizing ML. I really like soccer and Scrabble.
  6. “Applied machine learning is basically feature engineering” - Andrew Ng. “More data beats clever algorithms, but better data beats more data” - Peter Norvig.
  7. Introduction. What is feature engineering? The process of introducing knowledge into a machine learning model. What are features? Variables in a given problem's data that help build an accurate predictive model; they represent that knowledge in a form appropriate for machine learning algorithms.
  8. Why Feature Engineering? Engineering good features lets you represent the underlying structure of the data most accurately and therefore build the best model. The features in your data influence the results your predictive model can achieve.
  9. Limits of Feature Engineering: it thins out the degrees of freedom on relatively small datasets; including too many correlated variables can decrease model performance (large p vs. n); more variables can make the model less interpretable (p vs. large n); the interpretability-accuracy tradeoff; generalizability of models to other data.
  10. Is Feature Engineering really Art? Creative processes in art: 1 Plan and practice; 2 Begin to create and improvise; 3 Review and revise; 4 Refine your thinking by interpreting/explaining; 5 Share and reflect.
  11. The Creative Process of Engineering Features: 1 Define your problem and plan (prediction, recommendation); 2 Thoroughly understand your data and model (EDA, model assumptions); 3 Plan your feature engineering goals (speed, performance); 4 Refine your thinking by interpreting/explaining (test, iterate).
  13. Categorical Features: Importance. 1 They mostly need to be engineered. 2 Encoding categorical features can lead to sparse data if the cardinality is high. 3 Missing values are difficult to impute, because imputation easily leads to over- or under-estimation of variability.
  14. One Hot Encoding. 1 The most common method used to represent nominal categorical features. 2 Easy to explain and produces easily intelligible results.

    Table: Car table
        car id | car manufacturer | car mpg
        1      | Honda            | 20.2
        2      | Toyota           | 25.3
        3      | Toyota           | 23.2
        4      | Ford             | 19.6
        5      | Mercedes         | 16.8
  15. One Hot Encoding.

    Table: Car table with One Hot Encoding
        car id | Honda | Toyota | Ford | Mercedes | car mpg
        1      | 1     | 0      | 0    | 0        | 20.2
        2      | 0     | 1      | 0    | 0        | 25.3
        3      | 0     | 1      | 0    | 0        | 23.2
        4      | 0     | 0      | 1    | 0        | 19.6
        5      | 0     | 0      | 0    | 1        | 16.8
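A minimal sketch of one hot encoding with pandas, assuming the illustrative car table above (column names are my own):

```python
import pandas as pd

# The car table from the slides (column names are illustrative)
cars = pd.DataFrame({
    "car_id": [1, 2, 3, 4, 5],
    "car_manufacturer": ["Honda", "Toyota", "Toyota", "Ford", "Mercedes"],
    "car_mpg": [20.2, 25.3, 23.2, 19.6, 16.8],
})

# One hot encode the nominal column: each manufacturer becomes its own 0/1 column
one_hot = pd.get_dummies(cars, columns=["car_manufacturer"], prefix="", prefix_sep="")
print(one_hot)
```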
  16. One Hot Encoding: Some Drawbacks. 1 Too many features, which makes training complex ML algorithms computationally expensive. 2 Mostly limited to linear (non-tree-based) methods. 3 Requires more memory because of sparsity. 4 Multicollinearity. 5 Most implementations don't handle missing values.
  17. Label Encoding. 1 Assigns a numerical value to each level of a category. 2 Imposes an ordering between labels. 3 Good for tree-based and non-tree-based methods. 4 Dimensionality remains constant.

    Table: Car table with label encoding
        car id | feature  | feature label | car mpg
        1      | Honda    | 1             | 20.2
        2      | Toyota   | 2             | 25.3
        3      | Toyota   | 2             | 23.2
        4      | Ford     | 3             | 19.6
        5      | Mercedes | 4             | 16.8
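A label-encoding sketch with pandas; pd.factorize assigns integer codes in order of first appearance, which here happens to reproduce the table:

```python
import pandas as pd

cars = pd.DataFrame({
    "car_manufacturer": ["Honda", "Toyota", "Toyota", "Ford", "Mercedes"],
})

# One integer code per category level; +1 so codes start at 1 as in the table
codes, levels = pd.factorize(cars["car_manufacturer"])
cars["feature_label"] = codes + 1
print(cars)
```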
  18. Count Encoding. 1 Replaces each categorical value with the number of times it occurs. 2 Can be used for both tree and non-tree-based methods. 3 Sensitive to outliers. 4 Sometimes accounts for unwarranted interactions; if the counts are skewed, a log transformation can be applied.

    Table: Car table with count encoding
        car id | feature  | feature count | car mpg
        1      | Honda    | 1             | 20.2
        2      | Toyota   | 2             | 25.3
        3      | Toyota   | 2             | 23.2
        4      | Ford     | 1             | 19.6
        5      | Mercedes | 1             | 16.8
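A count-encoding sketch, again on the illustrative car table:

```python
import pandas as pd

cars = pd.DataFrame({
    "car_manufacturer": ["Honda", "Toyota", "Toyota", "Ford", "Mercedes"],
})

# Map each category to its frequency in the column
counts = cars["car_manufacturer"].value_counts()
cars["feature_count"] = cars["car_manufacturer"].map(counts)
# If the counts are heavily skewed, follow with a log transform, e.g. np.log1p
```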
  19. Label Count Encoding. 1 Count the categorical values and rank the categories by their counts. 2 Good for both tree and non-tree-based methods.

    Table: Car table with label count encoding
        car id | feature  | feature labelcount | car mpg
        1      | Honda    | 2                  | 20.2
        2      | Toyota   | 1                  | 25.3
        3      | Toyota   | 1                  | 23.2
        4      | Ford     | 3                  | 19.6
        5      | Mercedes | 3                  | 16.8
        6      | Toyota   | 1                  | 20.9
        7      | Honda    | 2                  | 29.3
        8      | Toyota   | 1                  | 26.7
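A label-count sketch: rank categories by frequency so the most frequent level gets rank 1; a dense rank keeps ties on one rank, matching Ford and Mercedes above:

```python
import pandas as pd

cars = pd.DataFrame({
    "car_manufacturer": ["Honda", "Toyota", "Toyota", "Ford", "Mercedes",
                         "Toyota", "Honda", "Toyota"],
})

# Rank categories by count, most frequent first
freq = cars["car_manufacturer"].value_counts()
ranks = freq.rank(method="dense", ascending=False).astype(int)
cars["feature_labelcount"] = cars["car_manufacturer"].map(ranks)
```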
  20. Target Encoding. 1 The categorical variables are encoded using the ratio of their positive-target count to their total count (the per-category mean of the target). 2 Tricky because of the possible correlation (leakage) between the encoded feature and the target. 3 Good for a large number of features. 4 Top Kagglers' best-kept secret (with random noise added to combat overfitting).
  21. Target Encoding.

    Table: Target Encoding for Binary Classification
        car id | feature | feature count | feature target | car mpg | target
        1      | Honda   | 3             | 0.33           | 20.2    | 0
        2      | Honda   | 3             | 0.33           | 25.3    | 1
        3      | Honda   | 3             | 0.33           | 23.2    | 0
        4      | Toyota  | 5             | 0.60           | 16.8    | 0
        5      | Toyota  | 5             | 0.60           | 16.8    | 1
        6      | Toyota  | 5             | 0.60           | 16.8    | 1
        7      | Toyota  | 5             | 0.60           | 16.8    | 0
        8      | Toyota  | 5             | 0.60           | 16.8    | 1
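A minimal target-encoding sketch for the binary table above; in practice, compute the means out-of-fold and/or add noise or smoothing to limit target leakage:

```python
import pandas as pd

df = pd.DataFrame({
    "feature": ["Honda"] * 3 + ["Toyota"] * 5,
    "target":  [0, 1, 0, 0, 1, 1, 0, 1],
})

# Per-category mean of the target: positive count / category count
df["feature_target"] = df.groupby("feature")["target"].transform("mean").round(2)
# Honda -> 1/3 = 0.33, Toyota -> 3/5 = 0.60, as in the table above
```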
  23. Handling Numerical Features: Binning. 1 Converting a numerical variable into discrete bins. 2 The conversion is done by declaring ranges of values. Types of binning: 1 Quantile-based binning; 2 Fixed-width binning.
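A binning sketch with pandas, assuming a small illustrative mpg series (bin counts and labels are arbitrary):

```python
import pandas as pd

mpg = pd.Series([23.2, 24.4, 16.0, 32.3, 29.1])

# Fixed-width binning: three equal-width intervals over the observed range
fixed = pd.cut(mpg, bins=3, labels=["low", "mid", "high"])

# Quantile-based binning: three bins with roughly equal numbers of rows
quant = pd.qcut(mpg, q=3, labels=["low", "mid", "high"])
```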
  24. Rounding. Keeps the important features of the data. Rounding can help remove random noise. Can also form categorical variables.

    Table: mpg with two levels of rounding
        mpg  | mpg R1 | mpg R2
        23.2 | 23     | 2
        24.4 | 24     | 2
        16.0 | 16     | 1
        32.3 | 32     | 3
        29.1 | 29     | 2
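A rounding sketch reproducing the table: R1 rounds to the nearest integer, and R2 appears to be floor division by 10 (an assumption inferred from the values shown):

```python
import pandas as pd

mpg = pd.Series([23.2, 24.4, 16.0, 32.3, 29.1])

mpg_r1 = mpg.round().astype(int)  # nearest integer: 23, 24, 16, 32, 29
mpg_r2 = (mpg // 10).astype(int)  # coarse bucket (assumed): 2, 2, 1, 3, 2
```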
  25. Scaling and Normalization. Min-Max Scaling scales values into the [0, 1] range of the column: z = (y − min(y)) / (max(y) − min(y)). Z-score Scaling preserves the relative impact of outliers: z = (y − µ) / σ, where µ is the mean and σ is the standard deviation.
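Both scalers as a sketch with scikit-learn (the column values are the illustrative mpg numbers):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

y = np.array([[20.2], [25.3], [23.2], [19.6], [16.8]])  # one column

min_max = MinMaxScaler().fit_transform(y)    # (y - min) / (max - min), into [0, 1]
z_score = StandardScaler().fit_transform(y)  # (y - mean) / std
```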
  26. Logarithmic Scaling. Takes the log of the values to narrow the range. It changes the distribution and helps improve the performance of linear models: z = log(y).
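A log-scaling sketch; np.log1p, i.e. log(1 + y), is a common variant that tolerates zeros:

```python
import numpy as np

y = np.array([20.2, 25.3, 23.2, 19.6, 16.8])

z = np.log(y)         # requires strictly positive values
z_safe = np.log1p(y)  # log(1 + y), safe when y can be zero
```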
  27. Label Engineering and Transformation. Box-Cox Transformation: a statistical transformation for removing the heteroscedasticity of a feature and making it approximately normally distributed. The transformation searches for the lambda (λ) value; Box-Cox reduces to the logarithmic transformation at λ = 0. Useful for selecting a transformation for linearity or normality.

        y(λ) = (y^λ − 1) / λ    if λ ≠ 0
        y(λ) = log(y)           if λ = 0
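A Box-Cox sketch with scipy, which searches for the λ that maximizes the log-likelihood of normality; the input must be strictly positive:

```python
import numpy as np
from scipy import stats

y = np.array([20.2, 25.3, 23.2, 19.6, 16.8])  # strictly positive

y_boxcox, lam = stats.boxcox(y)  # transformed values and the fitted lambda
```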
  28. Yeo-Johnson Transformation. Box-Cox is restricted to positive values of y; Yeo-Johnson is a newer family of transformations that places no positivity restriction on y.

        ψ(λ, y) = ((y + 1)^λ − 1) / λ               if λ ≠ 0, y ≥ 0
        ψ(λ, y) = log(y + 1)                        if λ = 0, y ≥ 0
        ψ(λ, y) = −((−y + 1)^(2−λ) − 1) / (2 − λ)   if λ ≠ 2, y < 0
        ψ(λ, y) = −log(−y + 1)                      if λ = 2, y < 0
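A Yeo-Johnson sketch with scipy; negative and zero values are allowed:

```python
import numpy as np
from scipy import stats

y = np.array([-3.1, -0.5, 0.0, 2.4, 7.9])  # negatives and zeros are fine

y_yj, lam = stats.yeojohnson(y)  # transformed values and the fitted lambda
```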
  30. Date and Time. Periodicity: day number in week, month, year, season. Time since a particular event, days left until the next holiday, or time since an arbitrary reference date. Differences between dates: datetime feature1 − datetime feature2.
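A sketch of a few datetime features with pandas (the dates and the reference date are arbitrary):

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2019-04-29", "2019-12-24", "2020-01-01"]))

day_of_week = dates.dt.dayofweek   # 0 = Monday
day_of_month = dates.dt.day
day_of_year = dates.dt.dayofyear

# Days since an arbitrary reference date
reference = pd.Timestamp("2019-01-01")
days_since = (dates - reference).dt.days
```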
  31. Coordinates. Centers of clusters. Distance to the nearest point of reference or major hub. Aggregated statistics for a particular area. For tree-based methods, including a feature that describes a rotation of the longitude and latitude is useful.
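A coordinate-feature sketch: haversine distance to a hypothetical reference hub, plus a rotated latitude/longitude pair (the hub coordinates and rotation angle are illustrative):

```python
import numpy as np

def haversine_km(lat, lon, ref_lat, ref_lon, radius=6371.0):
    """Great-circle distance in km between points and a reference point."""
    lat, lon, ref_lat, ref_lon = map(np.radians, (lat, lon, ref_lat, ref_lon))
    a = (np.sin((ref_lat - lat) / 2) ** 2
         + np.cos(lat) * np.cos(ref_lat) * np.sin((ref_lon - lon) / 2) ** 2)
    return 2 * radius * np.arcsin(np.sqrt(a))

lat = np.array([45.52, 45.60])
lon = np.array([-122.68, -122.60])

dist_to_hub = haversine_km(lat, lon, 45.5231, -122.6765)  # e.g., downtown Portland

# Rotated coordinates help tree-based models split along diagonal boundaries
theta = np.pi / 4
rot_x = lat * np.cos(theta) + lon * np.sin(theta)
rot_y = lon * np.cos(theta) - lat * np.sin(theta)
```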
  33. Encoding Nonlinearity: Feature Crosses. An artificial feature that combines two or more input features (for example, by multiplying them) to encode nonlinearity.
  34. Feature Crosses. For example, we can create new features from x1 and x2: x3 = x1 * x2 and x4 = x1^2. The newly created features are added to the linear model formula: y = b + w1*x1 + w2*x2 + w3*x3 + w4*x4. Computationally expensive, but can be trained efficiently using Stochastic Gradient Descent (SGD).
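A feature-cross sketch: add x1*x2 and x1^2 to the design matrix and fit with SGD (the synthetic data and coefficients are illustrative):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = (1.0 + 2.0 * x1 - 1.5 * x2 + 0.5 * x1 * x2 + 0.3 * x1**2
     + rng.normal(scale=0.1, size=200))

# Crossed features x3 = x1*x2 and x4 = x1**2 extend the linear design matrix
X = np.column_stack([x1, x2, x1 * x2, x1**2])

model = SGDRegressor(max_iter=1000, tol=1e-4).fit(X, y)
print(model.intercept_, model.coef_)  # roughly recovers b and w1..w4
```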
  35. The Creative Art of Engineering Bespoke Features. 2019 ML4ALL Conference, Portland, OR. Wale Akinfaderin {@waleakinfaderin}. April 29, 2019.