with PhD research in Magnetic Resonance Spectroscopy. Ex-Insight Data Science Fellow. Ex-Researcher at IBM Research, doing ML for development. Interested in democratizing ML. I really like soccer and Scrabble.
"More data beats clever algorithms, but better data beats more data" - Peter Norvig
into a machine learning model.

What are features? Variables in a given problem set that can sufficiently help in building an accurate predictive model. They represent knowledge in a form appropriate for machine learning algorithms.
most accurately represent the underlying structure of the data and therefore create the best model. The features in your data will strongly influence the results your predictive model can achieve.
on relatively small datasets.
- Including too many correlated variables can decrease model performance (large p vs. n).
- More variables can make the model less interpretable (p vs. large n).
- Interpretability vs. accuracy tradeoff.
- Generalizability of models to other data.
1. Plan and practice
2. Begin to create and improvise
3. Review and revise
4. Refine your thinking by interpreting/explaining
5. Share and reflect
1. Understand your problem and plan (prediction, recommendation)
2. Thorough understanding of your data and model (EDA, model assumptions)
3. Plan your feature engineering goals (speed, performance)
4. Refine your thinking by interpreting/explaining (test, iterate)
2. Encoding categorical features can lead to sparse data if the cardinality is high.
3. Missing values are difficult to impute, because imputation easily leads to an over- or under-estimation of variability.
1. Well suited for nominal categorical features.
2. Easy to explain and produces easily intelligible results.

car id | car manufacturer | car mpg
1      | Honda            | 20.2
2      | Toyota           | 25.3
3      | Toyota           | 23.2
4      | Ford             | 19.6
5      | Mercedes         | 16.8
Table: Car table
1. Increases dimensionality, which makes it computationally expensive to train complex ML algorithms.
2. Mostly limited to linear models (and other non-tree-based methods).
3. Requires more memory because of sparsity.
4. Multicollinearity among the dummy columns.
5. Most implementations don't handle missing values.
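A minimal one-hot encoding sketch with pandas, using the car table above (pd.get_dummies is one common implementation; scikit-learn's OneHotEncoder is another, and the column prefix here is an arbitrary choice):

```python
import pandas as pd

# A toy version of the car table from the slides.
df = pd.DataFrame({
    "car_id": [1, 2, 3, 4, 5],
    "car_manufacturer": ["Honda", "Toyota", "Toyota", "Ford", "Mercedes"],
    "car_mpg": [20.2, 25.3, 23.2, 19.6, 16.8],
})

# One-hot encode the manufacturer: one binary column per category.
one_hot = pd.get_dummies(df, columns=["car_manufacturer"], prefix="mfr")
print(one_hot)
```

With only four manufacturers this is harmless; with thousands of categories the same call produces the wide, sparse matrix the disadvantages above warn about.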
1. Assigns an integer code to each type of category.
2. Assigns a natural ordering between labels.
3. Good for tree-based methods; non-tree-based (e.g., linear) methods may misread the codes as ordered magnitudes.
4. Dimensionality remains constant.

car id | feature  | feature label | car mpg
1      | Honda    | 1             | 20.2
2      | Toyota   | 2             | 25.3
3      | Toyota   | 2             | 23.2
4      | Ford     | 3             | 19.6
5      | Mercedes | 4             | 16.8
Table: Car table with label encoding
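A label-encoding sketch with pandas, assuming the same car column; pd.factorize assigns codes by order of first appearance, so the exact integers are arbitrary:

```python
import pandas as pd

df = pd.DataFrame({
    "car_manufacturer": ["Honda", "Toyota", "Toyota", "Ford", "Mercedes"],
})

# factorize assigns an integer to each category in order of first
# appearance (Honda -> 0, Toyota -> 1, Ford -> 2, Mercedes -> 3).
codes, uniques = pd.factorize(df["car_manufacturer"])
df["feature_label"] = codes + 1  # shift so codes start at 1, as in the table
print(df)
```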
1. Each category is replaced by the count of that variable in the data.
2. Can be used for both tree-based and non-tree-based methods.
3. It is sensitive to outliers.
4. Sometimes accounts for unwarranted interactions (distinct categories with equal counts become indistinguishable).
If the count distribution is skewed, a log transformation can be applied.

car id | feature  | feature count | car mpg
1      | Honda    | 1             | 20.2
2      | Toyota   | 2             | 25.3
3      | Toyota   | 2             | 23.2
4      | Ford     | 1             | 19.6
5      | Mercedes | 1             | 16.8
Table: Car table with count encoding
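A count-encoding sketch with pandas on the same illustrative column; the log variant mirrors the skew note above (np.log1p is an assumption, chosen to tolerate small counts):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "car_manufacturer": ["Honda", "Toyota", "Toyota", "Ford", "Mercedes"],
})

# Map each category to how often it appears in the column.
counts = df["car_manufacturer"].value_counts()
df["feature_count"] = df["car_manufacturer"].map(counts)

# If the counts are heavily skewed, a log transform tames them.
df["feature_log_count"] = np.log1p(df["feature_count"])
print(df)
```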
1. Categories are ranked by their number of counts (the most frequent category gets rank 1; ties share a rank).
2. Good for both tree-based and non-tree-based methods.

car id | feature  | feature labelcount | car mpg
1      | Honda    | 2                  | 20.2
2      | Toyota   | 1                  | 25.3
3      | Toyota   | 1                  | 23.2
4      | Ford     | 3                  | 19.6
5      | Mercedes | 3                  | 16.8
6      | Toyota   | 1                  | 20.9
7      | Honda    | 2                  | 29.3
8      | Toyota   | 1                  | 26.7
Table: Car table with label count encoding
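A label-count sketch with pandas; rank(method="dense") reproduces the tie behavior in the table, where Ford and Mercedes share rank 3:

```python
import pandas as pd

df = pd.DataFrame({
    "car_manufacturer": ["Honda", "Toyota", "Toyota", "Ford",
                         "Mercedes", "Toyota", "Honda", "Toyota"],
})

# Rank categories by frequency: the most frequent (Toyota) gets rank 1,
# and tied categories (Ford, Mercedes) share the same rank.
counts = df["car_manufacturer"].value_counts()
ranks = counts.rank(method="dense", ascending=False).astype(int)
df["feature_labelcount"] = df["car_manufacturer"].map(ranks)
print(df)
```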
1. Categories are encoded using the ratio of their target count to their variable count (for a binary target, the per-category target mean).
2. Tricky because of a possible correlation (leakage) between the encoded feature and the target.
3. Good for high-cardinality features.
4. Top Kagglers' best-kept secret (with the addition of random noise to combat overfitting).
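A simple in-sample sketch of target (mean) encoding with added noise, on the regression-flavored car table; in practice out-of-fold means are used to limit leakage, and the noise scale here is an arbitrary assumption:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "car_manufacturer": ["Honda", "Toyota", "Toyota", "Ford", "Mercedes"],
    "car_mpg": [20.2, 25.3, 23.2, 19.6, 16.8],
})

# Replace each category with the mean of the target for that category.
target_means = df.groupby("car_manufacturer")["car_mpg"].mean()
df["feature_target"] = df["car_manufacturer"].map(target_means)

# Add a little random noise to fight overfitting/leakage (scale is arbitrary).
rng = np.random.default_rng(0)
df["feature_target"] += rng.normal(0, 0.01, size=len(df))
print(df)
```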
1. Converts continuous variables into discrete variables.
2. Conversion is done by declaring a range of values (bin edges).

Types of Binning
1. Quantile-based binning
2. Fixed-width binning
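Both flavors in pandas: pd.cut implements fixed-width binning with user-declared edges, pd.qcut quantile-based binning; the edges and labels below are illustrative assumptions:

```python
import pandas as pd

mpg = pd.Series([20.2, 25.3, 23.2, 19.6, 16.8, 20.9, 29.3, 26.7])

# Fixed-width binning: declare the bin edges (ranges of values) yourself.
fixed = pd.cut(mpg, bins=[15, 20, 25, 30], labels=["low", "mid", "high"])

# Quantile-based binning: each bin holds roughly the same number of rows.
quantile = pd.qcut(mpg, q=3, labels=["low", "mid", "high"])

print(pd.DataFrame({"mpg": mpg, "fixed": fixed, "quantile": quantile}))
```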
Min-Max Scaling
Rescales values to [0, 1] using the minimum and maximum of the column.

z = (y − min(y)) / (max(y) − min(y))

Z-score Scaling
Good for not losing the impact of outliers.

z = (y − µ) / σ

where µ is the mean and σ is the standard deviation.
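Both scalings in a few lines of NumPy (scikit-learn's MinMaxScaler and StandardScaler are the usual library equivalents):

```python
import numpy as np

y = np.array([20.2, 25.3, 23.2, 19.6, 16.8])

# Min-max scaling: squeeze values into [0, 1].
z_minmax = (y - y.min()) / (y.max() - y.min())

# Z-score scaling: zero mean, unit standard deviation.
z_score = (y - y.mean()) / y.std()

print(z_minmax, z_score)
```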
the range. It changes the distribution and helps to improve the performance of linear models.

z = log(y)
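A quick NumPy sketch; np.log1p, i.e. log(1 + y), is a common safeguard (an addition, not from the slide) when the feature contains zeros:

```python
import numpy as np

y = np.array([1.0, 10.0, 100.0, 1000.0])

z = np.log(y)         # the transform from the slide; requires y > 0
z_safe = np.log1p(y)  # log(1 + y): tolerates zero values

print(z, z_safe)
```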
the heteroscedasticity of a feature and makes it more normally distributed. The transformation searches for the lambda (λ) value that best normalizes the data; it is defined only for y > 0. The Box-Cox reduces to the logarithmic transformation at λ = 0. Useful for selecting a transformation for linearity or normality.

y(λ) = (y^λ − 1) / λ   if λ ≠ 0
y(λ) = log(y)          if λ = 0
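A sketch using SciPy, which finds λ by maximum likelihood; the lognormal toy data is an illustrative assumption:

```python
import numpy as np
from scipy import stats

# Skewed, strictly positive data (Box-Cox requires y > 0).
rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# boxcox returns the transformed data and the fitted lambda.
y_transformed, lam = stats.boxcox(y)
print(f"estimated lambda: {lam:.3f}")  # near 0 here, i.e. close to a log transform
```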
Yeo-Johnson is a newer family of transformations that can be used without placing any positivity restriction on y:

ψ(λ, y) = ((y + 1)^λ − 1) / λ                 if λ ≠ 0, y ≥ 0
ψ(λ, y) = log(y + 1)                          if λ = 0, y ≥ 0
ψ(λ, y) = −[(−y + 1)^(2−λ) − 1] / (2 − λ)     if λ ≠ 2, y < 0
ψ(λ, y) = −log(−y + 1)                        if λ = 2, y < 0
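A sketch with scikit-learn's PowerTransformer, whose default method is Yeo-Johnson (note it also standardizes the output by default); the skewed, mixed-sign toy data is an illustrative assumption:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Data with negative values: Box-Cox would fail, Yeo-Johnson will not.
rng = np.random.default_rng(0)
y = rng.normal(loc=0.0, scale=2.0, size=1000) ** 3  # skewed, mixed sign

pt = PowerTransformer(method="yeo-johnson")
y_transformed = pt.fit_transform(y.reshape(-1, 1))
print("estimated lambda:", pt.lambdas_)
```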
- Season
- Time since a particular event
- Number left until the nearest holiday, or just relative to an arbitrary date
- Difference between dates: datetime feature1 − datetime feature2
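A pandas sketch of these datetime features; the column names, season flag, and reference date are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "pickup": pd.to_datetime(["2019-04-29 08:15", "2019-07-04 18:40"]),
    "dropoff": pd.to_datetime(["2019-04-29 08:52", "2019-07-04 19:05"]),
})

# Calendar components and a rough season flag.
df["month"] = df["pickup"].dt.month
df["dayofweek"] = df["pickup"].dt.dayofweek
df["is_summer"] = df["pickup"].dt.month.isin([6, 7, 8])

# Time since an arbitrary reference date, in days.
reference = pd.Timestamp("2019-01-01")
df["days_since_ref"] = (df["pickup"] - reference).dt.days

# Difference between two datetime features, in minutes.
df["trip_minutes"] = (df["dropoff"] - df["pickup"]).dt.total_seconds() / 60
print(df)
```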
- Distance to a reference point or major hub
- Aggregated statistics (for a particular area)
- For tree-based methods, including a feature that describes a rotation of the longitude and latitude is useful.
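A NumPy/pandas sketch of a distance-to-hub feature and rotated coordinates; the reference point (roughly downtown Portland) and the 45° rotation angle are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"lat": [45.523, 45.512], "lon": [-122.676, -122.658]})

# Haversine distance (km) to a reference point / major hub.
REF_LAT, REF_LON = 45.515, -122.680
lat1, lon1 = np.radians(df["lat"]), np.radians(df["lon"])
lat2, lon2 = np.radians(REF_LAT), np.radians(REF_LON)
a = (np.sin((lat2 - lat1) / 2) ** 2
     + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
df["km_to_hub"] = 2 * 6371 * np.arcsin(np.sqrt(a))

# Rotated coordinates: a tree's axis-aligned splits can then cut
# along diagonal streets or boundaries.
theta = np.radians(45)
df["rot_x"] = df["lon"] * np.cos(theta) - df["lat"] * np.sin(theta)
df["rot_y"] = df["lon"] * np.sin(theta) + df["lat"] * np.cos(theta)
print(df)
```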
x1 and x2 to create new features:

x3 = x1 · x2
x4 = x1²

The newly created features are added to the linear model formula:

y = b + w1·x1 + w2·x2 + w3·x3 + w4·x4

Computationally expensive, but the model can be trained efficiently using Stochastic Gradient Descent (SGD).
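A scikit-learn sketch of this feature cross: PolynomialFeatures(degree=2) generates x1·x2, x1², and x2², and SGDRegressor fits the expanded linear model; the toy data is constructed (an assumption) so the interaction term matters:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Toy data where the target depends on the interaction x1 * x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1] + 4.0 * X[:, 0] * X[:, 1]

# degree=2 appends x1^2, x1*x2, x2^2 to the original features,
# then the expanded linear model is trained with SGD.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    SGDRegressor(max_iter=1000, tol=1e-3, random_state=0),
)
model.fit(X, y)
print(model.score(X, y))  # close to 1.0: the cross captures the interaction
```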