
Feature scaling and when to use it

uday kiran
January 06, 2021


Transcript

  1. Feature scaling


  2. What is feature scaling?
    • It is one of the most fundamental steps in data preprocessing.
    • The goal of this technique is to bring all features onto roughly the
    same scale.
    • This matters because, most of the time, a dataset contains features that
    vary widely in magnitude, units and range.


  3. Why feature scaling?
    • To give equal importance to each feature.
    • It also makes the data easier for ML algorithms to process.
    • It resolves issues faced by different classes of algorithms.
    • It helps algorithms train and converge faster.


  4. Problems with Gradient Descent Based Algorithms
    • Features on different scales distort the step size of gradient descent: a single
    learning rate cannot suit directions whose gradients differ by orders of magnitude.
    • Scaling the features helps gradient descent converge towards the minimum much
    faster (see the sketch below).
    • Example algorithms: linear regression, logistic regression, neural networks, etc.
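The effect is easy to reproduce. The following minimal NumPy sketch (not from the deck; the feature scales, learning rates, and tolerance are illustrative assumptions) runs batch gradient descent on a least-squares problem, first with raw features and then standardized:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(0, 1, n),      # feature on a small scale
    rng.uniform(0, 1000, n),   # feature on a much larger scale
])
w_true = np.array([2.0, 0.003])
y = X @ w_true + rng.normal(0, 0.1, n)

def gd_steps(X, y, lr, tol=1e-6, max_iter=100_000):
    """Batch gradient descent; return iterations until the gradient is small."""
    w = np.zeros(X.shape[1])
    for i in range(max_iter):
        grad = 2 / len(y) * X.T @ (X @ w - y)
        if np.linalg.norm(grad) < tol:
            return i
        w -= lr * grad
    return max_iter  # did not converge within the budget

# Unscaled: the large feature dictates the biggest stable learning rate,
# so progress along the small feature's direction is painfully slow.
print("unscaled:", gd_steps(X, y, lr=1e-7))

# Standardized: both directions are on the same scale, so one learning
# rate works for both and convergence takes far fewer steps.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
print("scaled:  ", gd_steps(Xs, y, lr=0.1))
```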


  5. Problems with Distance-Based Algorithms
    • These algorithms are biased towards features with large values.
    • Because they compute distances between data points to determine similarity,
    features on a larger scale dominate the distance.
    • In effect, different features receive different weights.
    • By scaling the features, we ensure that all of them contribute equally to the
    result (see the sketch below).
    • Example algorithms: KNN, K-means, and SVM.
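To make the bias concrete, here is a hypothetical two-feature example (the names age and income are assumptions, not from the deck): income is measured in tens of thousands while age is in tens, so unscaled Euclidean distance is driven almost entirely by income.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

#             age, annual income
X = np.array([[25, 50_000.0],
              [60, 51_000.0],
              [26, 90_000.0]])

def dist(a, b):
    return np.linalg.norm(a - b)

# Unscaled: the 60-year-old with a similar income looks far "closer" to
# person 0 than the 26-year-old with a different income (~1001 vs ~40000).
print(dist(X[0], X[1]), dist(X[0], X[2]))

# Scaled: both features now contribute comparably to the distance.
Xs = StandardScaler().fit_transform(X)
print(dist(Xs[0], Xs[1]), dist(Xs[0], Xs[2]))
```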


  6. When is it not important?
    • If an algorithm is neither distance-based nor gradient-based, feature scaling is
    unimportant. This includes Naive Bayes, Linear Discriminant Analysis, and tree-based
    models (gradient boosting, random forest, etc.); tree splits, for instance, depend
    only on the ordering of feature values, which scaling preserves (see the sketch below).
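As a quick sanity check of that claim for trees, this sketch (synthetic data, not from the deck) fits the same decision tree on raw and min-max-scaled features; because the scaling is monotonic per feature, the learned splits are equivalent and the predictions match.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
Xs = MinMaxScaler().fit_transform(X)

tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(Xs, y)

# Scaling changed the split thresholds but not the tree's decisions.
print(np.array_equal(tree_raw.predict(X), tree_scaled.predict(Xs)))  # True
```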


  7. Normalization (Min-Max scaling)
    • Transforms all features onto a similar scale: each value is rescaled to the
    range [0, 1] via x' = (x - min) / (max - min), computed per feature.
    • The maximum value maps to 1 and the minimum to 0.
    • It is sensitive to outliers, since min and max are taken directly from the data
    (see the sketch below).
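A minimal sketch with scikit-learn's MinMaxScaler (the values are made up to show the outlier effect):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 1_000_000.0]])  # the last row holds an outlier in column 2

X_scaled = MinMaxScaler().fit_transform(X)  # default feature_range=(0, 1)
print(X_scaled)
# Column 1 spreads evenly over [0, 1]. In column 2 the outlier maps to 1
# and squashes the two ordinary values towards 0: the outlier sensitivity
# mentioned above.
```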


  8. When to use it?
    • Use normalization when we don't know the distribution of the data, or when it
    does not follow a Gaussian distribution.
    • It is useful for algorithms that make no assumptions about the data distribution.
    • Images are a common case, since pixel values have a known range (see the sketch below).
    • Use it wherever you need the data to lie between 0 and 1.
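For the image case, a minimal sketch with a synthetic 8-bit image (real pipelines do the same division by 255):

```python
import numpy as np

# Fake 28x28 grayscale image; 8-bit pixels lie in [0, 255].
img = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)

img_normalized = img.astype(np.float32) / 255.0
print(img_normalized.min(), img_normalized.max())  # both within [0.0, 1.0]
```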


  9. Standardization
    • Also called Z-score normalization.
    • Transforms each feature by subtracting its mean and dividing by its standard
    deviation: z = (x - mean) / std.
    • So all the values are centered around the mean with a unit standard deviation.
    • Unlike min-max scaling, the resulting values are not bounded to a fixed range.
    • It handles outliers better than min-max scaling, though it is not fully robust
    to them (see the sketch below).
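A minimal sketch with scikit-learn's StandardScaler (made-up values):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[10.0, 1000.0],
              [20.0, 3000.0],
              [30.0, 5000.0]])

X_std = StandardScaler().fit_transform(X)
print(X_std.mean(axis=0))  # ~[0, 0]: centered around the mean
print(X_std.std(axis=0))   # [1, 1]: unit standard deviation
# Unlike min-max scaling, the values are not confined to [0, 1].
```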


  10. When to use it?
    • When we want the features to have zero mean and unit standard deviation.
    • If the feature distribution is normal (Gaussian), standardization is a natural
    choice, though this does not strictly have to hold.


  11. Demo time
    Find the code here
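The link from the original slide is not preserved in this transcript. As a stand-in, here is a minimal sketch (my own, using scikit-learn's wine dataset) that compares an unscaled KNN classifier against min-max-scaled and standardized pipelines; KNN is distance-based, so scaling should help noticeably here.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [
    ("no scaling     ", KNeighborsClassifier()),
    ("min-max scaling", make_pipeline(MinMaxScaler(), KNeighborsClassifier())),
    ("standardization", make_pipeline(StandardScaler(), KNeighborsClassifier())),
]:
    print(name, model.fit(X_tr, y_tr).score(X_te, y_te))
# Wine features span several orders of magnitude, so both scaled
# pipelines typically score well above the unscaled baseline.
```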
