Things that you should know in ML

This presentation explains the basic terms used in ML. I covered the most frequently used topics.
#UFest18 #IndiaMLCC #UFestIndia2018

Krunal Kapadiya

October 21, 2018

Transcript

  2. Types of machine learning
    - Based on human supervision: supervised ML, unsupervised ML, semi-supervised ML, reinforcement learning
  3. Types of machine learning
    - Based on learning: online learning, offline learning
    - Based on data patterns: instance-based, model-based
  4. Measures of central tendency
    - What is the mean? - The average value of the dataset
    - What is the median? - The middle value of the sorted dataset
    - What is the mode? - The most frequently repeated value in the dataset
    - http://bit.ly/MeasureCentralTendency
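The three measures above can be sketched with Python's standard `statistics` module. The sample values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
values = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(values)      # sum of the values divided by their count
median_value = statistics.median(values)  # middle of the sorted values (average of the two middle ones here)
mode_value = statistics.mode(values)      # most frequently occurring value

print(mean_value, median_value, mode_value)
```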
  9. “A”
    - Autoregression - time series regression model
    - Activation function
      - Sigmoid - used in binary classification
      - ReLU - helps in hidden layers
      - Softmax - mostly used in multiclass classification
    - A/B testing - compares which technique performs better
    - Accuracy - fraction of values that are predicted correctly
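As a minimal sketch, the three activation functions named above can be written with the standard `math` module. The function names are illustrative, and a real framework would apply these to whole tensors rather than single numbers or lists.

```python
import math

def sigmoid(x):
    """Squash any real number into (0, 1); handy as a binary-class probability."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Pass positive inputs through unchanged and clamp negative inputs to 0."""
    return max(0.0, x)

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    largest = max(scores)
    # Subtracting the largest score keeps math.exp from overflowing.
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax([1.0, 2.0, 3.0])
print(sigmoid(0.0), relu(-2.0), probabilities)
```

The largest score always receives the largest softmax probability, which is why softmax suits multiclass outputs.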
  10. “B”
    - Bagging - trains multiple models and combines all their predictions into the final prediction
    - Box plot - displays the range of variation in the data
    - Backpropagation - updates the weights to reduce errors
    - Batches - small chunks that the data is split into
    - Batch normalization - improves the performance and stability of DNNs
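To make the idea of batches concrete, here is a small sketch that splits a dataset into fixed-size chunks. `make_batches` is a hypothetical helper written for this example, not a library function.

```python
def make_batches(samples, batch_size):
    """Split a list of samples into consecutive chunks of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

# Ten hypothetical samples split into batches of four; the last batch is smaller.
batches = make_batches(list(range(10)), 4)
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```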
  11. “C”
    - CNN - convolutional neural network
    - Classification - predicts one of a known set of labels
    - Cost function - when the cost function is at its minimum, the model's accuracy is best
    - Confusion matrix - displays the performance of the model
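A confusion matrix for a binary classifier can be sketched by counting the four possible outcomes. The labels below are made up for illustration, and `confusion_counts` is a hypothetical helper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            counts["tp"] += 1   # predicted positive, actually positive
        elif pred == positive:
            counts["fp"] += 1   # predicted positive, actually negative
        elif truth == positive:
            counts["fn"] += 1   # predicted negative, actually positive
        else:
            counts["tn"] += 1   # predicted negative, actually negative
    return counts

# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)
```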
  12. “E”
    - Eager execution - operations run immediately, without waiting for graph execution
    - Epochs - one epoch is a single training pass over the whole dataset
    - Early stopping - stops training early to prevent overfitting
    “F”
    - Forward propagation - data flows one way only, from input to output, with no backward pass
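Early stopping is often implemented with a "patience" counter on the validation loss. The sketch below assumes that rule; the function name, the patience value, and the loss values are all hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # The loss kept improving, so training ran through every epoch.
    return len(val_losses) - 1

# Hypothetical validation losses, one per epoch.
stop_epoch = early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.66, 0.5], patience=2)
print(stop_epoch)
```

Here training stops at epoch index 4, so the later improvement at index 5 is never seen; a larger patience trades extra training time for a lower risk of stopping too soon.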
  13. “G”eneralization (a.k.a. out-of-sample error)
    - Measure of accuracy on previously unseen data
    - Difference between the expected error and the empirical (observed) error
    - Poor generalization mostly occurs in deep learning models: the model works fine on the training set but does not fit real data
  14. “H”
    - Hyperparameters - values set before training the model, e.g. batch size, number of trees
    - Histogram - used to determine skewness
    “I”
    - Imputation - part of data wrangling: filling in missing values
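One simple form of imputation fills each missing value with the mean of the known values. This is only one strategy among several, and `impute_with_mean` is a hypothetical helper that assumes missing entries are marked with `None`.

```python
import statistics

def impute_with_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot impute when every value is missing")
    fill = statistics.mean(known)
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing entries.
filled = impute_with_mean([1.0, None, 3.0, None, 5.0])
print(filled)  # [1.0, 3.0, 3.0, 3.0, 5.0]
```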
  15. “L”
    - Learning rate - size of each step taken when minimizing the cost function
    - LSTM - building units of an RNN, used e.g. for speech prediction and rhythm learning
    “M”
    - MLP (multilayer perceptron) - a.k.a. fully connected layers
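The effect of the learning rate can be seen on a toy cost function. The sketch below runs plain gradient descent on f(w) = (w - 3)^2, which has its minimum at w = 3. The cost function, starting point, and step count are assumptions made for this example.

```python
def gradient_descent(learning_rate, steps=50, start=10.0):
    """Minimize the toy cost f(w) = (w - 3)**2 with plain gradient descent."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)         # derivative of (w - 3)**2
        w -= learning_rate * gradient  # step against the gradient
    return w

good = gradient_descent(0.1)        # converges close to the minimum at w = 3
too_small = gradient_descent(0.01)  # still far from 3 after 50 steps
too_large = gradient_descent(1.1)   # overshoots more on every step and diverges
print(good, too_small, too_large)
```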
  16. “N”
    - NumPy
      - Linspace
      - Random
      - Array
      - Arange
    “O”
    - Outliers - values that lie far away from the dataset's pattern
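A brief sketch of the four NumPy features listed above, assuming NumPy is installed. The specific arguments are illustrative only.

```python
import numpy as np

points = np.linspace(0.0, 1.0, 5)    # 5 evenly spaced values from 0 to 1, both endpoints included
steps = np.arange(0, 10, 2)          # 0, 2, 4, 6, 8 (the stop value is excluded)
matrix = np.array([[1, 2], [3, 4]])  # 2x2 array built from nested lists
rng = np.random.default_rng(seed=0)  # seeded generator, so runs are reproducible
samples = rng.random(3)              # 3 floats drawn uniformly from [0, 1)

print(points, steps, matrix.shape, samples)
```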
  17. “P”
    - Pandas
      - DataFrames
      - Series
    - Pooling - used to reduce parameters and prevent overfitting
    “R”
    - Regression - predicting values, typically floating-point numbers
  18. Validating model
    - Confusion matrix
      - If false negatives are OK, requires high precision, e.g. spam filter
      - If false positives are OK, requires high recall, e.g. medical diagnosis
    - Precision
    - Recall
    - F1 score
    - Accuracy
      - Accuracy = ratio of correctly classified points to total points
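The four validation metrics can be computed directly from confusion-matrix counts. In this sketch the counts are hypothetical, and `classification_metrics` is an illustrative helper rather than a library function.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 score, and accuracy from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # how many predicted positives were right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # how many actual positives were found
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / total if total else 0.0    # correctly classified points / total points
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives, 6 true negatives.
metrics = classification_metrics(tp=8, fp=2, fn=4, tn=6)
print(metrics)
```

With these counts precision (0.8) is higher than recall (about 0.67), which would suit a spam filter better than a medical diagnosis.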
  19. Let’s Go For It
    1. Look at the dataset
    2. Write down the columns and their correlations
    3. Make questions derived from the dataset
    4. Exploratory analysis with visualization
    5. Frame the problem
    6. Create a solution by creating a model
  20. Exploratory Analysis
    - Look at the rows and tables
    - Find correlated columns
    - Display them in charts
    - Give a summary based on the graphs
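Finding correlated columns usually comes down to a correlation coefficient. The sketch below assumes the Pearson coefficient, which the slides do not name explicitly; the column names and values are hypothetical.

```python
import math

def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("columns must have the same length and at least two rows")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return covariance / math.sqrt(spread_x * spread_y)

# Hypothetical columns: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 79]
correlation = pearson_correlation(hours, scores)
print(round(correlation, 3))
```

Values near 1 or -1 indicate strongly correlated columns, and values near 0 indicate little linear relationship.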