
Things that you should know in ML

This presentation explains the basic terms used in ML. It covers the most frequently used topics.
#UFest18 #IndiaMLCC #UFestIndia2018

Krunal Kapadiya

October 21, 2018

Transcript

  1. Krunal Kapadiya, Engineer @ @krunal3kapadiya #IndiaMLCC #UFest18

  3. What is Machine Learning? Input, Output, Magic

  4. Types of machine learning

    Based on human supervision:
    - Supervised ML
    - Unsupervised ML
    - Semi-supervised ML
    - Reinforcement learning
  5. Types of machine learning

    Based on learning:
    - Online learning
    - Offline learning
    Based on data patterns:
    - Instance based
    - Model based
  6. Measures of central tendency

    - What is mean? The average value of the dataset
    - What is median? The middle value of the sorted dataset
    - What is mode? The most frequently repeated value in the dataset
    http://bit.ly/MeasureCentralTendency
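
As a small illustration of the three measures on slide 6, here is a sketch using Python's standard `statistics` module. The sample values are made up for the example and are not from the talk.

```python
import statistics

# Made-up sample data (an assumption for illustration only).
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # average value: 30 / 6
median = statistics.median(values)  # middle of the sorted data: (3 + 5) / 2
mode = statistics.mode(values)      # most frequently repeated value

print("mean:", mean, "median:", median, "mode:", mode)
```

With an even number of values, the median is the average of the two middle values, which is why it does not have to be a member of the dataset.
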
  7. Skewness

    Remember:
    - Negative skew: mean is less than mode
    - Positive skew: mean is greater than mode
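
The rule of thumb on slide 7 can be checked numerically. The sketch below assumes the moment-based definition of skewness (third central moment divided by the 1.5 power of the second); the helper name `skewness` and both sample lists are hypothetical.

```python
import statistics

def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5 (population moments)."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up samples: one with a long right tail, one with a long left tail.
right_tailed = [1, 2, 2, 3, 10]
left_tailed = [1, 8, 9, 9, 10]

for name, data in [("right_tailed", right_tailed), ("left_tailed", left_tailed)]:
    print(name,
          "mean:", statistics.fmean(data),
          "mode:", statistics.mode(data),
          "skewness sign:", "positive" if skewness(data) > 0 else "negative")
```

In both samples the sign of the skewness agrees with the mean-versus-mode comparison, although for unusual distributions that comparison is only a heuristic.
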
  8. Data scientist ≠ Data Engineer

    Machine learning ≠ Data scientist
  9. Training set and Testing set
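
The slide only names the two sets, so the following is one possible sketch of how a dataset might be divided into them. The function name `train_test_split`, the 80/20 ratio, and the seed are assumptions, not something the talk specifies.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows, then hold out a fraction of them for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is repeatable
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

# Ten stand-in samples; real rows would be feature/label records.
train, test = train_test_split(range(10))
print(len(train), "training rows,", len(test), "testing rows")
```

The model is fitted on the training rows only, and the held-out testing rows estimate how it behaves on data it has not seen.
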

  14. “A”

    - Autoregression: a time series regression model
    - Activation function
      - Sigmoid: used in binary classification
      - ReLU: helps in hidden layers
      - Softmax: mostly used in multiclass classification
    - A/B testing: checks which technique performs better
    - Accuracy: the share of correctly predicted values
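
The three activation functions listed on slide 14 can be written in a few lines. This is a plain-Python sketch using their standard textbook definitions; the input scores are made up.

```python
import math

def sigmoid(x):
    """Squashes any real number into (0, 1); common for binary outputs."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Passes positive values through and zeroes out negatives."""
    return max(0.0, x)

def softmax(scores):
    """Turns a list of scores into probabilities that sum to 1."""
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0))
print(relu(-3.0), relu(2.5))
print(softmax([1.0, 2.0, 3.0]))
```
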
  15. When you learn ML in 24 hours

  16. “B”

    - Bagging: trains multiple models and combines all their predictions into the final prediction
    - Box plot: displays the range of variation in the data
    - Backpropagation: updates weights to reduce error
    - Batches: small chunks that the data is split into
    - Batch normalization: improves the performance and stability of DNNs
  17. “C”

    - CNN: Convolutional Neural Network
    - Classification: labels are known
    - Cost function: when the cost function is at its minimum, the model's accuracy is best
    - Confusion matrix: displays the performance of the model
  18. “D”

    - Dropout: units in hidden layers are randomly dropped during training to prevent overfitting
    - Data augmentation
  19. “E”

    - Eager execution: operations run immediately, without waiting for graph execution
    - Epoch: one full pass over the training data
    - Early stopping: stops training early to prevent overfitting
    “F”
    - Forward propagation: data flows one way only, from input to output, with no backward pass
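
To make early stopping concrete, here is a minimal sketch of a patience-based rule: stop once the validation loss has failed to improve for a set number of epochs in a row. The function name, the `patience` parameter, and the loss values are all assumptions for illustration; frameworks offer their own versions of this idea.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch index at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch  # validation loss stalled: stop here
    return len(val_losses) - 1  # never stalled: ran every epoch

# Made-up validation losses, one per epoch.
losses = [0.9, 0.7, 0.6, 0.65, 0.66, 0.5]
stop_at = early_stopping_epoch(losses)
print("stop at epoch", stop_at)
```

Note the trade-off in the made-up data: the loss would have improved again at the last epoch, so a larger `patience` keeps training longer at the cost of more compute.
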
  20. “G”

    - Gradient descent
      - Batch GD
      - Stochastic gradient descent
      - Mini-batch GD
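
The three variants on slide 20 differ only in how many samples feed each weight update. The sketch below assumes a one-feature linear model trained with mean squared error; the function name, learning rate, epoch count, and toy data are hypothetical choices, not taken from the talk.

```python
import random

def gradient_descent(xs, ys, batch_size, lr=0.1, epochs=2000, seed=0):
    """Fit y = w * x + b by minimizing mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [(w * xs[i] + b) - ys[i] for i in batch]
            # Gradients of the batch's mean squared error w.r.t. w and b.
            grad_w = 2 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = 2 * sum(errors) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Noise-free toy data generated from y = 2x + 1.
xs = [i / 10 for i in range(10)]
ys = [2 * x + 1 for x in xs]

w_batch, b_batch = gradient_descent(xs, ys, batch_size=len(xs))  # batch GD
w_sgd, b_sgd = gradient_descent(xs, ys, batch_size=1)            # stochastic GD
w_mini, b_mini = gradient_descent(xs, ys, batch_size=2)          # mini-batch GD
print(w_batch, b_batch, w_sgd, b_sgd, w_mini, b_mini)
```

On this noise-free data all three settle near w = 2 and b = 1. On noisy data, smaller batches give cheaper but noisier updates.
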
  21. “G”eneralization (a.k.a. out-of-sample error)

    - Measure of accuracy on previously unseen data
    - Difference between expected and empirical error
    - Mostly seen in deep learning models: the model works fine on the training set but does not fit real data
  22. “H”

    - Hyperparameters: values set before training the model, e.g. batch size, number of trees
    - Histogram: used to determine skewness
    “I”
    - Imputation: data wrangling step that fills in missing values
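
As one example of imputation, missing entries can be replaced with the mean of the observed ones. This is a sketch under that assumption (median or mode filling are equally common); `impute_mean` and the `ages` column are hypothetical.

```python
import statistics

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]

# Made-up column with two missing values.
ages = [20, None, 30, 40, None]
filled = impute_mean(ages)
print(filled)
```
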
  23. “L”

    - Learning rate: the step size used when minimizing the cost function
    - LSTM: building unit of RNNs, used for speech prediction and rhythm learning
    “M”
    - MLP (multilayer perceptron): a.k.a. fully connected layers
  24. “N”

    - NumPy
      - linspace
      - random
      - array
      - arange
    “O”
    - Outliers: values that lie far away from the dataset's pattern
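
The four NumPy names on slide 24 can be tried out as follows. This is a short sketch assuming NumPy is installed; the variable names and arguments are chosen only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])               # array: build an ndarray from a list
evens = np.arange(0, 10, 2)           # arange: start, stop (exclusive), step
grid = np.linspace(0.0, 1.0, 5)       # linspace: 5 evenly spaced points, ends included
rng = np.random.default_rng(seed=0)   # random: a seeded random generator
noise = rng.random(3)                 # three floats drawn from [0, 1)

print(a, evens, grid, noise)
```

The practical difference between the two range helpers: `arange` takes a step size and excludes the stop value, while `linspace` takes a number of points and includes both ends.
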
  25. “P”

    - Pandas
      - DataFrames
      - Series
    - Pooling: used to reduce parameters and prevent overfitting
    “R”
    - Regression: predicting values, typically floating-point numbers
  26. Pooling
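
The slide's image is not part of the transcript, so as an illustration here is a sketch of one common form of pooling: 2x2 max pooling with stride 2. The function name and the feature map values are made up.

```python
def max_pool_2x2(image):
    """Keep the largest value in each non-overlapping 2x2 window."""
    pooled = []
    for r in range(0, len(image) - 1, 2):
        row = []
        for c in range(0, len(image[0]) - 1, 2):
            window = [image[r][c], image[r][c + 1],
                      image[r + 1][c], image[r + 1][c + 1]]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Made-up 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 8],
    [1, 1, 7, 5],
]
pooled = max_pool_2x2(feature_map)
print(pooled)
```

The 4x4 input shrinks to 2x2, which is how pooling cuts the number of values that later layers have to process.
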

  27. Validating model

    Confusion matrix
    - If false negatives are OK, the model requires high precision, e.g. spam filter
    - If false positives are OK, the model requires high recall, e.g. medical diagnosis
    Metrics
    - Precision
    - Recall
    - F1 score
    - Accuracy = correctly classified points / total points
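
The four metrics on slide 27 all derive from the confusion matrix counts. Below is a sketch using the standard formulas; the function name and the counts are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 score and accuracy from confusion counts."""
    precision = tp / (tp + fp)            # of predicted positives, how many were right
    recall = tp / (tp + fn)               # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Made-up counts for a binary classifier evaluated on 100 points.
precision, recall, f1, accuracy = classification_metrics(tp=40, fp=10, fn=5, tn=45)
print(precision, recall, f1, accuracy)
```

A spam filter would watch `precision` (few legitimate mails flagged), while a medical screen would watch `recall` (few sick patients missed).
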
  28. Let’s Go For It

    1. Look at the dataset
    2. Write down the columns and their correlations
    3. Make questions derived from the dataset
    4. Exploratory analysis with visualization
    5. Frame the problem
    6. Create a solution by building a model
  29. Exploratory Analysis

    - Look at the rows and tables
    - Find correlated columns
    - Display them in charts
    - Give a summary based on the graphs
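
For the "find correlated columns" step, one option is the Pearson correlation coefficient. The sketch below computes it by hand in plain Python; the column names and values are invented for the example and are not taken from the dataset used in the talk.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical columns (made-up values for illustration only).
columns = {
    "budget": [10, 20, 30, 40, 50],
    "revenue": [25, 45, 65, 85, 105],   # rises in lockstep with budget
    "runtime": [120, 95, 130, 90, 110],
}

r_revenue = pearson(columns["budget"], columns["revenue"])
r_runtime = pearson(columns["budget"], columns["runtime"])
print("budget vs revenue:", r_revenue)
print("budget vs runtime:", r_runtime)
```

Values near 1 or -1 mark column pairs worth charting, while values near 0 suggest little linear relationship.
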
  30. TMDB Notebook (Dataset)

  31. Reference

    - https://www.analyticsvidhya.com/blog/2017/05/25-must-know-terms-concepts-for-beginners-in-deep-learning/
    - https://ml-cheatsheet.readthedocs.io

  33. Thank You https://krunal3kapadiya.app/ @krunal3kapadiya #IndiaMLCC #Ufest18