Things that you should know in ML

This presentation explains the basic terms used in ML. I cover the most frequently used topics.
#UFest18 #IndiaMLCC #UFestIndia2018

Krunal Kapadiya

October 21, 2018

Transcript

  1. Krunal Kapadiya
    Engineer @
    @krunal3kapadiya
    #IndiaMLCC #UFest18
    1

  2. 2

  3. Input Output
    Magic
    What is Machine Learning?
    3

  4. Types of machine learning
    Based on Human supervision
    Supervised ML
    Unsupervised ML
    Semi-supervised ML
    Reinforcement learning
    4

  5. Types of machine learning
    Based on learning
    Online learning
    Offline learning
    Based on Data patterns
    Instance based
    Model based
    5

  6. Measures of central tendency
    What is Mean?
    - Average of all values in the dataset
    What is Median?
    - Middle value of the sorted dataset
    What is Mode?
    - Most frequently occurring value in the dataset
    http://bit.ly/MeasureCentralTendency
    6
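
As a minimal sketch of the three measures above, Python's standard `statistics` module can compute each one. The sample values are made up for illustration.

```python
import statistics

# Hypothetical sample; the values are invented for this example.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # middle of the sorted values (average of the two middle ones here)
mode = statistics.mode(data)      # most frequently occurring value

print("mean:", mean, "median:", median, "mode:", mode)
```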

  7. Skewness
    Remember
    Negative skew (longer left tail): mean is typically less than the mode
    Positive skew (longer right tail): mean is typically greater than the mode
    7
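
The rule of thumb above can be checked on a small sample. This is only an illustrative sketch with invented values: most values are small and a few are large, which gives a positive skew, and negating every value mirrors the sample into a negative skew.

```python
import statistics

# Hypothetical right-skewed sample: many small values, a few large ones.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 9, 15]
# Mirroring the values flips the direction of the skew.
left_skewed = [-x for x in right_skewed]

for name, sample in [("positive skew", right_skewed), ("negative skew", left_skewed)]:
    mean = statistics.mean(sample)
    mode = statistics.mode(sample)
    relation = "greater than" if mean > mode else "less than"
    print(f"{name}: mean {mean:.2f} is {relation} mode {mode}")
```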

  8. Data scientist ≠ Data Engineer
    Machine learning ≠ Data scientist
    8

  9. Training set and Testing set
    9

  10. 10

  11. 11

  12. 12

  13. 13

  14. “A”
    - Autoregression - time series regression model that predicts from past values
    - Activation function
      - Sigmoid - used in binary classification
      - ReLU - commonly used in hidden layers
      - Softmax - mostly used in multiclass classification
    - A/B testing - compares two techniques to see which performs better
    - Accuracy - fraction of predictions that are correct
    14
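
The three activation functions listed above can be sketched in plain Python as follows. This is a simplified illustration, not a numerically hardened implementation; deep learning libraries ship their own optimized versions.

```python
import math

def sigmoid(x):
    """Squash a real number into (0, 1); often read as a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Pass positive inputs through unchanged and clamp negatives to zero."""
    return max(0.0, x)

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(sigmoid(0.0), relu(-3.0), relu(2.5), probs)
```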

  15. When you learn ML in 24 hours
    15

  16. “B”
    - Bagging - trains multiple models and combines their predictions into the final prediction
    - Box plot - displays the range and spread of variation in the data
    - Backpropagation - updates weights to reduce the error
    - Batches - small chunks that the data is split into
    - Batch normalization - improves the performance and stability of DNNs
    16
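
To make the "Batches" entry concrete, here is a small sketch that splits a dataset into chunks. The helper name `iterate_batches` is hypothetical and not taken from any library.

```python
def iterate_batches(samples, batch_size):
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

# Ten made-up samples split into batches of four; the last batch is smaller.
batches = list(iterate_batches(list(range(10)), batch_size=4))
print(batches)
```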

  17. “C”
    - CNN - convolutional neural network
    - Classification - predicting a category; the labels are known in the training data
    - Cost function - the lower the cost, the better the model fits; training tries to minimize it
    - Confusion matrix - displays the performance of a classification model
    17
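
As one possible example of a cost function, the sketch below uses mean squared error. The slide does not name a specific cost function, so this choice is an assumption, and the numbers are invented.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 7.0]
perfect = mean_squared_error(y_true, [3.0, 5.0, 7.0])  # predictions match exactly
worse = mean_squared_error(y_true, [2.0, 5.0, 9.0])    # two predictions are off

print(perfect, worse)
```

A lower value means the predictions sit closer to the targets, which is why training aims to minimize it.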

  18. “D”
    - Dropout - hidden units are randomly dropped during training to prevent overfitting
    - Data augmentation
    18
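
A rough sketch of dropout on a list of activations follows. It assumes the common "inverted dropout" variant, where the surviving activations are scaled up so their expected value stays the same; the function name and arguments are hypothetical.

```python
import random

def dropout(activations, drop_prob, rng):
    """Zero each activation with probability drop_prob and rescale the rest."""
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)  # fixed seed so the run is repeatable
dropped = dropout([1.0] * 10, drop_prob=0.5, rng=rng)
print(dropped)  # a mix of 0.0 (dropped) and 2.0 (kept and rescaled)
```

Dropout is applied only during training; at prediction time all units are kept.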

  19. “E”
    - Eager execution - operations run immediately, without waiting for graph execution
    - Epoch - one full pass over the training set
    - Early stopping - stops training once validation performance stops improving, to prevent overfitting
    “F”
    - Forward propagation - data flows one way, from input to output, with no backward pass
    19
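
Early stopping from the "E" list can be sketched as a patience counter over per-epoch validation losses. The function name, the `patience` parameter, and the loss values are assumptions made for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the epoch index at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: ran all epochs

# Made-up validation losses: improvement stalls after epoch 2.
losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.7, 0.5]
stop_at = early_stopping_epoch(losses, patience=2)
print(stop_at)
```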

  20. “G”
    - Gradient descent
      - Batch gradient descent
      - Stochastic gradient descent
      - Mini-batch gradient descent
    20
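
The three variants above differ only in how many samples feed each weight update. The sketch below fits the toy model y = w * x by minimizing mean squared error; the data, learning rate, and function name are all assumptions chosen for illustration.

```python
import random

def gradient_descent(xs, ys, batch_size, learning_rate=0.01, epochs=200, seed=0):
    """Fit y = w * x by gradient descent on the mean squared error."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of mean((w * x - y) ** 2) over the batch, with respect to w.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= learning_rate * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated from y = 2 * x

w_batch = gradient_descent(xs, ys, batch_size=len(xs))  # batch: all samples per update
w_sgd = gradient_descent(xs, ys, batch_size=1)          # stochastic: one sample per update
w_mini = gradient_descent(xs, ys, batch_size=2)         # mini-batch: a small chunk per update
print(w_batch, w_sgd, w_mini)
```

All three runs should approach w = 2 on this noise-free data.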

  21. “G”eneralization (a.k.a out of sample error)
    - Measure of accuracy on previously unseen data
    - Generalization gap: difference between expected error and empirical (training) error
    - Poor generalization is common in deep learning models: the model fits the training set well but does not fit real data
    21

  22. “H”
    - Hyperparameters - values set before training the model, e.g. batch size, number of trees
    - Histogram - used to determine skewness
    “I”
    - Imputation - part of data wrangling: filling in missing values
    22
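
One simple form of imputation is filling missing entries with the mean of the observed ones. The sketch below assumes missing values are represented as `None`; the helper name and the data are hypothetical.

```python
import statistics

def impute_mean(values):
    """Replace None entries with the mean of the non-missing entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

ages = [20, None, 30, 40, None]  # made-up column with two missing values
filled = impute_mean(ages)
print(filled)
```

Other strategies, such as filling with the median or the mode, follow the same pattern.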

  23. “L”
    - Learning rate - step size of each update when minimizing the cost function
    - LSTM - building unit of RNNs; used for speech prediction, rhythm learning
    “M”
    - MLP (multilayer perceptron) - a network of fully connected layers
    23

  24. “N”
    - NumPy
      - linspace
      - random
      - array
      - arange
    “O”
    - Outliers - values that lie far away from the pattern of the rest of the dataset
    24
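
One common way to flag outliers is the interquartile range (IQR) rule, which is also what box plot whiskers usually show. The slide does not prescribe a method, so the rule, the multiplier `k=1.5`, and the data below are assumptions.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

readings = [10, 12, 11, 13, 12, 11, 95]  # made-up data with one extreme value
outliers = iqr_outliers(readings)
print(outliers)
```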

  25. “P”
    - Pandas
      - DataFrame
      - Series
    - Pooling - used to reduce the number of parameters and help prevent overfitting
    “R”
    - Regression - predicting continuous values, typically floating-point numbers
    25

  26. Pooling
    26
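
The slide itself is an image, so as an assumption this sketch shows 2x2 max pooling with stride 2, one of the most common pooling variants. The feature map values are invented.

```python
def max_pool_2x2(feature_map):
    """Apply 2x2 max pooling with stride 2 to a 2D list.

    Assumes the height and width of feature_map are both even.
    """
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            window = [
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ]
            row.append(max(window))  # keep only the strongest response
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 5, 8],
]
pooled = max_pool_2x2(feature_map)
print(pooled)
```

The 4x4 input shrinks to 2x2, so later layers have fewer values to process.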

  27. Validating model
    Confusion Matrix
    - If false negatives are OK, the model requires high precision, e.g. spam filter
    - If false positives are OK, the model requires high recall, e.g. medical diagnosis
    Metrics: Precision, Recall, F1 score, Accuracy
    Accuracy = correctly classified points / total points
    27
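
The four metrics named above can be computed directly from the cells of a binary confusion matrix. This is a minimal sketch with invented counts; it does not guard against division by zero when a denominator is empty.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 score, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)  # of the predicted positives, how many were right
    recall = tp / (tp + fn)     # of the actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)          # correct points / total points
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```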

  28. Let’s Go For It
    1. Look at the dataset
    2. Write down the columns and their correlations
    3. Write questions derived from the dataset
    4. Exploratory analysis with visualization
    5. Frame the problem
    6. Create a solution by building a model
    28

  29. Exploratory Analysis
    - Look at the rows and tables
    - Find correlated columns
    - Display them in charts
    - Summarize the findings based on the graphs
    29
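
Finding correlated columns usually means computing a correlation coefficient between pairs of columns. The sketch below implements the Pearson coefficient by hand; the column names and values are hypothetical and not taken from any dataset in this deck.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical columns: the second grows with the first, the third shrinks.
budget = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [2.0, 4.0, 6.0, 8.0, 10.0]
discount = [10.0, 8.0, 6.0, 4.0, 2.0]

r_up = pearson(budget, revenue)     # close to +1: strongly positively correlated
r_down = pearson(budget, discount)  # close to -1: strongly negatively correlated
print(r_up, r_down)
```

In practice, pandas offers `DataFrame.corr()` for the same purpose across all numeric columns at once.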

  30. TMDB Notebook (Dataset)
    30

  31. Reference
    https://www.analyticsvidhya.com/blog/2017/05/25-must-know-terms-concepts-for-beginners-in-deep-learning/
    https://ml-cheatsheet.readthedocs.io
    31

  32. 32

  33. Thank You
    33
    https://krunal3kapadiya.app/
    @krunal3kapadiya
    #IndiaMLCC #Ufest18
