This presentation explains the basic terms used in ML. It covers the most frequently used topics.
#UFest18 #IndiaMLCC #UFestIndia2018
Measures of central tendency
- What is the mean? The average value of the dataset
- What is the median? The middle value of the sorted dataset
- What is the mode? The most frequently occurring value in the dataset
http://bit.ly/MeasureCentralTendency
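As a small illustration of the three measures above, here is a minimal sketch using Python's standard `statistics` module. The sample list `values` is made up for this example and is not from the talk.

```python
from statistics import mean, median, multimode

# Hypothetical sample data, chosen only for illustration.
values = [2, 3, 3, 5, 7, 10]

# Mean: sum of the values divided by how many there are.
average = mean(values)

# Median: middle value of the sorted data; with an even count,
# it is the mean of the two middle values.
middle = median(values)

# Mode: the most frequent value(s); multimode returns a list
# because several values can tie for most frequent.
most_common = multimode(values)

print(average, middle, most_common)
```

`multimode` is used here instead of `mode` so that ties are reported rather than hidden.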
“A”
- Autoregression: time series regression model
- Activation function
  - Sigmoid: used in binary classification
  - ReLU: commonly used in hidden layers
  - Softmax: mostly used in multiclass classification
- A/B testing: compares which technique performs better
- Accuracy: fraction of values predicted correctly
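The three activation functions listed under “A” can be sketched in plain Python as follows. This is an illustrative version using only the `math` module, not how any particular framework implements them; the function names are chosen for this example.

```python
import math


def sigmoid(x):
    """Map a real number into (0, 1); handy for binary classification."""
    # Two branches avoid overflow in math.exp for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu(x):
    """Pass positive inputs through unchanged and clamp negatives to 0."""
    return max(0.0, x)


def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps the exponentials numerically stable.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


print(sigmoid(0.0), relu(-2.0), softmax([1.0, 2.0, 3.0]))
```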
“B”
- Bagging: trains multiple models and combines all their predictions into the final prediction
- Box plot: displays the range of variation in the data
- Backpropagation: updates weights to reduce error
- Batches: small chunks that the data is split into
- Batch normalization: improves the performance and stability of DNNs
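To make the “Batches” entry concrete, here is a minimal sketch of splitting a dataset into fixed-size chunks. The helper name `make_batches` is hypothetical, and real training loops usually also shuffle the data first, which is left out here.

```python
def make_batches(data, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # The last batch may be smaller when len(data) is not a multiple
    # of batch_size.
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]


# Seven made-up samples in batches of three.
batches = make_batches(list(range(7)), 3)
print(batches)
```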
“C”
- CNN: convolutional neural network
- Classification: the labels (classes) are known in advance
- Cost function: when the cost function is at its minimum, the model's accuracy is at its best
- Confusion matrix: displays the performance of a classification model
“E”
- Eager execution: operations run immediately, without waiting for graph execution
- Epoch: one full pass over the training dataset
- Early stopping: halts training early to prevent overfitting
“F”
- Forward propagation: data flows one way only, from input to output, with no backward pass
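One common way to apply early stopping is to watch the validation loss after each epoch and stop once it has failed to improve for a set number of epochs. The sketch below assumes that rule. The function name `early_stopping_epoch`, the `patience` parameter, and the loss values are illustrative and not taken from the talk.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop.

    Assumes val_losses is a non-empty list holding one validation loss
    per epoch. Training stops after `patience` consecutive epochs
    without a new best loss.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # The loss kept improving (or patience never ran out): use every epoch.
    return len(val_losses) - 1


# Made-up losses: improvement stalls after epoch 2, so we stop at epoch 4.
print(early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.7, 0.5], patience=2))
```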
“G”eneralization (a.k.a. out-of-sample error)
- A measure of accuracy on previously unseen data
- The difference between expected and empirical error
- Shows up most often in deep learning models that perform well on the training set but fit real data poorly
“H”
- Hyperparameters: values set before training the model, e.g. batch size, number of trees
- Histogram: used to determine skewness
“I”
- Imputation: part of data wrangling, filling in missing values
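As an example of imputation, the sketch below fills missing values with the mean of the observed ones. Mean imputation is only one possible strategy and is an assumption here, as are the helper name `impute_mean` and the use of `None` to mark a missing value.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the non-missing entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute when every value is missing")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Made-up column with one missing reading.
print(impute_mean([1.0, None, 3.0]))
```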
“P”
- Pandas
  - DataFrames
  - Series
- Pooling: used to reduce parameters and prevent overfitting
“R”
- Regression: predicting numeric values, typically floating point
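To show how pooling shrinks its input, here is a minimal sketch of 2x2 max pooling with stride 2 on a nested list. Max pooling with this window size is an assumption (the slide does not say which kind of pooling is meant), and `max_pool_2x2` is a hypothetical helper name.

```python
def max_pool_2x2(grid):
    """Apply 2x2 max pooling with stride 2 to a rectangular list of lists.

    A trailing odd row or column is dropped, so a 4x4 input becomes 2x2.
    """
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            # Keep only the largest value in each 2x2 window.
            window = (grid[r][c], grid[r][c + 1],
                      grid[r + 1][c], grid[r + 1][c + 1])
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Made-up 4x4 feature map; the output has a quarter as many values.
feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(max_pool_2x2(feature_map))
```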
Validating model
Confusion Matrix
- If false negatives are OK, high precision is required, e.g. spam filter
- If false positives are OK, high recall is required, e.g. medical diagnosis
Metrics: precision, recall, F1 score, accuracy
Accuracy = correctly classified points / total points
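The four metrics named above can all be computed from the counts in a binary confusion matrix. The sketch below uses the standard definitions. The function name `classification_metrics` and the example counts are made up for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 and accuracy from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives. A metric whose denominator is zero is reported as 0.0.
    """
    total = tp + fp + fn + tn
    # Precision: of everything predicted positive, how much was right.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Accuracy: correctly classified points / total points.
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Made-up counts for a 20-sample test set.
metrics = classification_metrics(tp=8, fp=2, fn=4, tn=6)
print(metrics)
```

With these counts, precision is higher than recall: the model is rarely wrong when it predicts positive but misses a third of the actual positives, which suits the spam-filter case better than the medical one.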
Let’s Go For It
1. Look at the dataset
2. Write down the columns and their correlations
3. Form questions derived from the dataset
4. Exploratory analysis with visualization
5. Frame the problem
6. Create a solution by building a model