ML at Udaipur

Machine Learning at Udaipur Krunal Kapadiya @krunal3kapadiya

Agenda
- Introduction to Machine Learning
- Steps in ML
- Problems in Machine Learning Data
- Ending note

Introduction to Machine Learning

What is Machine Learning
Input, Output
It’s not Dr Strange, it’s magic tricks

Q. I want to predict values. Which will I use?
A. Classification B. Regression C. Dimensionality Reduction D. Clustering

What is Machine Learning
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." - Tom M. Mitchell
For example: I need a program that will tell me which tweets will get retweets.
- Task (T): Classify a tweet that has not been published as going to get retweets or not.
- Experience (E): A corpus of tweets for an account where some have retweets and some do not.
- Performance (P): Classification accuracy, the percentage of tweets predicted correctly out of all tweets considered.
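As a minimal sketch of the performance measure P from the tweet example, the snippet below computes classification accuracy as a percentage. The labels are made up for illustration (1 = got retweets, 0 = did not) and the function name accuracy_percent is an assumption, not something from the slides.

```python
def accuracy_percent(y_true, y_pred):
    """Share of predictions that match the true labels, as a percentage."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)

# Hypothetical labels, not taken from a real account
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy_percent(actual, predicted))  # 6 of 8 correct -> 75.0
```

With more experience E (more labelled tweets), we would hope to see this number rise on tweets the program has not seen before.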

What is Machine Learning - Data used
- 2005: 130 Exabytes
- 2010: 1200 Exabytes
- 2015: 7900 Exabytes
- 2020: 40900 Exabytes
1 EB = 1000 PB = 1 million TB = 1 billion GB (100 crore GB)

Applications of Machine Learning

Applications of Machine Learning
- Computer vision
- Prediction from data, such as stock market prediction
- Data segmentation (customer segmentation)
- Anomaly detection
- Sentiment analysis

Types of Data

Types of Data: Structured Data

Types of Data: Unstructured Data

Machine Learning != Data Science != Data Analyst

Steps in ML

Steps in ML: Set Goal, Split Data, Train Model, Test Model, Results

Steps in ML: splitting the data into training and test sets
Out of the total number of examples: 80% training set, 20% test set

Steps in ML: splitting the data into training, validation and test sets
Out of the total number of examples: 70% training set, 15% validation set, 15% test set
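The 70/15/15 split can be sketched with the standard library alone. This is only an illustration: the function name train_val_test_split, the seed, and the 100 stand-in rows are assumptions, and in practice a library helper would usually do this job.

```python
import random

def train_val_test_split(rows, val_ratio=0.15, test_ratio=0.15, seed=0):
    """Shuffle rows and cut them into training, validation and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_ratio)
    n_val = round(len(shuffled) * val_ratio)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

rows = list(range(100))  # stand-in for 100 examples
train, val, test = train_val_test_split(rows)
print(len(train), len(val), len(test))  # 70 15 15
```

Shuffling before cutting matters: if the rows are sorted (for example by date or by label), an unshuffled split would give the model a training set that does not represent the test set.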

Source: https://www.superdatascience.com/machine-learning/
Problem: Predict whether a user will make a purchase or not

Steps in ML: load and split the data
Python:
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    dataset = pd.read_csv('Social_Network_Ads.csv')
    # X (feature columns) and y (the Purchased column) are selected from dataset
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
R:
    library(caTools)
    dataset = read.csv('Social_Network_Ads.csv')
    split = sample.split(dataset$Purchased, SplitRatio = 0.75)

Q. Which module do we now use to split data into training and test sets?
A. cross_validation B. model_selection C. crossValidation D. modelSelection

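Steps in ML: scale the features and train the model
Python:
    from sklearn.preprocessing import StandardScaler

    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)  # fit the scaler on the training set only
    X_test = sc.transform(X_test)        # reuse the training-set mean and variance
    # classifier is assumed to be the RandomForestClassifier imported earlier;
    # its parameters are not shown on the slide
    classifier = RandomForestClassifier()
    classifier.fit(X_train, y_train)
R:
    training_set[-3] = scale(training_set[-3])
    test_set[-3] = scale(test_set[-3])
    library(rpart)
    classifier = rpart(formula = Purchased ~ ., data = training_set)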

Q. Can StandardScaler produce values greater than one?
A. Yes B. No
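The quiz can be checked by hand. The sketch below standardizes a column the same way standard scaling does (subtract the mean, divide by the population standard deviation) using only the standard library. The ages list is a made-up column with one large value, and standardize is a hypothetical helper name.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]

ages = [20, 22, 24, 26, 60]  # hypothetical feature column
scaled = standardize(ages)

print([round(z, 2) for z in scaled])
print(max(scaled) > 1)  # True: the value 60 lies almost two standard deviations above the mean
```

Standardized values are measured in standard deviations from the mean, so they are not confined to the range 0 to 1. That range belongs to min-max scaling, which is a different technique.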

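Steps in ML: evaluate and save the model
Python:
    import pickle
    from sklearn.metrics import confusion_matrix

    y_pred = classifier.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    # pickle.dump writes to a file; pickle.dumps only returns bytes
    with open('classifier_model_in_python.pkl', 'wb') as f:
        pickle.dump(classifier, f)
R:
    # Making the Confusion Matrix
    cm = table(test_set[, 3], y_pred)
    saveRDS(final_model, "classifier_model_in_R.rds")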
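To show how saving and reloading a model works in Python, here is a small round trip with pickle. A plain dictionary stands in for the trained classifier (an assumption for illustration; any picklable object behaves the same way), and the file is written to a temporary directory.

```python
import os
import pickle
import tempfile

# Stand-in for a trained classifier; hypothetical contents
model = {"name": "demo_model", "threshold": 0.5}

path = os.path.join(tempfile.mkdtemp(), "classifier_model_in_python.pkl")

# Save: pickle.dump takes the object and an open binary file
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load: pickle.load rebuilds the object from the file
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)  # True
```

Only load pickle files you trust, since unpickling can execute arbitrary code.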

Results

Q. On which axis is the independent variable generally displayed?
A. X B. Y

Results

Confusion Matrix
                   Actual True       Actual False
Predicted True     True Positive     False Positive
Predicted False    False Negative    True Negative
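A confusion matrix counts four outcomes: true positives, false positives, false negatives and true negatives. The sketch below counts them by hand for boolean labels. The data is invented and confusion_counts is a hypothetical helper, shown only to make the four cells concrete.

```python
def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells for boolean labels."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted and actual:
            counts["TP"] += 1      # predicted True, actually True
        elif predicted and not actual:
            counts["FP"] += 1      # predicted True, actually False
        elif not predicted and actual:
            counts["FN"] += 1      # predicted False, actually True
        else:
            counts["TN"] += 1      # predicted False, actually False
    return counts

# Hypothetical labels
y_true = [True, True, False, False, True, False]
y_pred = [True, False, False, True, True, False]

print(confusion_counts(y_true, y_pred))
# {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```

Accuracy is then (TP + TN) divided by the total, here 4 out of 6.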

Another Problem Statement
Given a student’s high school GPA, predict their university GPA
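Predicting one number from another is a regression problem. A minimal sketch, assuming a straight-line relationship and using invented GPA pairs (not real student data), fits the line by ordinary least squares. The helper name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical (high school GPA, university GPA) pairs
high_school = [2.0, 2.5, 3.0, 3.5, 4.0]
university  = [1.8, 2.4, 2.7, 3.3, 3.6]

slope, intercept = fit_line(high_school, university)
predicted = slope * 3.2 + intercept  # prediction for a high school GPA of 3.2

print(round(slope, 2), round(intercept, 2), round(predicted, 2))
# approximately 0.9 0.06 2.94
```

Here the high school GPA is the independent variable (X axis) and the university GPA is the value being predicted (Y axis).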

Find apple or orange problem: the traditional approach to solving the problem

Find apple or orange problem: Training Data
Weight (feature)   Texture (feature)   Label
150g               Bumpy               Orange
170g               Bumpy               Orange
140g               Smooth              Apple
130g               Smooth              Apple

Find apple or orange problem: Training Data (each row is an example)
Weight (feature)   Texture (feature)   Label
150g               Bumpy               Orange
170g               Bumpy               Orange
140g               Smooth              Apple
130g               Smooth              Apple

Find apple or orange problem: Decision Tree
Questions asked at the nodes: Weight = 150g? (Yes / No), Texture = bumpy? (Yes / No), ...
Answers at the leaves: Orange or Apple
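Instead of hand-writing rules, a decision tree learns its questions from the training data. The sketch below is a deliberately tiny version: a one-level tree that only looks at texture and picks the most common label for each value. It uses the four rows from the training table, and learn_texture_rule is a hypothetical name. A real decision tree learner would also consider weight and choose the best question at each node.

```python
from collections import Counter, defaultdict

# Training data from the slide: (weight in grams, texture, label)
training_data = [
    (150, "bumpy", "orange"),
    (170, "bumpy", "orange"),
    (140, "smooth", "apple"),
    (130, "smooth", "apple"),
]

def learn_texture_rule(rows):
    """Learn a one-level decision tree: texture -> most common label."""
    labels_by_texture = defaultdict(Counter)
    for _weight, texture, label in rows:
        labels_by_texture[texture][label] += 1
    return {t: c.most_common(1)[0][0] for t, c in labels_by_texture.items()}

rule = learn_texture_rule(training_data)
print(rule)            # {'bumpy': 'orange', 'smooth': 'apple'}
print(rule["bumpy"])   # orange
```

Changing the training rows changes the learned rule without touching the code, which is the point of the ML approach.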

An Algorithm to Identify Numbers

Google IO 2019 Coral Source: https://www.youtube.com/watch?v=Jgm25QdF90A

Problems in Machine Learning Data

But, real problems are...
• Insufficient quantity of training data
• Non-representative training data
• Poor quality data
• Irrelevant features
• Overfitting
• Underfitting
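Overfitting, one of the problems listed above, can be illustrated with a toy comparison. This is a sketch on invented data that roughly follows y = 2x: one model memorizes the training pairs exactly, the other fits a straight line. All names and numbers are assumptions for illustration, and underfitting is not shown here.

```python
def mse(model, xs, ys):
    """Mean squared error of a prediction function."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data
train_x, train_y = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]
test_x, test_y = [6, 7], [12.0, 14.1]

# Model A memorizes the training pairs; unseen inputs get the training mean
lookup = dict(zip(train_x, train_y))
train_mean = sum(train_y) / len(train_y)

def memorizer(x):
    return lookup.get(x, train_mean)

# Model B is a least-squares straight line
mean_x = sum(train_x) / len(train_x)
slope = (sum((x - mean_x) * (y - train_mean) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = train_mean - slope * mean_x

def line(x):
    return slope * x + intercept

print("memorizer:", mse(memorizer, train_x, train_y), mse(memorizer, test_x, test_y))
print("line:     ", mse(line, train_x, train_y), mse(line, test_x, test_y))
```

The memorizer has zero error on the training set but a large error on the test set, while the line has a small error on both. A perfect training score paired with a poor test score is the usual sign of overfitting.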

Q. Challenges in ML
A. Insufficient Data B. Poor quality of data C. Overfitting D. Underfitting

Ending note

How can I get started
• Look at the dataset
• Write down the columns and their correlations
• Make questions derived from the dataset
• Exploratory analysis with visualization
• Frame the problem
• Create a solution by creating a model
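For the step of writing down how columns relate to each other, the Pearson correlation coefficient is a common starting point. The sketch below computes it with the standard library on three invented columns. The column names and values are assumptions, not taken from the dataset used in the talk.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical columns
age = [22, 35, 47, 52, 61]
salary = [20000, 43000, 60000, 72000, 90000]
purchased = [0, 0, 1, 1, 1]

print(round(pearson(age, salary), 3))
print(round(pearson(age, purchased), 3))
```

Values near 1 or -1 suggest a strong linear relationship and values near 0 suggest a weak one. Correlation alone does not show that one column causes another, so it works best as a prompt for the questions in the next step.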

Start learning on your own
• https://developers.google.com/machine-learning/crash-course/ml-intro
• https://www.kaggle.com/learn/overview
• https://www.tensorflow.org/tutorials/
• https://pandas.pydata.org/pandas-docs/stable/10min.html

That’s it

Thank You @krunal3kapadiya https://krunal3kapadiya.app
