ML at Udaipur

Machine Learning introduction and glossaries at GDG Udaipur

Krunal Kapadiya

July 06, 2019

Transcript

  1. Machine Learning at Udaipur, Krunal Kapadiya (@krunal3kapadiya)

  2. Agenda
    - Introduction to Machine Learning
    - Steps towards ML
    - Problems in Machine Learning Data
    - Ending note
  4. Introduction to Machine Learning

  5. What is Machine Learning? (diagram: Input, Output) It’s not Dr Strange, it’s magic tricks
  7. Q. I want to predict values. I will use:
    A. Classification B. Regression C. Dimensionality Reduction D. Clustering
  9. What is Machine Learning
    “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.” - Tom M. Mitchell
    For example: I need a program that will tell me which tweets will get retweets.
    - Task (T): Classify a tweet that has not been published as going to get retweets or not.
    - Experience (E): A corpus of tweets for an account where some have retweets and some do not.
    - Performance (P): Classification accuracy, the number of tweets predicted correctly out of all tweets considered, as a percentage.
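The performance measure P from the tweet example can be sketched in a few lines of plain Python. The labels below are made up purely for illustration (1 = retweeted, 0 = not retweeted), and `accuracy` is a hypothetical helper name, not something shown in the talk.

```python
def accuracy(y_true, y_pred):
    """Return the percentage of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)

# Hypothetical labels for eight tweets (assumption, not real data)
actual = [1, 0, 0, 1, 1, 0, 1, 0]
predicted = [1, 0, 1, 1, 0, 0, 1, 0]

print(f"Accuracy: {accuracy(actual, predicted):.1f}%")
```

Here six of the eight hypothetical predictions match, so the measure comes out at 75%. As the program sees more experience E, this is the number that should improve.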
  10. What is Machine Learning - Data used
    - 2005: 130 Exabytes
    - 2010: 1200 Exabytes
    - 2015: 7900 Exabytes
    - 2020: 40900 Exabytes
    1 EB = 1000 PB = 1 million TB = 1 billion GB
    1 billion GB = 100 crore GB = 10 lakh TB
  11. Applications of Machine Learning

  12. Applications of Machine Learning
    - Computer vision
    - Prediction (e.g. stock market prediction)
    - Data segmentation (e.g. customer segmentation)
    - Anomaly detection
    - Sentiment analysis
  14. Types of Data

  15. Types of Data: Structured Data

  16. Types of Data: Unstructured Data

  17. Machine Learning != Data Science != Data Analyst

  19. Steps in ML

  20. Steps in ML: Set Goal, Split Data, Train Model, Test Model, Results
  21. Steps in ML: Splitting data for training and validation
    - 80% Training set
    - 20% Test set
    (percentages of the total number of examples)
  22. Steps in ML: Splitting data for training and validation
    - 70% Training set
    - 15% Validation set
    - 15% Test set
    (percentages of the total number of examples)
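A 70/15/15 split like this can be sketched in plain Python. This is only an illustration under stated assumptions: `split_dataset` is a hypothetical helper, the 100 integer rows stand in for real data, and the talk itself uses scikit-learn's `train_test_split` rather than hand-written code.

```python
import random

def split_dataset(rows, train=0.70, validation=0.15, seed=0):
    """Shuffle rows and cut them into training, validation and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a repeatable split
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    train_set = shuffled[:n_train]
    validation_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]  # whatever remains (about 15%)
    return train_set, validation_set, test_set

rows = list(range(100))  # stand-in for 100 data rows (assumption)
train_set, validation_set, test_set = split_dataset(rows)
print(len(train_set), len(validation_set), len(test_set))
```

Shuffling before cutting matters: if the rows are sorted in any way, unshuffled slices would give the three sets different distributions.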
  24. Problem: Predict whether a user will make a purchase or not. Source: https://www.superdatascience.com/machine-learning/

  27. Steps in ML
    Python:
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split
        import pandas as pd
        dataset = pd.read_csv('Social_Network_Ads.csv')
        # X (features) and y (labels) are taken from dataset; the slide does not show how
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
    R:
        library(caTools)
        dataset = read.csv('Social_Network_Ads.csv')
        split = sample.split(dataset$Purchased, SplitRatio = 0.75)
  28. Q. To split data into training and test sets, we now use:
    A. cross_validation B. model_selection C. crossValidation D. modelSelection
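  31. Steps in ML
    Python:
        from sklearn.preprocessing import StandardScaler
        # Fit the scaler on the training set only, then apply it to the test set
        sc = StandardScaler()
        X_train = sc.fit_transform(X_train)
        X_test = sc.transform(X_test)
        # The slide does not show how classifier is created; RandomForestClassifier
        # (imported on the data-splitting slide) with default settings is assumed here
        classifier = RandomForestClassifier()
        classifier.fit(X_train, y_train)
    R:
        # Scale every column except the third (the Purchased label)
        training_set[-3] = scale(training_set[-3])
        test_set[-3] = scale(test_set[-3])
        library(rpart)
        classifier = rpart(formula = Purchased ~ ., data = training_set)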
  32. Q. Can StandardScaler produce values greater than one?
    A. Yes B. No
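The question can be checked with a small sketch of what standardization does: subtract the mean and divide by the standard deviation. The version below is plain Python rather than scikit-learn, `standardize` is a hypothetical helper, and the salary values are made up. It uses the population standard deviation, which is also what `StandardScaler` uses.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit variance: z = (x - mean) / std."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]

salaries = [20_000, 25_000, 30_000, 35_000, 150_000]  # made-up values
scaled = standardize(salaries)
print([round(z, 2) for z in scaled])
print("largest scaled value:", round(max(scaled), 2))
```

Standardization fixes the mean at 0 and the standard deviation at 1, but it does not bound the range, so a value far from the mean (the 150,000 here) ends up well above 1.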
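  35. Steps in ML
    Python:
        import pickle
        from sklearn.metrics import confusion_matrix
        # y_pred is not defined on the slide; predictions on the test set are assumed
        y_pred = classifier.predict(X_test)
        cm = confusion_matrix(y_test, y_pred)
        # Save the trained model (pickle.dump writes to a file; pickle.dumps only returns bytes)
        with open('classifier_model_in_python.pkl', 'wb') as f:
            pickle.dump(classifier, f)
    R:
        # Making the Confusion Matrix
        cm = table(test_set[, 3], y_pred)
        # Save the trained model (final_model as named on the slide)
        saveRDS(final_model, "classifier_model_in_R.rds")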
  37. Results

  38. Q. The independent variable is generally displayed on which axis?
    A. X B. Y
  40. Results

  41. Confusion Matrix
                        Actual True       Actual False
    Predicted True      True Positive     False Positive
    Predicted False     False Negative    True Negative
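The four cells of a binary confusion matrix can be counted by hand, as in the sketch below. It is plain Python rather than scikit-learn's `confusion_matrix`, `confusion_counts` is a hypothetical helper, and the labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count the four cells of a binary confusion matrix."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted and actual:
            counts["TP"] += 1  # predicted True, actually True
        elif predicted and not actual:
            counts["FP"] += 1  # predicted True, actually False
        elif not predicted and actual:
            counts["FN"] += 1  # predicted False, actually True
        else:
            counts["TN"] += 1  # predicted False, actually False
    return counts

# Made-up labels: True = purchased, False = did not purchase
y_true = [True, True, False, False, True, False]
y_pred = [True, False, False, True, True, False]
print(confusion_counts(y_true, y_pred))
```

The false positives and false negatives are the two kinds of mistake; accuracy is (TP + TN) divided by the total count.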
  42. Another Problem Statement: Given a student’s high school GPA, predict their university GPA
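Predicting one number from another is a regression problem. One common way to approach it is simple linear regression; the talk does not say which model was used, so the sketch below is an assumption. `fit_line` is a hypothetical helper, and the GPA pairs are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up (high school GPA, university GPA) pairs, for illustration only
high_school = [2.0, 2.5, 3.0, 3.5, 4.0]
university = [1.8, 2.4, 2.7, 3.3, 3.6]

slope, intercept = fit_line(high_school, university)
predicted = slope * 3.2 + intercept
print(f"Predicted university GPA for a 3.2 high school GPA: {predicted:.2f}")
```

The fitted line is the one that minimizes the squared vertical distance to the training points, and a prediction is read off the line at the new student's high school GPA.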
  43. Find apple or orange problem: Traditional approach to solve the problem

  44. Find apple or orange problem: Training Data
    Weight (feature)   Texture (feature)   Label
    150g               Bumpy               Orange
    170g               Bumpy               Orange
    140g               Smooth              Apple
    130g               Smooth              Apple
  45. Find apple or orange problem: Training Data (each row is an example)
    Weight (feature)   Texture (feature)   Label
    150g               Bumpy               Orange
    170g               Bumpy               Orange
    140g               Smooth              Apple
    130g               Smooth              Apple
  46. Find apple or orange problem: Decision Tree
    (diagram: question nodes “Weight = 150 g?” and “Texture = bumpy?”, each with Yes/No branches, ..., leading to the leaves Orange and Apple)
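A decision tree of this kind is just a chain of questions, which can be written by hand as nested conditions. In the sketch below, `classify_fruit` is a hypothetical helper, and reading the weight question as "at least 150 g" is an assumption, since the slide only shows "Weight = 150 G". The rows are the training data from the slide.

```python
def classify_fruit(weight_g, texture):
    """Hand-written decision tree with two questions."""
    if weight_g >= 150:  # assumed reading of the weight question
        return "Orange"
    if texture == "bumpy":  # second question
        return "Orange"
    return "Apple"

# Training data from the slide: (weight in grams, texture, label)
training_data = [
    (150, "bumpy", "Orange"),
    (170, "bumpy", "Orange"),
    (140, "smooth", "Apple"),
    (130, "smooth", "Apple"),
]

for weight_g, texture, label in training_data:
    guess = classify_fruit(weight_g, texture)
    print(f"{weight_g}g {texture}: predicted {guess}, labelled {label}")
```

In machine learning the questions and thresholds are not written by hand like this; a decision tree learner picks them from the training examples.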
  47. An Algorithm to Identify Numbers

  51. Google I/O 2019: Coral. Source: https://www.youtube.com/watch?v=Jgm25QdF90A

  52. Problems in Machine Learning Data

  53. But, real problems are...
    • Insufficient quantity of training data
    • Non-representative training data
    • Poor quality data
    • Irrelevant features
    • Overfitting
    • Underfitting
  54. Q. Challenges in ML
    A. Insufficient Data B. Poor quality of data C. Overfitting D. Underfitting
  56. Ending note

  57. How can I start?
    • Look at the dataset
    • Write down the columns and their correlations
    • Form questions from the dataset
    • Exploratory analysis with visualization
    • Frame the problem
    • Create a solution by building a model
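The "columns and their correlations" step can be sketched with the Pearson correlation coefficient, computed here in plain Python. `pearson` is a hypothetical helper and the two columns are made-up values; in practice a library such as pandas would usually do this for every pair of columns at once.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up columns, for illustration only
age = [22, 25, 31, 38, 45, 52]
salary = [21_000, 26_000, 33_000, 41_000, 52_000, 58_000]

print(f"age vs salary: {pearson(age, salary):.3f}")
```

A value near +1 or -1 means the two columns move together almost linearly, and a value near 0 means no linear relationship. Strongly correlated columns are good candidates for the questions in the next step.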
  58. Start learning on your own
    • https://developers.google.com/machine-learning/crash-course/ml-intro
    • https://www.kaggle.com/learn/overview
    • https://www.tensorflow.org/tutorials/
    • https://pandas.pydata.org/pandas-docs/stable/10min.html
  59. That’s it

  60. Thank You @krunal3kapadiya https://krunal3kapadiya.app