Machine learning end to end

"Machine learning end to end", a presentation at GDG Baroda.
Agenda: introduction, central tendency, precision and recall, validation.

Krunal Kapadiya

October 14, 2018

Transcript

  1. Krunal Kapadiya @krunal3kapadiya #IndiaMLCC Engineer @

  2. None
  3. Agenda
    - Before we start
    - What is Machine Learning
    - Types of Machine Learning
    - Main challenges
    - Scikit-learn design
    - Feature scaling
    - Validating the model
    - Let’s go for it
  4. Before we start

  5. Python libraries
    - NumPy (Numerical Python)
    - List: ['Hello', 'GDG', 'Baroda']
    - Dictionary: {'name': 'Krunal'}
    - Tuple: ('October', 'November')
    - pandas (the name comes from "panel data")
    - Series: pd.Series(np.random.randn(5))
    - DataFrame: pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
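The containers and constructors named on this slide can be tried out in a few lines. This is a minimal sketch, assuming NumPy and pandas are installed; the variable names are illustrative and not taken from the talk.

```python
import numpy as np
import pandas as pd

# Built-in Python containers, using the values from the slide.
words = ["Hello", "GDG", "Baroda"]      # list: ordered and mutable
speaker = {"name": "Krunal"}            # dictionary: key -> value lookup
months = ("October", "November")        # tuple: ordered and immutable

words.append("2018")                    # lists can grow in place

# A NumPy array applies arithmetic element-wise, unlike a plain list.
array = np.array([1, 2, 3, 4])
doubled = array * 2

# pandas Series: a labelled one-dimensional array (five random values here).
series = pd.Series(np.random.randn(5))

# pandas DataFrame: a table whose columns are Series.
frame = pd.DataFrame(data={"col1": [1, 2], "col2": [3, 4]})

print(doubled)
print(series)
print(frame)
```

The Series values change on every run because they are drawn at random.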
  6. Measures of central tendency
    - Mean: the average value
    - Median: the middle value
    - Mode: the most frequently repeated value
    - Standard deviation: how spread out the data is (strictly a measure of dispersion, not of central tendency)
    https://docs.google.com/spreadsheets/d/1KuG47YaIv8xQBEasGeNpNu24dyUT06ADj_0yvK4S67g/edit#gid=0
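As a small worked example of these four measures, the sketch below uses Python's standard `statistics` module. The `scores` list is made up for illustration and is not the data in the linked spreadsheet.

```python
from statistics import mean, median, mode, pstdev

# Hypothetical sample, chosen so the results are easy to check by hand.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

average = mean(scores)      # sum of values / number of values
middle = median(scores)     # middle of the sorted values (mean of the two middle ones here)
most_common = mode(scores)  # most frequently repeated value
spread = pstdev(scores)     # population standard deviation

print(average, middle, most_common, spread)
```

`pstdev` treats the list as the whole population; `statistics.stdev` would give the sample standard deviation instead, which is slightly larger.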
  7. None
  8. What is Machine Learning?

  9. What is Machine Learning?
    "Machine learning is the sub-field of computer science that gives computers the ability to learn without being explicitly programmed." - Arthur Samuel, 1959 (Wikipedia)
    "Machine learning is about the construction and study of systems that can learn from data." - Freebase
  10. None
  11. Why Machine Learning: data used
    - 2005: 130 exabytes
    - 2010: 1,200 exabytes
    - 2015: 7,900 exabytes
    - 2020: 40,900 exabytes
    1 EB = 1,000 PB = 1 million TB = 1 billion GB
    1 billion GB = 100 crore GB = 10 lakh TB
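The unit arithmetic on this slide can be checked with a short script. This is only a sketch, assuming decimal (SI) units where each step is a factor of 1,000; the function names are illustrative.

```python
# Decimal (SI) storage units: each step up is a factor of 1,000.
PB_PER_EB = 1_000
TB_PER_PB = 1_000
GB_PER_TB = 1_000

LAKH = 100_000          # 10^5
CRORE = 10_000_000      # 10^7


def exabytes_to_terabytes(exabytes):
    """Convert exabytes to terabytes."""
    return exabytes * PB_PER_EB * TB_PER_PB


def exabytes_to_gigabytes(exabytes):
    """Convert exabytes to gigabytes."""
    return exabytes_to_terabytes(exabytes) * GB_PER_TB


# Yearly figures as listed on the slide, in exabytes.
data_used = {2005: 130, 2010: 1200, 2015: 7900, 2020: 40900}

one_eb_in_tb = exabytes_to_terabytes(1)
one_eb_in_gb = exabytes_to_gigabytes(1)

print(one_eb_in_tb // LAKH, "lakh TB in one exabyte")
print(one_eb_in_gb // CRORE, "crore GB in one exabyte")
print("growth 2005 to 2020:", data_used[2020] / data_used[2005])
```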
  12. Main challenges
    1. Insufficient quantity of training data
    2. Non-representative training data
    3. Poor-quality data
    4. Irrelevant features
    5. Overfitting
    6. Underfitting
  13. None
  14. Types of Machine Learning

  15. Based on human supervision
    1. Supervised: labels are known
    2. Unsupervised: labels are not known
    3. Semi-supervised: some labels are available, some are not
    4. Reinforcement learning: the system keeps learning on its own from the feedback its actions receive
  16. Based on whether the system can learn on the fly
    1. Online learning: learns incrementally as new data arrives
    2. Batch learning: learns offline from the whole dataset at once
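The difference between the two modes can be sketched with scikit-learn's `SGDClassifier`, which supports both a one-shot `fit` and an incremental `partial_fit`. The synthetic data, the chunk size of 20, and the variable names are assumptions made for illustration; the talk does not name a specific model.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic two-feature dataset: the label is 1 when the features sum to more than 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Batch learning: the model sees the whole dataset in a single call.
batch_model = SGDClassifier(random_state=0)
batch_model.fit(X, y)

# Online learning: the model is updated chunk by chunk, as if data arrived over time.
online_model = SGDClassifier(random_state=0)
all_classes = np.array([0, 1])  # partial_fit needs the full label set up front
for start in range(0, len(X), 20):
    X_chunk = X[start:start + 20]
    y_chunk = y[start:start + 20]
    online_model.partial_fit(X_chunk, y_chunk, classes=all_classes)

print("batch accuracy: ", batch_model.score(X, y))
print("online accuracy:", online_model.score(X, y))
```

Both scores are computed on the training data, so they only show that each model has learned something, not how well it generalizes.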
  17. Based on data patterns
    1. Instance-based learning: the machine learns its samples and compares new cases against them
    2. Model-based learning: a model is built from the samples and used to make predictions
  18. Scikit-learn design
    1. Consistency
      a. Estimators: fit()
      b. Transformers: fit_transform()
      c. Predictors: predict()
    2. Inspection: hyperparameters are readable as attributes (e.g. imputer.strategy); learned parameters carry the suffix '_'
    3. Nonproliferation of classes: datasets are NumPy arrays or SciPy sparse matrices
    4. Composition: pipelines are assembled from estimators
    5. Sensible defaults: most parameters come with default values
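These design principles can be seen together in one short script. This is a minimal sketch, assuming a recent scikit-learn where the imputer is `sklearn.impute.SimpleImputer`; the tiny dataset and the step names in the pipeline are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features, one missing value, binary labels.
X = np.array([[1.0, 20.0], [2.0, np.nan], [3.0, 40.0],
              [4.0, 50.0], [5.0, 60.0], [6.0, 70.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Consistency: a transformer is fitted and applied with fit_transform().
imputer = SimpleImputer(strategy="median")
X_filled = imputer.fit_transform(X)

# Inspection: the hyperparameter is a plain attribute,
# and the learned per-column medians end with an underscore.
print(imputer.strategy)
print(imputer.statistics_)

# Composition: a pipeline chains transformers and ends with a predictor.
# Sensible defaults: StandardScaler() and LogisticRegression() need no arguments.
model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("classify", LogisticRegression()),
])
model.fit(X, y)                 # estimator interface
predictions = model.predict(X)  # predictor interface
print(predictions)
```

Inputs and outputs stay plain NumPy arrays throughout, which is the "nonproliferation of classes" point.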
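  19. Feature scaling
    - Min-max scaling (a.k.a. normalization): bounds values to a fixed range
    - Standardization (StandardScaler): values are not bounded, but the result is much less affected by outliers
    To know more about scaling functions, see the scikit-learn scaling comparison listed under References and links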
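The two scaling methods reduce to simple formulas, sketched here in plain Python. The helper names and the `ages` sample are illustrative assumptions; scikit-learn's `MinMaxScaler` and `StandardScaler` apply the same formulas column by column.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values to the range [0, 1]: (x - min) / (max - min)."""
    lowest, highest = min(values), max(values)
    # Assumes the values are not all identical (otherwise this divides by zero).
    return [(v - lowest) / (highest - lowest) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit variance: (x - mean) / std."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


# Hypothetical feature with one outlier (100).
ages = [20, 30, 40, 50, 100]

scaled = min_max_scale(ages)
standardized = standardize(ages)

print(scaled)
print(standardized)
```

With min-max scaling the outlier becomes 1.0 and squeezes every other value into the lower half of the range, which is why standardization is often preferred when outliers are present.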
  20. Validating model
    Confusion matrix
    - If false negatives are acceptable, aim for high precision, e.g. a spam filter
    - If false positives are acceptable, aim for high recall, e.g. medical diagnosis
    Metrics: precision, recall, F1 score, accuracy
    Accuracy = correctly classified points / total points
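To make the four metrics concrete, the sketch below derives them from confusion-matrix counts in plain Python. The function name and the example labels are made up for illustration; `sklearn.metrics` offers equivalent functions such as `precision_score` and `recall_score`.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and the metrics derived from them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Precision: of everything predicted positive, how much really was positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything really positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Accuracy: correctly classified points / total points.
    accuracy = (tp + tn) / len(pairs)

    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical labels: 1 = spam, 0 = not spam.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here one spam message is missed (a false negative) and one legitimate message is flagged (a false positive), so precision and recall happen to be equal.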
  21. None
  22. GDP Per capita

  23. TMDB Dataset Analysis

  24. Titanic Dataset Analysis

  25. Let’s Go For It
    1. Look at the dataset
    2. Write down the columns and their correlations
    3. Make questions derived from the dataset
    4. Exploratory analysis with visualization
    5. Frame the problem
    6. Create a solution by building a model
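The first two steps usually come down to a handful of pandas calls. This is a minimal sketch, assuming pandas is installed; the small table below is invented for illustration and is not the Titanic or TMDB data used in the talk.

```python
import pandas as pd

# Hypothetical passenger-style table, made up for this example.
df = pd.DataFrame({
    "age": [22, 38, 26, 35, 54, 2],
    "fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08],
    "survived": [0, 1, 1, 1, 0, 0],
})

# Step 1: look at the dataset.
print(df.head())        # first rows
print(df.dtypes)        # column types
print(df.describe())    # count, mean, std, min, quartiles, max

# Step 2: write down the columns and their correlations.
columns = list(df.columns)
correlations = df.corr()  # pairwise Pearson correlation between numeric columns
print(correlations["survived"].sort_values(ascending=False))
```

Sorting the correlations against the target column gives a quick first list of features worth questioning and plotting in the later steps.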
  26. None
  27. References and links
    - https://developers.google.com/machine-learning/crash-course/ml-intro
    - https://www.kaggle.com/learn/overview
    - https://www.tensorflow.org/tutorials/
    - http://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html
    - https://pandas.pydata.org/pandas-docs/stable/10min.html
    - https://www.superdatascience.com/machine-learning/
  28. Thank You https://krunal3kapadiya.app/ #IndiaMLCC @krunal3kapadiya if accuracy_score>0.75: