
Machine learning end to end


Presentation on end-to-end machine learning at GDG Baroda.
Agenda: introduction, central tendency, precision and recall, validation.

Krunal Kapadiya

October 14, 2018


Transcript

  1. Agenda
    - Before we start
    - What is Machine Learning
    - Types of Machine Learning
    - Main challenges
    - Scikit-learn design
    - Feature scaling
    - Validating model
    - Let’s go for it
  2. Python libraries
    - NumPy (Numerical Python)
    - Python built-in containers
      - List: ['Hello', 'GDG', 'Baroda']
      - Dictionary: {'name': 'Krunal'}
      - Tuple: ('October', 'November')
    - pandas (the name comes from "panel data")
      - Series: pd.Series(np.random.randn(5))
      - DataFrame: pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
  3. Measures of central tendency
    - Mean: the average of the values
    - Median: the middle value of the sorted data
    - Mode: the most frequently occurring value
    - Standard deviation: strictly a measure of dispersion rather than central tendency; it describes how widely the data is spread around the mean
    https://docs.google.com/spreadsheets/d/1KuG47YaIv8xQBEasGeNpNu24dyUT06ADj_0yvK4S67g/edit#gid=0
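
The measures on this slide can be computed with Python's standard `statistics` module. The sketch below is illustrative only: the sample values are hypothetical, and the population standard deviation (`pstdev`) is used on the assumption that the list is the whole dataset rather than a sample.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
values = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(values)      # sum of the values / number of values
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequently occurring value
spread = statistics.pstdev(values)  # population standard deviation

print(mean, median, mode, round(spread, 3))
```

If the data were a sample drawn from a larger population, `statistics.stdev` (which divides by n - 1) would be the more usual choice.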
  4. What is Machine Learning?
    - "Machine learning is the sub-field of computer science that gives computers the ability to learn without being explicitly programmed." (Arthur Samuel, 1959, via Wikipedia)
    - "Machine learning is about the construction and study of systems that can learn from data." (Freebase)
  5. Why Machine Learning? Data used
    - 2005: 130 exabytes
    - 2010: 1,200 exabytes
    - 2015: 7,900 exabytes
    - 2020: 40,900 exabytes
    1 EB = 1,000 PB = 1 million TB = 1 billion GB
    1 billion GB = 100 crore GB = 10 lakh TB
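
The unit arithmetic on this slide is easy to check in a few lines. This is a minimal sketch assuming decimal (SI) units, as the slide does; the constant and function names are made up for illustration, and the yearly figures are copied from the slide rather than independently verified.

```python
# Decimal (SI) storage units, as used on the slide.
PB_PER_EB = 1_000
TB_PER_EB = 1_000_000
GB_PER_EB = 1_000_000_000
CRORE = 10_000_000  # 1 crore = 10 million


def exabytes_to_gigabytes(exabytes):
    """Convert exabytes to gigabytes using decimal units."""
    return exabytes * GB_PER_EB


# Figures quoted on the slide, in exabytes.
data_used_eb = {2005: 130, 2010: 1200, 2015: 7900, 2020: 40900}

# How many times larger the 2020 figure is than the 2005 figure.
growth_2005_to_2020 = data_used_eb[2020] / data_used_eb[2005]

# One exabyte expressed in crore gigabytes.
gb_per_eb_in_crore = exabytes_to_gigabytes(1) / CRORE

print(round(growth_2005_to_2020, 1), gb_per_eb_in_crore)
```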
  6. Main challenges
    1. Insufficient quantity of training data
    2. Non-representative training data
    3. Poor-quality data
    4. Irrelevant features
    5. Overfitting
    6. Underfitting
  7. Based on human supervision
    1. Supervised: labels are known
    2. Unsupervised: labels are not known
    3. Semi-supervised: some labels are available, some are not
    4. Reinforcement learning: the system keeps learning on its own from rewards and penalties
  8. Based on learning (can the system learn on the fly?)
    1. Online learning: learns incrementally as new data arrives
    2. Batch learning: offline learning, trained on all available data at once
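
The difference between the two modes can be sketched with a deliberately tiny "model": the mean of the data seen so far. This is only an analogy under that assumption, not a real learning algorithm, and the names `batch_mean` and `OnlineMean` are hypothetical. The batch version needs the full dataset up front, while the online version updates its estimate one sample at a time.

```python
def batch_mean(samples):
    """Batch style: needs every sample available at once."""
    return sum(samples) / len(samples)


class OnlineMean:
    """Online style: updates the estimate incrementally, one sample at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        # Move the current estimate toward the new sample by 1/count of the gap.
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean


stream = [3.0, 5.0, 10.0]  # hypothetical data arriving over time

online = OnlineMean()
for sample in stream:
    online.update(sample)

print(online.mean, batch_mean(stream))
```

Both approaches end at the same value here, but the online version never had to store the earlier samples.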
  9. Based on data patterns
    1. Instance-based learning: the machine memorizes the samples and compares new data to them
    2. Model-based learning: a model is built from the samples and then used to make predictions
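
As a rough illustration of the two approaches, the sketch below predicts a value for an unseen input in both ways. The training data is hypothetical, a 1-nearest-neighbour lookup stands in for instance-based learning, and an ordinary least-squares line stands in for model-based learning; neither choice comes from the slides.

```python
# Hypothetical training data lying on the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def predict_instance_based(x_new):
    """1-nearest neighbour: return the label of the closest stored sample."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x_new))
    return ys[nearest]


def fit_line(x_values, y_values):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    variance = sum((x - mean_x) ** 2 for x in x_values)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(xs, ys)


def predict_model_based(x_new):
    """Use the fitted line instead of the stored samples."""
    return slope * x_new + intercept


print(predict_instance_based(10.0), predict_model_based(10.0))
```

For an input far outside the training range, the nearest-neighbour lookup can only repeat a label it has already seen, while the fitted line extrapolates.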
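  10. Scikit-learn design
    1. Consistency
      a. Estimators: fit()
      b. Transformers: transform() and fit_transform()
      c. Predictors: predict()
    2. Inspection: hyperparameters are public attributes (e.g. imputer.strategy); learned parameters carry a trailing underscore (e.g. imputer.statistics_)
    3. Nonproliferation of classes: data is passed as NumPy arrays or SciPy sparse matrices
    4. Composition: building blocks are reused, e.g. a pipeline of transformers followed by a final estimator
    5. Sensible defaults: most parameters come with default values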
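
The conventions on this slide can be imitated in plain Python. The class below is a hypothetical stand-in written for illustration; it is not part of scikit-learn and works on a flat list rather than an array. It only shows the shape of the interface: a hyperparameter stored as a public attribute, a learned value with a trailing underscore, and `fit()`, `transform()` and `fit_transform()` methods.

```python
class MeanImputer:
    """Hypothetical transformer that mimics scikit-learn's naming conventions."""

    def __init__(self, strategy="mean"):
        # Hyperparameter: set by the user and kept as a public attribute.
        self.strategy = strategy

    def fit(self, values):
        if self.strategy != "mean":
            raise ValueError("this sketch only supports strategy='mean'")
        known = [v for v in values if v is not None]
        # Learned parameter: the trailing underscore marks it as estimated from data.
        self.statistics_ = sum(known) / len(known)
        return self  # returning self allows fit(...).transform(...) chaining

    def transform(self, values):
        # Replace missing entries with the value learned during fit().
        return [self.statistics_ if v is None else v for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)


imputer = MeanImputer()
filled = imputer.fit_transform([1.0, None, 3.0])
print(imputer.strategy, imputer.statistics_, filled)
```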
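  11. Feature scaling
    - Min-max scaling (a.k.a. normalization): bounds values to a fixed range, usually 0 to 1
    - Standard scaler (standardization): values are not bounded, but the result is much less affected by outliers
    To learn more about scaling functions, see the scikit-learn scaler comparison listed under References and links.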
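
Both scalings are short formulas, sketched here in plain Python with hypothetical data that includes one outlier. The function names are made up for this example. The standardization uses the population standard deviation, and both functions assume the values are not all identical, since that would cause a division by zero.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    return [(v - lowest) / (highest - lowest) for v in values]


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


data = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical feature with one outlier

normalized = min_max_scale(data)
standardized = standardize(data)

print([round(v, 3) for v in normalized])
print([round(v, 3) for v in standardized])
```

With min-max scaling the outlier squeezes the first four values into a narrow band near 0, which is the effect the slide contrasts with standardization.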
  12. Validating model
    - Confusion matrix
      - If false negatives are acceptable, aim for high precision, e.g. a spam filter
      - If false positives are acceptable, aim for high recall, e.g. medical diagnosis
    - Precision
    - Recall
    - F1 score
    - Accuracy = correctly classified points / total points
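
The four metrics on this slide follow directly from the cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts are hypothetical, and it assumes at least one predicted positive and one actual positive so that no denominator is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)   # of the predicted positives, how many were right
    recall = tp / (tp + fn)      # of the actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)          # correct / total points
    return precision, recall, f1, accuracy


# Hypothetical counts: true positives, false positives, false negatives, true negatives.
precision, recall, f1, accuracy = classification_metrics(tp=40, fp=10, fn=20, tn=30)

print(round(precision, 3), round(recall, 3), round(f1, 3), round(accuracy, 3))
```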
  13. Let’s Go For It
    1. Look at the dataset
    2. Write down the columns and their correlations
    3. Derive questions from the dataset
    4. Exploratory analysis with visualization
    5. Frame the problem
    6. Create a solution by building a model
  14. References and links
    - https://developers.google.com/machine-learning/crash-course/ml-intro
    - https://www.kaggle.com/learn/overview
    - https://www.tensorflow.org/tutorials/
    - http://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html
    - https://pandas.pydata.org/pandas-docs/stable/10min.html
    - https://www.superdatascience.com/machine-learning/