Machine learning end to end

Krunal Kapadiya @krunal3kapadiya #IndiaMLCC Engineer @

Agenda
- Before we start
- What is Machine Learning
- Types of Machine Learning
- Main challenges
- Scikit learn design
- Feature scaling
- Validating model
- Let’s go for it

Before we start

Python libraries
- Python built-in types
  - List: ['Hello', 'GDG', 'Baroda']
  - Dictionary: {'name': 'Krunal'}
  - Tuple: ('October', 'November')
- NumPy (Numerical Python)
- Pandas (the name comes from "panel data")
  - Series: pd.Series(np.random.randn(5))
  - DataFrame: pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})

Measures of central tendency
- Mean - The average of the values
- Median - The middle value once the data is sorted
- Mode - The most frequently repeated value
- Standard deviation - A measure of spread rather than of central tendency: how far the values lie from the mean
https://docs.google.com/spreadsheets/d/1KuG47YaIv8xQBEasGeNpNu24dyUT06ADj_0yvK4S67g/edit#gid=0
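The four measures above can be checked by hand on a small sample. The sketch below uses Python's built-in statistics module; the sample values are made up for illustration and are not taken from the linked spreadsheet.

```python
import statistics

# Hypothetical sample, chosen so every measure is easy to verify by hand.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)      # sum of 40 divided by 8 values
median = statistics.median(scores)  # average of the two middle values, 4 and 5
mode = statistics.mode(scores)      # 4 appears three times
spread = statistics.pstdev(scores)  # population standard deviation

print(mean, median, mode, spread)
```

The population form (pstdev) is used here; statistics.stdev gives the sample form, which divides by n - 1 instead of n.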

What is Machine Learning?

"Machine learning is the sub-field of computer science that gives computers the ability to learn without being explicitly programmed." - Arthur Samuel, 1959 (via Wikipedia)
"Machine learning is about the construction and study of systems that can learn from data." - Freebase

Why Machine Learning? Data used
- 2005 - 130 Exabytes
- 2010 - 1200 Exabytes
- 2015 - 7900 Exabytes
- 2020 - 40900 Exabytes
1 EB = 1000 PB = 1 million TB = 1 billion GB
1 billion GB = 100 crore GB
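The unit arithmetic can be verified in a few lines. This sketch assumes decimal (SI) units, where each step is a factor of 1000, and takes one crore as 10^7; the variable names are illustrative.

```python
# Decimal (SI) storage units: each step up is a factor of 1000.
GB_PER_TB = 1000
TB_PER_PB = 1000
PB_PER_EB = 1000
CRORE = 10**7

tb_per_eb = PB_PER_EB * TB_PER_PB        # terabytes in one exabyte
gb_per_eb = tb_per_eb * GB_PER_TB        # gigabytes in one exabyte
gb_per_eb_in_crore = gb_per_eb // CRORE  # the same figure expressed in crore

# Figures from the slide, in exabytes per year.
data_eb = {2005: 130, 2010: 1200, 2015: 7900, 2020: 40900}
growth = data_eb[2020] / data_eb[2005]   # growth factor from 2005 to 2020

print(tb_per_eb, gb_per_eb, gb_per_eb_in_crore, round(growth, 1))
```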

Main challenges
1. Insufficient quantity of training data
2. Non-representative training data
3. Poor quality data
4. Irrelevant features
5. Overfitting
6. Underfitting

Types of Machine Learning

Based on human supervision
1. Supervised - Labels are known
2. Unsupervised - Labels are not known
3. Semi-supervised - Only some of the labels are available
4. Reinforcement learning - An agent keeps learning from the rewards and penalties its actions earn

Based on learning (whether the system can learn on the fly)
1. Online learning - Learns incrementally as new data arrives
2. Batch learning - Offline learning: trains on all the available data at once
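To make the contrast concrete, here is a minimal sketch that uses the mean as a stand-in for a "model". The batch version needs the whole dataset up front, while the online version updates its estimate one observation at a time. The names batch_mean and OnlineMean are hypothetical, invented for this illustration.

```python
def batch_mean(values):
    """Batch style: needs the full dataset before it can compute anything."""
    return sum(values) / len(values)


class OnlineMean:
    """Online style: refines the estimate with each new observation."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental update: shift the old mean toward the new value.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


stream = [3.0, 5.0, 10.0, 2.0]  # hypothetical data arriving one item at a time

learner = OnlineMean()
for value in stream:
    learner.update(value)

print(learner.mean, batch_mean(stream))
```

Both approaches arrive at the same estimate, but the online learner never has to hold the full stream in memory.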

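Based on data patterns
1. Instance-based learning - The machine memorizes the training samples and predicts by comparing new data with them
2. Model-based learning - A model is built from the training samples and then used to predict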
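A small sketch of the difference, assuming a one-dimensional toy dataset. The instance-based predictor is a 1-nearest-neighbour lookup and the model-based predictor is a least-squares line; both techniques and all names are chosen here for illustration only.

```python
# Hypothetical training data where y is exactly 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def predict_instance_based(x_new):
    """Return the target of the closest stored sample (1-nearest neighbour)."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x_new))
    return ys[nearest]


def fit_line(x_values, y_values):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    variance = sum((x - mean_x) ** 2 for x in x_values)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(xs, ys)


def predict_model_based(x_new):
    """Predict from the fitted line; the training samples are no longer needed."""
    return slope * x_new + intercept


print(predict_instance_based(10.0), predict_model_based(10.0))
```

For an input far outside the training range, the nearest-neighbour lookup can only return a value it has already seen, while the fitted line extrapolates.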

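Scikit learn design
1. Consistency
   a. Estimators - fit()
   b. Transformers - transform(), or fit_transform() to fit and transform in one step
   c. Predictors - predict()
2. Inspection - Hyperparameters are public attributes (e.g. imputer.strategy); learned parameters carry the suffix '_' (e.g. imputer.statistics_)
3. Nonproliferation of classes - Datasets are NumPy arrays or SciPy sparse matrices
4. Composition - Pipelines are built from existing estimators
5. Sensible defaults - Most parameters come with default values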
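The conventions above can be illustrated without installing anything. ToyMeanImputer below is a hypothetical stand-in written for this sketch, not a scikit-learn class; it only mimics the fit/transform/fit_transform shape and the trailing-underscore naming on a plain Python list.

```python
class ToyMeanImputer:
    """Hypothetical stand-in (not part of scikit-learn): fills None with the mean."""

    def __init__(self, strategy="mean"):
        # Hyperparameter: set by the user, exposed as a plain public attribute.
        self.strategy = strategy

    def fit(self, values):
        if self.strategy != "mean":
            raise ValueError("this toy example only supports strategy='mean'")
        known = [v for v in values if v is not None]
        # Learned parameter: only exists after fit(), hence the trailing underscore.
        self.statistics_ = sum(known) / len(known)
        return self

    def transform(self, values):
        return [self.statistics_ if v is None else v for v in values]

    def fit_transform(self, values):
        # Convenience method: fit and transform in one call.
        return self.fit(values).transform(values)


imputer = ToyMeanImputer()  # sensible default: strategy="mean"
filled = imputer.fit_transform([1.0, None, 3.0])

print(imputer.strategy, imputer.statistics_, filled)
```

Returning self from fit() is what allows the chained call inside fit_transform().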

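Feature scaling
- Min-max scaling (a.k.a. normalization) - Bounds values to a fixed range, usually 0 to 1
- Standardization (StandardScaler) - Rescales values to zero mean and unit variance; much less affected by outliers than min-max scaling
To know more about the scaling functions, see the scikit-learn scaling comparison listed under References and links.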
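Both scalers reduce to one-line formulas. The sketch below implements them by hand on a made-up list that contains one outlier; it is an illustration of the formulas, not scikit-learn's own code.

```python
import statistics

values = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical data; 100.0 is an outlier


def min_max_scale(data):
    """Normalization: map the smallest value to 0 and the largest to 1."""
    low, high = min(data), max(data)
    return [(x - low) / (high - low) for x in data]


def standardize(data):
    """Standardization: subtract the mean, divide by the population std."""
    mean = statistics.mean(data)
    std = statistics.pstdev(data)
    return [(x - mean) / std for x in data]


normalized = min_max_scale(values)
standardized = standardize(values)

print(normalized)
print(standardized)
```

With min-max scaling the outlier takes the value 1 and squeezes the other four points into the bottom few percent of the range. Standardized values are not confined to a fixed interval.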

Validating model
Confusion matrix
- If false negatives are OK, the model needs high precision, e.g. a spam filter
- If false positives are OK, the model needs high recall, e.g. medical diagnosis
Metrics
- Precision
- Recall
- F-1 Score
- Accuracy = correctly classified points / total points
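All four metrics follow from the four cells of a binary confusion matrix. The sketch below counts those cells from made-up labels (1 = positive, 0 = negative) and applies the standard definitions; the data is hypothetical.

```python
# Hypothetical ground truth and predictions for eight points.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = (tp + tn) / len(pairs)  # correctly classified points / total points

print(tp, fp, fn, tn, precision, recall, f1, accuracy)
```

Precision falls as false positives grow and recall falls as false negatives grow, which is why the spam filter and the medical diagnosis examples favor different metrics.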

GDP Per capita

TMDB Dataset Analysis

Titanic Dataset Analysis

Let’s Go For It
1. Look at the dataset
2. Write down the columns and their correlations
3. Make questions derived from the dataset
4. Exploratory analysis with visualization
5. Frame the problem
6. Create a solution by building a model

References and links
- https://developers.google.com/machine-learning/crash-course/ml-intro
- https://www.kaggle.com/learn/overview
- https://www.tensorflow.org/tutorials/
- http://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html
- https://pandas.pydata.org/pandas-docs/stable/10min.html
- https://www.superdatascience.com/machine-learning/

Thank You https://krunal3kapadiya.app/ #IndiaMLCC @krunal3kapadiya if accuracy_score>0.75:
