Machine Learning 101

Machine Learning 101: Where to begin?
Lutske de Leeuw
Groningen, 04-04-2024

So many ideas

The original problem

Table of contents
• About me
• Introduction to Machine Learning
• Types of Machine Learning
• Understand the problem
• Data collection and preprocessing
• Model selection and training
• Model evaluation
• Real life examples
• Conclusion

https://www.linkedin.com/in/lutske/

Introduction to Machine Learning

Machine Learning
• ‘Teach’ the computer
• Without explicitly programming the rules

Artificial Intelligence
• Imitate the human brain
• Includes machine learning

Deep Learning
• Neural networks
• Subset of machine learning

Generative AI
• Generate content
• Subset of deep learning

[Diagram: Artificial Intelligence contains Machine learning, which contains Deep learning, which contains Generative AI]

Data Science
• Knowledge extracted from data
• Advise & predict

[Diagram: Data Science shown together with Artificial Intelligence, Machine learning, Deep learning, and Generative AI]

Types of Machine Learning

Types of learning
• Supervised
• Unsupervised
• Semi-supervised
• Reinforcement

Supervised
• Input: pictures labeled “Cat” or “Not a Cat”
• Model: learns from the labeled examples
• Prediction for a new picture (?): It’s a cat

Unsupervised
• Input: data without labels
• Model: finds groups in the data by itself
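As a minimal sketch of the unsupervised idea, the snippet below clusters unlabeled points with scikit-learn's KMeans. The points are made up for illustration, and KMeans is only one possible choice of algorithm; the slide itself does not name one.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated, made-up groups of 2-D points (no labels given).
rng = np.random.default_rng(42)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(20, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(20, 2))
points = np.vstack([group_a, group_b])

# Ask for two clusters; the model only ever sees the inputs.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(points)

print(cluster_ids)
```

The cluster numbers themselves carry no meaning; what matters is that points from the same group end up with the same number.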

Semi-supervised
• Input: a few labeled examples (Duck, Cat) together with unlabeled ones (?)
• Model
• Prediction: It’s a cat

Reinforcement learning
• Agent and Environment
• The agent has to choose an action (?): Walk or Sit
• Action = Sit
• Reward!
• Update policy: Sit = good!
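The loop on this slide (act, get a reward, update the policy) can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the reward function, the learning rate, and the exploration probability are hypothetical, and real reinforcement learning setups are considerably richer.

```python
import random

random.seed(0)

actions = ["walk", "sit"]
# The agent's current estimate of how good each action is.
value = {"walk": 0.0, "sit": 0.0}
learning_rate = 0.1
explore_probability = 0.2


def environment_reward(action):
    """Hypothetical environment: only sitting is rewarded."""
    return 1.0 if action == "sit" else 0.0


for step in range(200):
    # Mostly pick the best-known action, sometimes explore a random one.
    if random.random() < explore_probability:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: value[a])

    reward = environment_reward(action)
    # Update policy: move the estimate toward the observed reward.
    value[action] += learning_rate * (reward - value[action])

best_action = max(actions, key=lambda a: value[a])
print(value, best_action)
```

After enough steps the agent prefers "sit", because that is the only action that ever produced a reward.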

Types of data
• Numerical
• Categorical
• Text data
Examples: yes / no, ordered series, blood groups, measured values, sentiment analysis, translations, documents
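Most models expect numbers, so categorical values usually need to be encoded first. The sketch below shows one common way to do that with pandas; the records and column names are made up for illustration.

```python
import pandas as pd

# Made-up records mixing numerical and categorical columns.
records = pd.DataFrame({
    "temperature": [36.6, 38.1, 37.2],   # numerical: measured values
    "blood_group": ["A", "O", "AB"],     # categorical: blood groups
    "has_fever": ["No", "Yes", "No"],    # categorical: yes / no
})

# Yes / No maps naturally to 1 / 0.
records["has_fever"] = records["has_fever"].map({"No": 0, "Yes": 1})

# One-hot encode the blood group: one indicator column per category.
encoded = pd.get_dummies(records, columns=["blood_group"])
print(encoded.columns.tolist())
```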

Understanding the problem

Understanding the problem
• Nature of the problem
• Type of data available
• Desired outcome

Example
• Problem: My 2 cats need different food
• Type of data: lots of cat pictures
• Desired outcome: Recognize each cat to give it access to the correct bowl of food

Is machine learning the way?
• Would a big if/else statement do?
• Do I have enough data?
• Is there an expert to check the data?

Code example: Setup
    pip install pandas numpy matplotlib scikit-learn jupyter

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

Data collection and preprocessing

Data collection
• Identify data sources (databases, APIs, web scraping, manual)
• Determine the data requirements (type, size, format)
• Collect the data
• Clean the data
• Validate the data
• Organize the data
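The clean, validate, and organize steps can be sketched with pandas as follows. The data, the column names, and the validation rule are all made up for illustration; real checks depend on the dataset.

```python
import pandas as pd

# Made-up raw data with typical problems: a duplicate row, a missing
# value, and an impossible negative age.
raw = pd.DataFrame({
    "name": ["Tom", "Tom", "Minoes", "Felix"],
    "age_years": [3.0, 3.0, None, -1.0],
})

# Clean: drop exact duplicates and rows with missing values.
cleaned = raw.drop_duplicates().dropna()

# Validate: keep only rows that satisfy a simple sanity rule.
validated = cleaned[cleaned["age_years"] >= 0]

# Organize: reset the index so rows are numbered from 0 again.
organized = validated.reset_index(drop=True)
print(organized)
```

Dropping rows is the bluntest option; depending on the problem, filling in missing values or correcting them by hand may be a better fit.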

Don’t reinvent the wheel
• Kaggle
• UCI
• Google
• OpenML
• Data.gov
• Data.overheid.nl

Code example: Load dataset
    from sklearn.datasets import fetch_california_housing

    # Download the California housing dataset as pandas objects
    california = fetch_california_housing(as_frame=True)
    dataFrame = pd.DataFrame(california.data, columns=california.feature_names)
    # Add the target (median house value) as an extra column
    dataFrame['prices'] = california.target

Code example: See the data
    dataFrame.head()

Output:
       MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  Longitude  prices
    0  8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88    -122.23   4.526
    1  8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86    -122.22   3.585
    2  7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85    -122.24   3.521
    3  5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85    -122.25   3.413
    4  3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85    -122.25   3.422

Preprocessing
• Data cleaning
• Data transformation
• Feature extraction
• Feature engineering
• Data splitting

Code example: Prepare data
    # Features: every column except the target
    housing = dataFrame.drop('prices', axis=1)
    # Target: the value to predict
    pricing = dataFrame['prices']

Data processing

Gather data
• From your resources
• From existing resources
• Data augmentation

Split the data
Dataset:
• Train
• Validation
• Test

Split the data
• Train 80%, Test 10%, Validate 10%
• Train 70%, Test 15%, Validate 15%
• Train 60%, Test 20%, Validate 20%
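The code examples in this deck use a single train/test split. A three-way split like the ones listed on this slide can be sketched with two calls to train_test_split; the data below is made up, and the 60/20/20 ratio is just one of the options above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 samples with 3 features each.
features = np.arange(300).reshape(100, 3)
targets = np.arange(100)

# Step 1: hold out 20% of the data as the final test set.
features_rest, features_test, targets_rest, targets_test = train_test_split(
    features, targets, test_size=0.2, random_state=42
)

# Step 2: split the remaining 80% into train and validation.
# 0.25 of the remaining 80% equals 20% of the full dataset.
features_train, features_val, targets_train, targets_val = train_test_split(
    features_rest, targets_rest, test_size=0.25, random_state=42
)

print(len(features_train), len(features_val), len(features_test))  # 60 20 20
```

The validation set is used while comparing and tuning models; the test set is kept aside until the very end.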

Code example: Split & scale the data
    # Keep 20% of the rows apart for testing
    housing_train, housing_test, pricing_train, pricing_test = train_test_split(
        housing, pricing, test_size=0.2, random_state=42)

    # Fit the scaler on the training data only, then apply it to both sets
    scaler = StandardScaler()
    housing_train_scaled = scaler.fit_transform(housing_train)
    housing_test_scaled = scaler.transform(housing_test)

Algorithms

Linear Regression

Logistic Regression

Decision Tree
Is this person fit?
• Age < 30?
  • Yes: Eats a lot of pizzas?
    • Yes: Unfit
    • No: Fit
  • No: Exercises in the morning?
    • Yes: Fit
    • No: Unfit
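A tree like the one on this slide does not have to be written by hand; it can be learned from examples. The sketch below fits scikit-learn's DecisionTreeClassifier on a handful of made-up people. The data and feature names are hypothetical, and the learned tree may ask its questions in a different order than the slide does.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

feature_names = ["age", "eats_lots_of_pizza", "exercises_in_morning"]

# Made-up people: [age, eats lots of pizza (0/1), exercises in the morning (0/1)]
people = [
    [25, 1, 0],
    [22, 0, 0],
    [28, 0, 1],
    [45, 0, 1],
    [52, 1, 1],
    [60, 0, 0],
    [38, 1, 0],
]
labels = ["unfit", "fit", "fit", "fit", "fit", "unfit", "unfit"]

tree = DecisionTreeClassifier(random_state=42)
tree.fit(people, labels)

# Show the questions the tree learned to ask.
print(export_text(tree, feature_names=feature_names))

# Predictions for the people the tree was trained on.
predictions = tree.predict(people)
print(list(predictions))
```

With so few examples the tree simply memorizes the training data, which is exactly the overfitting risk that model evaluation is meant to catch.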

Code example: Train the model & predict
    model = LinearRegression()
    # Learn from the scaled training data
    model.fit(housing_train_scaled, pricing_train)
    # Predict prices for the unseen test data
    pricing_pred = model.predict(housing_test_scaled)

Model evaluation

Use metrics www.mathworks.com/discovery/overfitting.html

Confusion Matrix
                          Predicted: Positive (Cat)   Predicted: Negative (Dog)
Actual: Positive (Cat)    True Positive               False Negative
Actual: Negative (Dog)    False Positive              True Negative

Accuracy
• True Positive (TP)
• True Negative (TN)
• False Positive (FP)
• False Negative (FN)
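The four counts above combine into the usual classification metrics. The sketch below uses the standard definitions of accuracy, precision, and recall; the counts are made up, and the slide text itself only names accuracy.

```python
# Made-up counts from a cat / not-a-cat classifier.
tp, tn, fp, fn = 40, 45, 5, 10

total = tp + tn + fp + fn
accuracy = (tp + tn) / total   # share of all predictions that are correct
precision = tp / (tp + fp)     # of everything predicted "cat", how much is a cat
recall = tp / (tp + fn)        # of all actual cats, how many were found

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# accuracy=0.85 precision=0.89 recall=0.80
```

Accuracy alone can mislead when one class is rare, which is why precision and recall are often reported next to it.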

Formulas

Mean Squared Error
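Mean squared error is the average of the squared differences between the actual and the predicted values. A minimal sketch with made-up numbers, computed by hand with NumPy:

```python
import numpy as np

# Made-up actual and predicted values.
actual = np.array([3.0, 5.0, 2.0, 7.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0])

errors = actual - predicted   # [0.5, 0.0, -2.0, -1.0]
mse = np.mean(errors ** 2)    # (0.25 + 0.0 + 4.0 + 1.0) / 4
print(mse)  # 1.3125
```

Because the errors are squared, a few large misses weigh much more heavily than many small ones.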

Code example: Mean Squared Error
    mse = mean_squared_error(pricing_test, pricing_pred)
    print(f"Mean Squared Error: {mse:.2f}")

Output:
    Mean Squared Error: 0.56

Improving the model

Improving the model
• Feature engineering
• Hyperparameter tuning
• Model selection
• Handling outliers
• Cross validation
• Collect more data
• Regularization
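Several of these points can be combined in one step. The sketch below tunes the regularization strength of a Ridge regression with 5-fold cross validation using GridSearchCV. It is an illustration under assumptions: the data is synthetic so the snippet runs offline, and Ridge and the list of alpha values are choices made here, not taken from the deck.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data, used so the sketch runs without downloads.
features, targets = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=42
)

# Try several regularization strengths, each scored with
# 5-fold cross validation on mean squared error.
search = GridSearchCV(
    estimator=Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(features, targets)

print(search.best_params_)
print(f"Cross-validated MSE: {-search.best_score_:.2f}")
```

scikit-learn reports the score as a negative MSE so that higher is always better, hence the minus sign when printing.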

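Code example: Handling outliers
    # Example: Winsorization
    from scipy.stats.mstats import winsorize

    # Replace the lowest 5% and highest 5% of prices with the nearest
    # remaining value. Re-run the prepare and split steps afterwards so
    # the model is trained on the adjusted prices.
    dataFrame['prices'] = winsorize(dataFrame['prices'], limits=[0.05, 0.05])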

Code example: Using another model
    # Example: Decision tree
    from sklearn.tree import DecisionTreeRegressor

    model = DecisionTreeRegressor()
    model.fit(housing_train_scaled, pricing_train)
    pricing_pred = model.predict(housing_test_scaled)
    mse = mean_squared_error(pricing_test, pricing_pred)
    print(f"Mean Squared Error: {mse:.2f}")

Output:
    Mean Squared Error: 0.49

Code example: Using another model
    # Example: Random Forest Regression
    from sklearn.ensemble import RandomForestRegressor

    model = RandomForestRegressor()
    model.fit(housing_train_scaled, pricing_train)
    pricing_pred = model.predict(housing_test_scaled)
    mse = mean_squared_error(pricing_test, pricing_pred)
    print(f"Mean Squared Error: {mse:.2f}")

Output:
    Mean Squared Error: 0.25

Do try this at home!
• https://tinyurl.com/485n4byj
• https://github.com/Lutske/ML101_where_to_begin

Real life examples

Conclusion

Key takeaways
• Collect a lot of data
• Don’t invent everything yourself
• Machine learning isn’t always the solution
• Verify the model

Questions?

Feedback

Thank you!
