Machine Learning 101

Presentation given at devCampNoord '24 in Kinepolis Groningen.

devNetNoord

April 04, 2024

Transcript

  1. Table of contents • About me • Introduction to Machine Learning • Types of Machine Learning • Understanding the problem • Data Collection and Preprocessing • Model selection and training • Model evaluation • Real-life examples • Conclusion
  2. Types of data • Numerical • Categorical • Text data

    Examples: Yes / No, ordered series, blood groups, measured values, sentiment analysis, translations, documents
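A small pandas sketch can make the three data types concrete. The column names and values are hypothetical. The sketch shows how pandas stores a numerical, a categorical and a text column, and one common way (one-hot encoding with `pd.get_dummies`) to turn categorical columns into numbers a model can use.

```python
import pandas as pd

# Hypothetical rows: one column for each type of data on the slide
df = pd.DataFrame({
    "measured_value": [72.5, 80.1, 65.3],                 # numerical
    "blood_group": ["A", "O", "AB"],                      # categorical
    "answer": ["yes", "no", "yes"],                       # categorical (yes / no)
    "review": ["Great talk!", "Too long.", "Nice demo"],  # text
})

# Tell pandas which columns hold categories rather than free text
df["blood_group"] = df["blood_group"].astype("category")
df["answer"] = df["answer"].astype("category")
print(df.dtypes)

# Most models need numbers: one-hot encoding gives each category its own 0/1 column
encoded = pd.get_dummies(df[["blood_group", "answer"]], dtype=int)
print(list(encoded.columns))
```

Text columns such as `review` need their own techniques (for example, turning words into counts or embeddings) before a model can use them.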
  3. Understanding the problem • Nature of the problem • Type of data available • Desired outcome
  4. Example • Problem: my two cats need different food • Type of data: lots of cat pictures • Desired outcome: recognize each cat so it gets access to the correct bowl of food
  5. Is machine learning the right approach? • Would a big if/else statement do the job? • Do I have enough data? • Is there an expert to check the data?
  6. Code example: install and import

    ```shell
    pip install pandas numpy matplotlib scikit-learn jupyter
    ```

    ```python
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    ```
  7. Data collection • Identify data sources (databases, APIs, web scraping, manual) • Determine the data requirements (type, size, format) • Collect the data • Clean the data • Validate the data • Organize the data
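The clean, validate and organize steps can be sketched with pandas. This assumes a tiny hypothetical table containing a duplicate row, a missing value and an impossible negative price. The rule `price > 0` is an assumed stand-in for whatever checks a domain expert would define for real data.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: row 2 duplicates row 1, row 3 has a missing value,
# and row 4 has an impossible negative price
raw = pd.DataFrame({
    "rooms": [3, 4, 4, np.nan, 5],
    "price": [2.5, 3.1, 3.1, 2.8, -1.0],
})

# Clean: drop exact duplicate rows and rows with missing values
clean = raw.drop_duplicates().dropna()

# Validate: keep only rows that satisfy a simple domain rule
valid = clean[clean["price"] > 0]

# Organize: renumber the remaining rows 0..n-1
valid = valid.reset_index(drop=True)
print(valid)
```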
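  8. Code example: load the dataset

    ```python
    from sklearn.datasets import fetch_california_housing

    # Download the California housing data as pandas objects
    california = fetch_california_housing(as_frame=True)
    dataFrame = pd.DataFrame(california.data, columns=california.feature_names)
    # Add the target (median house value) as a 'prices' column
    dataFrame['prices'] = california.target
    ```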
  9. Code example: see the data

    ```python
    dataFrame.head()
    ```

    ```text
       MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  Longitude  prices
    0  8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88    -122.23   4.526
    1  8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86    -122.22   3.585
    2  7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85    -122.24   3.521
    3  5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85    -122.25   3.413
    4  3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85    -122.25   3.422
    ```

    `prices` is the median house value per district, in units of $100,000.
  10. Split the data • Train 80%, test 10%, validate 10% • Train 70%, test 15%, validate 15% • Train 60%, test 20%, validate 20%
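The code on the next slide only holds out a test set. A three-way split like the ones listed above can be sketched by calling `train_test_split` twice. The random data and the 70 / 15 / 15 ratio are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 1000 samples with 8 features
X = np.random.default_rng(0).normal(size=(1000, 8))
y = X.sum(axis=1)

# First split off 30% of the rows ...
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30, random_state=42)
# ... then divide that 30% equally into a test set and a validation set
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=0.50, random_state=42)

print(len(X_train), len(X_test), len(X_val))
```

The validation set is used while tuning and comparing models. The test set is kept aside until the very end, so the final score reflects data the model has never influenced.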
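  11. Code example: split & scale the data

    ```python
    # Assumed definitions: features are every column except the target
    housing = dataFrame.drop(columns=['prices'])
    pricing = dataFrame['prices']

    # Hold out 20% of the rows as a test set
    housing_train, housing_test, pricing_train, pricing_test = train_test_split(
        housing, pricing, test_size=0.2, random_state=42)

    # Fit the scaler on the training data only, then apply the same scaling to the test data
    scaler = StandardScaler()
    housing_train_scaled = scaler.fit_transform(housing_train)
    housing_test_scaled = scaler.transform(housing_test)
    ```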
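A hypothetical single-column example shows what `StandardScaler` does. It subtracts the training mean and divides by the training standard deviation. Because the test set reuses those training numbers instead of being refitted, no information about the test rows leaks into the preprocessing.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature with a large spread, e.g. a population count
train = np.array([[100.0], [200.0], [300.0]])
test = np.array([[400.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean 200 and std ~81.65 from train
test_scaled = scaler.transform(test)        # reuses the training mean and std

print(scaler.mean_, scaler.scale_)
print(train_scaled.ravel())  # now has mean 0 and standard deviation 1
print(test_scaled.ravel())
```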
  12. Decision tree: Is this person fit?

    ```text
    Age < 30?
    ├── Yes: Eats a lot of pizzas?
    │   ├── Yes: Unfit
    │   └── No: Fit
    └── No: Exercises in the morning?
        ├── Yes: Fit
        └── No: Unfit
    ```
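Written as code, a decision tree like this one is a set of nested `if`/`else` checks. This sketch assumes the branches as follows: under 30, eating a lot of pizza leads to Unfit; 30 or older, exercising in the morning leads to Fit. The function `is_fit` and its parameters are hypothetical. A learned model such as `DecisionTreeRegressor` (used later in the deck) chooses its own questions and thresholds from the data instead of having them written by hand.

```python
def is_fit(age: int, eats_lots_of_pizza: bool, exercises_in_morning: bool) -> str:
    """Walk the example tree from the root question down to a leaf."""
    if age < 30:
        # "Yes" branch of the root: decided by pizza intake
        return "Unfit" if eats_lots_of_pizza else "Fit"
    # "No" branch of the root: decided by morning exercise
    return "Fit" if exercises_in_morning else "Unfit"


print(is_fit(25, eats_lots_of_pizza=True, exercises_in_morning=False))  # Unfit
print(is_fit(45, eats_lots_of_pizza=True, exercises_in_morning=True))   # Fit
```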
  13. Code example: train the model & predict

    ```python
    model = LinearRegression()
    model.fit(housing_train_scaled, pricing_train)
    pricing_pred = model.predict(housing_test_scaled)
    ```
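`LinearRegression` learns one weight per feature plus an intercept. The synthetic data in this sketch is an assumption, chosen so the answer is known in advance (y = 3x + 2). The fitted model recovers those numbers and then uses them to predict.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 3*x + 2 exactly, with no noise
X = np.arange(10, dtype=float).reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
y = 3 * X[:, 0] + 2

model = LinearRegression()
model.fit(X, y)                           # learns one weight per feature plus an intercept
print(model.coef_, model.intercept_)      # close to [3.] and 2.0
print(model.predict(np.array([[20.0]])))  # close to [62.]
```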
  14. Confusion matrix (positive = cat, negative = dog, i.e. "not a cat")

    ```text
                    Predicted: Cat    Predicted: Dog
                    (positive)        (negative)
    Actual: Cat     True Positive     False Negative
    Actual: Dog     False Positive    True Negative
    ```
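`sklearn.metrics.confusion_matrix` can count the four cells. The six labels below are hypothetical. Passing `labels=["cat", "dog"]` puts the positive class first. The result then has actual classes as rows and predicted classes as columns, and `ravel()` returns TP, FN, FP, TN in that order.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for six pictures: what each one really shows, and what a model predicted
actual    = ["cat", "cat", "cat", "dog", "dog", "dog"]
predicted = ["cat", "cat", "dog", "cat", "dog", "dog"]

# Rows are actual classes, columns are predicted classes; "cat" (positive) comes first
cm = confusion_matrix(actual, predicted, labels=["cat", "dog"])
tp, fn, fp, tn = cm.ravel()

print(cm)
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
```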
  15. Accuracy • True Positive (TP) • True Negative (TN) • False Positive (FP) • False Negative (FN)
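Accuracy combines these four counts into the fraction of correct predictions: (TP + TN) / (TP + TN + FP + FN). This sketch uses hypothetical counts and labels and cross-checks the result against `sklearn.metrics.accuracy_score`. The helper `accuracy` is illustrative only.

```python
from sklearn.metrics import accuracy_score


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts for six cat/dog pictures: 2 TP, 2 TN, 1 FP, 1 FN
print(accuracy(tp=2, tn=2, fp=1, fn=1))  # 4 correct out of 6

# scikit-learn computes the same number directly from the labels
actual    = ["cat", "cat", "cat", "dog", "dog", "dog"]
predicted = ["cat", "cat", "dog", "cat", "dog", "dog"]
print(accuracy_score(actual, predicted))
```

Accuracy alone can mislead when one class is much more common than the other. For example, a model that always answers "dog" on 95 dogs and 5 cats scores 95% without ever recognizing a cat.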
  16. Improving the model • Feature engineering • Hyperparameter tuning • Model selection • Handling outliers • Cross-validation • Collect more data • Regularization
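Three of these ideas (regularization, hyperparameter tuning and cross-validation) can be combined in one step. This sketch uses `Ridge`, which is linear regression with L2 regularization, and `GridSearchCV` to tune its `alpha` hyperparameter with 5-fold cross-validation. The synthetic data from `make_regression` and the grid of alpha values are assumptions, chosen so the example runs offline.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data with 8 features and some noise
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# alpha controls the regularization strength: larger alpha shrinks the weights more.
# Each candidate is scored with 5-fold cross-validation; the best one is refit on all data.
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

print(search.best_params_)
print(-search.best_score_)  # mean cross-validated MSE of the best alpha
```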
  17. Code example: handling outliers

    ```python
    # Example: Winsorization
    from scipy.stats.mstats import winsorize

    dataFrame['prices'] = winsorize(dataFrame['prices'], limits=[0.05, 0.05])
    ```
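Here is a sketch of winsorization on ten hypothetical prices with one extreme value at each end. With `limits=[0.1, 0.1]`, the lowest 10% and the highest 10% (one value each here) are replaced by the nearest remaining value rather than dropped. `np.asarray` turns the masked array that `winsorize` returns into a plain NumPy array.

```python
import numpy as np
from scipy.stats.mstats import winsorize

# Ten hypothetical prices: 0.1 and 50.0 are the outliers
prices = np.array([0.1, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 50.0])

# Replace the lowest and highest 10% with the nearest remaining values
clipped = np.asarray(winsorize(prices, limits=[0.1, 0.1]))
print(clipped)
```

Unlike deleting outliers, winsorizing keeps every row, so the dataset size does not change.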
  18. Code example: using another model

    ```python
    # Example: Decision tree
    from sklearn.tree import DecisionTreeRegressor

    model = DecisionTreeRegressor()
    model.fit(housing_train_scaled, pricing_train)
    pricing_pred = model.predict(housing_test_scaled)

    mse = mean_squared_error(pricing_test, pricing_pred)
    print(f"Mean Squared Error: {mse:.2f}")
    # Mean Squared Error: 0.49
    ```
  19. Code example: using another model

    ```python
    # Example: Random Forest Regression
    from sklearn.ensemble import RandomForestRegressor

    # No random_state is set, so the exact MSE can differ slightly between runs
    model = RandomForestRegressor()
    model.fit(housing_train_scaled, pricing_train)
    pricing_pred = model.predict(housing_test_scaled)

    mse = mean_squared_error(pricing_test, pricing_pred)
    print(f"Mean Squared Error: {mse:.2f}")
    # Mean Squared Error: 0.25
    ```
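Mean squared error is the average of the squared differences between the true and the predicted values. This sketch uses three hypothetical prices and checks the manual formula against `mean_squared_error`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true and predicted prices (in units of $100,000)
pricing_test = np.array([4.5, 3.5, 2.0])
pricing_pred = np.array([4.0, 3.5, 3.0])

# MSE = mean of (truth - prediction)^2
mse_manual = np.mean((pricing_test - pricing_pred) ** 2)
mse_sklearn = mean_squared_error(pricing_test, pricing_pred)
print(mse_manual, mse_sklearn)

# The square root (RMSE) is back in the units of the target
rmse = np.sqrt(mse_sklearn)
print(rmse)
```

Because the California housing target is in units of $100,000, an MSE of 0.25 corresponds to an RMSE of 0.5, i.e. prediction errors on the order of $50,000.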
  20. Key takeaways • Collect a lot of data • Don’t invent everything yourself • Machine learning isn’t always the solution • Verify the model