Slide 1

Slide 1 text

Machine Learning with scikit-learn
Scientific Computing with Python
Austin, Texas • July 11-17, 2016
Sebastian Raschka & Andreas Mueller

Slide 2

Slide 2 text

Links
Contact Info:
Sebastian Raschka
• [email protected]
• http://sebastianraschka.com
• @rasbt
Andreas Mueller
• [email protected]
• http://amueller.github.io
• @amuellerml
Tutorial Material on GitHub:
https://github.com/amueller/scipy-2016-sklearn

Slide 3

Slide 3 text

Tutorial Setup I
a) Fork the repository (if you haven't done so yet):
b) Sync an older fork:
$ git remote add upstream https://github.com/amueller/scipy-2016-sklearn.git
$ git fetch upstream
$ git checkout master
$ git merge upstream/master

Slide 4

Slide 4 text

Tutorial Setup II
$ jupyter notebook check_env.ipynb
$ python fetch_data.py
~456 MB!

Slide 5

Slide 5 text

Our Agenda
• Morning Session: 8:00 AM - 12:00 PM (Room 105)
• Afternoon Session: 1:30 PM - 5:30 PM (Room 105)

Slide 6

Slide 6 text

Morning Session (8:00 AM - 12:00 PM)
01 Introduction to machine learning with sample applications
02 Scientific Computing Tools for Python: NumPy, SciPy, and matplotlib
03 Data formats, preparation, and representation
04 Supervised learning: Training and test data
05 Supervised learning: Estimators for classification
06 Supervised learning: Estimators for regression analysis
07 Unsupervised learning: Unsupervised Transformers
08 Unsupervised learning: Clustering
09 The scikit-learn estimator interface
10 Preparing a real-world dataset (titanic)
11 Working with text data via the bag-of-words model
12 Application: SMS spam classification

Slide 7

Slide 7 text

Afternoon Session (1:30 PM - 5:30 PM)
13 Cross-Validation
14 Model complexity and grid search for adjusting hyperparameters
15 Scikit-learn Pipelines
16 Supervised learning: Performance metrics for classification
17 Supervised learning: Linear Models
18 Supervised learning: Support Vector Machines
19 Supervised learning: Decision trees and random forests, ensemble methods
20 Supervised learning: Feature selection
21 Unsupervised learning: Hierarchical and density-based clustering algorithms
22 Unsupervised learning: Non-linear dimensionality reduction
23 Supervised learning: Out-of-core learning

Slide 8

Slide 8 text

Morning Session (8:00 AM - 12:00 PM)
Current section: 01 Introduction to machine learning with sample applications [s]

Slide 9

Slide 9 text

What is Machine Learning?
"Traditional" programming: Programmer → Program; Inputs (observations) + Program → Computer → Outputs
Machine Learning: Inputs (observations) + Outputs → Computer → Program
"Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed." -- Arthur Samuel (1959)

Slide 10

Slide 10 text

Examples of Machine Learning
Image credits:
• https://flic.kr/p/5BLW6G [CC BY 2.0]
• http://commons.wikimedia.org/wiki/File:Netflix_logo.svg [public domain]
• By Steve Jurvetson [CC BY 2.0]
• http://commons.wikimedia.org/wiki/File:American_book_company_1916._letter_envelope-2.JPG#filelinks [public domain]
And many, many more …

Slide 11

Slide 11 text

3 Types of Learning
Supervised
• Learning from labeled data
• E.g., spam classification
Unsupervised
• Discover structure in unlabeled data
• E.g., document clustering
Reinforcement
• Learning by "doing" with delayed reward
• E.g., chess computer

Slide 12

Slide 12 text

Supervised Learning
Supervised: Classification, Regression
[Figure labels: x=0.8, y=12.5, ?, ?]

Slide 13

Slide 13 text

Unsupervised Learning
Unsupervised: Clustering, Compression

Slide 14

Slide 14 text

Flower Classification
Iris-Versicolor, Iris-Setosa, Iris-Setosa

Slide 15

Slide 15 text

IRIS Data Representation
Rows: instances (samples, observations). Columns: features (attributes, dimensions). Last column: classes (targets).

       sepal_length  sepal_width  petal_length  petal_width  class
1      5.1           3.5          1.4           0.2          setosa
2      4.9           3.0          1.4           0.2          setosa
…      …             …            …             …            …
50     6.4           3.2          4.5           1.5          versicolor
…      …             …            …             …            …
150    5.9           3.0          5.1           1.8          virginica

Source: https://archive.ics.uci.edu/ml/datasets/Iris

Slide 16

Slide 16 text

Morning Session (8:00 AM - 12:00 PM)
Current section: 02 Scientific Computing Tools for Python: NumPy, SciPy, and matplotlib [s]

Slide 17

Slide 17 text

Jupyter Notebooks

Slide 18

Slide 18 text

NumPy Arrays
• Built around a C array with pointers to a contiguous data buffer of values
• Linear algebra functions
• Fancy indexing
• …
Image source: "Why Python is Slow: Looking Under the Hood" by Jake VanderPlas, http://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/
>>> import numpy
>>> ary = numpy.array([7, 8, 9, 10, 11])
>>> ary[[2, 4]]
array([ 9, 11])
>>> lst = list([7, 8, 9, 10, 11])
>>> lst[[2, 4]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers or slices, not list

Slide 19

Slide 19 text

SciPy Sparse Matrices
List of Lists (LIL) example:
>>> from scipy import sparse
>>> mtx = sparse.lil_matrix([[0, 1, 2, 0],
...                          [3, 0, 1, 0],
...                          [1, 0, 0, 1]])
>>> print(mtx)
  (0, 1)    1
  (0, 2)    2
  (1, 0)    3
  (1, 2)    1
  (2, 0)    1
  (2, 3)    1
>>> print(mtx.toarray())
[[0 1 2 0]
 [3 0 1 0]
 [1 0 0 1]]

Slide 20

Slide 20 text

Matplotlib
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>>
>>> mu, sigma = 200, 25
>>> x = mu + sigma*np.random.randn(10000)
>>> plt.hist(x, 20, density=True,
...          histtype='stepfilled',
...          facecolor='b',
...          alpha=0.75)
>>> plt.show()

Slide 21

Slide 21 text

Morning Session (8:00 AM - 12:00 PM)
Current section: 03 Data formats, preparation, and representation [s]

Slide 22

Slide 22 text

Iris
Iris-Versicolor, Iris-Setosa, Iris-Setosa

Slide 23

Slide 23 text

Digits

Slide 24

Slide 24 text

Generating Synthetic Data
from sklearn.datasets import make_…
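The slide elides which generator is imported. As one illustration of the `make_*` family, the sketch below uses `make_blobs`; that choice and all parameter values are assumptions, not taken from the slide.

```python
from sklearn.datasets import make_blobs

# Draw 100 two-dimensional points scattered around 3 cluster centers.
# make_blobs is only one of several make_* generators; it is used here as an example.
X, y = make_blobs(n_samples=100, n_features=2, centers=3, random_state=0)

print(X.shape)                 # feature matrix: one row per sample
print(sorted(set(y.tolist()))) # the integer labels of the generating centers
```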

Slide 25

Slide 25 text

Morning Session (8:00 AM - 12:00 PM)
Current section: 04 Supervised learning: Training and test data [A]

Slide 26

Slide 26 text

Training & Test Data
All Data → Training Data + Test Data
Typically:
• 75% : 25%
• 2/3 : 1/3

Slide 27

Slide 27 text

Stratification
Non-stratified split:
• training set → 38 x Setosa, 28 x Versicolor, 34 x Virginica
• test set → 12 x Setosa, 22 x Versicolor, 16 x Virginica
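A stratified split keeps the class proportions roughly equal in both parts. A minimal sketch, assuming the Iris data and the same 100/50 split sizes as above; the `random_state` value is arbitrary.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 150 samples, 50 per class

# stratify=y asks for (nearly) equal class proportions in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=50, random_state=123, stratify=y)

train_counts = Counter(y_train.tolist())
test_counts = Counter(y_test.tolist())
print(train_counts)
print(test_counts)
```

Without `stratify=y`, the per-class counts can drift apart as in the non-stratified example on the slide.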

Slide 28

Slide 28 text

K-Nearest Neighbors
k=5
Image source: https://github.com/rasbt/python-machine-learning-book/blob/master/code/ch03/images/03_20.png

Slide 29

Slide 29 text

Morning Session (8:00 AM - 12:00 PM)
Current section: 05 Supervised learning: Estimators for classification [A]

Slide 30

Slide 30 text

Supervised Workflow
Training: Training Data + Training Labels → Model
Generalization: Test Data → Model → Prediction; Prediction + Test Labels → Evaluation
• Fit model on all data after evaluation

Slide 31

Slide 31 text

Supervised Workflow
Training: Training Data + Training Labels → Model
Generalization: Test Data → Model → Prediction; Prediction + Test Labels → Evaluation
estimator.fit(X_train, y_train)
estimator.predict(X_test)
estimator.score(X_test, y_test)
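The three calls can be tried end to end. The sketch below assumes the Iris data and a k-nearest-neighbors classifier with k=5; any other scikit-learn classifier could stand in for `estimator`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

estimator = KNeighborsClassifier(n_neighbors=5)
estimator.fit(X_train, y_train)             # training
y_pred = estimator.predict(X_test)          # prediction on unseen data
accuracy = estimator.score(X_test, y_test)  # evaluation: mean accuracy on the test set

print(y_pred[:5], accuracy)
```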

Slide 32

Slide 32 text

Morning Session (8:00 AM - 12:00 PM)
Current section: 06 Supervised learning: Estimators for regression analysis [A]

Slide 33

Slide 33 text

Linear Regression
y = coef_[0]*X[0] + intercept_
X[0]: feature variable
y: target variable
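To see where `coef_` and `intercept_` come from, here is a small sketch that fits a line to made-up, noise-free data (slope 2, intercept 1 are illustrative values) and rebuilds the prediction by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])  # a single feature column
y = 2.0 * X[:, 0] + 1.0                      # toy target on an exact line

model = LinearRegression().fit(X, y)

# Same formula as on the slide, applied to every sample at once.
y_manual = model.coef_[0] * X[:, 0] + model.intercept_

print(model.coef_[0], model.intercept_)
```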

Slide 34

Slide 34 text

Morning Session (8:00 AM - 12:00 PM)
Current section: 07 Unsupervised learning: Unsupervised Transformers [S]

Slide 35

Slide 35 text

Unsupervised Transformers
Diagram: Training Data, Test Data → Model → New View
① transformer.fit(X_train)
② X_train_transf = transformer.transform(X_train)
③ X_test_transf = transformer.transform(X_test)
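The same three steps with a concrete transformer. `StandardScaler` and the tiny arrays are assumptions chosen for illustration; the key point is that the test data is transformed with statistics learned from the training data only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[2.0], [4.0]])

transformer = StandardScaler()
transformer.fit(X_train)                         # 1: learn mean and std from training data
X_train_transf = transformer.transform(X_train)  # 2: transform the training data
X_test_transf = transformer.transform(X_test)    # 3: reuse the training statistics on test data

print(X_train_transf.ravel())
print(X_test_transf.ravel())
```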

Slide 36

Slide 36 text

Feature Scaling
• standardization
• min-max scaling ("normalization")
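The slide's formulas did not survive extraction. The sketch below uses the usual textbook definitions (subtract the mean and divide by the standard deviation; rescale to the range [0, 1]), written with the standard library only. The function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Center to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(standardize(x))
print(min_max_scale(x))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```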

Slide 37

Slide 37 text

Principal Component Analysis
[Figure: feature axes x1, x2; principal components PC1, PC2]

Slide 38

Slide 38 text

PCA for Dimensionality Reduction
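A short sketch of PCA used for dimensionality reduction, assuming the four-feature Iris data and a projection onto two components (both are illustrative choices).

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # 150 samples, 4 features

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)       # project onto the first two principal components

print(X_pca.shape)
print(pca.explained_variance_ratio_)  # share of variance kept per component
```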

Slide 39

Slide 39 text

Morning Session (8:00 AM - 12:00 PM)
Current section: 08 Unsupervised learning: Clustering [S]

Slide 40

Slide 40 text

K-means Clustering

Slide 41

Slide 41 text

K-means Clustering
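A minimal k-means sketch on synthetic blobs. The data generator, the number of clusters, and the other parameter values are assumptions for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three well-separated synthetic clusters in 2D.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)  # cluster index for every sample

print(km.cluster_centers_.shape)  # one centroid per cluster
```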

Slide 42

Slide 42 text

Morning Session (8:00 AM - 12:00 PM)
Current section: 09 The scikit-learn estimator interface [A]

Slide 43

Slide 43 text

Scikit-learn API

Slide 44

Slide 44 text

Morning Session (8:00 AM - 12:00 PM)
Current section: 10 Preparing a real-world dataset (titanic) [A]

Slide 45

Slide 45 text

Continuous & Categorical Features
Continuous: e.g., sepal width in cm [3.4, 4.7 …]
Categorical:
• Ordinal: e.g., ratings [satisfied, neutral, unsatisfied]
• Nominal: e.g., colors [red, green, blue, …]
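One common way to turn the two categorical types into numbers, sketched in plain Python: an integer ranking for ordinal values and one indicator column per category for nominal values. The sample lists and the ranking are made up for illustration.

```python
# Ordinal feature: categories have an order, so map them to ranked integers.
rating_order = {"unsatisfied": 0, "neutral": 1, "satisfied": 2}
ratings = ["satisfied", "neutral", "unsatisfied", "neutral"]
ratings_encoded = [rating_order[r] for r in ratings]

# Nominal feature: no order, so use one 0/1 indicator column per category.
colors = ["red", "green", "blue", "green"]
categories = sorted(set(colors))  # ['blue', 'green', 'red']
colors_onehot = [[int(c == cat) for cat in categories] for c in colors]

print(ratings_encoded)  # [2, 1, 0, 1]
print(colors_onehot)    # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```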

Slide 46

Slide 46 text

Case Study - Titanic Survival

Slide 47

Slide 47 text

Morning Session (8:00 AM - 12:00 PM)
Current section: 11 Working with text data via the bag-of-words model [A]

Slide 48

Slide 48 text

Bag of Words
• D1: "Each state has its own laws."
• D2: "Every country has its own culture."
V = {each: 1, state: 1, has: 2, its: 2, own: 2, laws: 1, every: 1, country: 1, culture: 1}
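The vocabulary counts above can be reproduced with the standard library. This is a from-scratch sketch, assuming lowercasing and a simple letters-only tokenizer; scikit-learn's `CountVectorizer` offers a comparable transformation.

```python
import re
from collections import Counter

docs = ["Each state has its own laws.",
        "Every country has its own culture."]


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Vocabulary with total counts over both documents (V on the slide).
vocabulary = Counter(tok for doc in docs for tok in tokenize(doc))

# One count vector per document, with columns in alphabetical term order.
terms = sorted(vocabulary)
vectors = [[tokenize(doc).count(t) for t in terms] for doc in docs]

print(dict(vocabulary))
print(terms)
print(vectors)
```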

Slide 49

Slide 49 text

Morning Session (8:00 AM - 12:00 PM)
Current section: 12 Application: SMS spam classification [A]

Slide 50

Slide 50 text

Preprocessing & Classification Overview

Slide 51

Slide 51 text

Afternoon Session (1:30 PM - 5:30 PM)
Current section: 13 Cross-Validation [S]

Slide 52

Slide 52 text

Holdout Evaluation I
1. Split Data and Labels into Training Data / Training Labels and Test Data / Test Labels
2. Training Data + Training Labels + Hyperparameter Values → Learning Algorithm → Model

Slide 53

Slide 53 text

Holdout Evaluation II
3. Test Data → Model → Prediction; Prediction + Test Labels → Performance
4. Data + Labels + Hyperparameter Values → Learning Algorithm → Final Model
This work by Sebastian Raschka is licensed under a Creative Commons Attribution 4.0 International License.

Slide 54

Slide 54 text

Holdout Validation I
1. Split Data and Labels into Training Data / Training Labels, Validation Data / Validation Labels, and Test Data / Test Labels
2. For each set of Hyperparameter values: Training Data + Training Labels → Learning Algorithm → Model

Slide 55

Slide 55 text

Holdout Validation II
3. For each Model: Validation Data → Model → Prediction; Prediction + Validation Labels → Performance; select the Best Model and Best Hyperparameter values
4. Training Data + Training Labels + Validation Data + Validation Labels + Best Hyperparameter Values → Learning Algorithm → Model

Slide 56

Slide 56 text

Holdout Validation III
5. Test Data → Model → Prediction; Prediction + Test Labels → Performance
6. Data + Labels + Best Hyperparameter Values → Learning Algorithm → Final Model
This work by Sebastian Raschka is licensed under a Creative Commons Attribution 4.0 International License.

Slide 57

Slide 57 text

K-fold Cross-Validation
K Iterations (K-Folds): 1st, 2nd, 3rd, 4th, 5th; each iteration has one Validation Fold, the rest are Training Folds
Per iteration: Training Fold Data + Training Fold Labels + Hyperparameter Values → Learning Algorithm → Model; Validation Fold Data → Model → Prediction; Prediction + Validation Fold Labels → Performance
Results: Performance_1, Performance_2, Performance_3, Performance_4, Performance_5
$$\text{Performance} = \frac{1}{10} \sum_{i=1}^{10} \text{Performance}_i$$
This work by Sebastian Raschka is licensed under a Creative Commons Attribution 4.0 International License.
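In scikit-learn, the fold loop and the per-fold scores can be obtained with `cross_val_score`. A minimal sketch, assuming the Iris data, a k-nearest-neighbors classifier, and 5 folds as in the diagram.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# One accuracy value per fold; each fold serves as the validation fold once.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)

print(scores)
print(scores.mean())  # average performance over the folds
```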

Slide 58

Slide 58 text

K-fold Cross-Validation Pipeline I
1. Split Data and Labels into Training Data / Training Labels and Test Data / Test Labels
2. For each set of Hyperparameter values: Training Data + Training Labels → Learning Algorithm → Model
3. Training Data + Best Hyperparameter Values → Model

Slide 59

Slide 59 text

K-fold Cross-Validation Pipeline II
3. Training Data + Training Labels + Best Hyperparameter Values → Learning Algorithm → Model
4. Test Data → Model → Prediction; Prediction + Test Labels → Performance
5. Data + Labels + Best Hyperparameter Values → Learning Algorithm → Final Model
This work by Sebastian Raschka is licensed under a …

Slide 60

Slide 60 text

Nested CV
Outer Loop: 1st, 2nd, 3rd, 4th, 5th iteration; each has one Outer Validation Fold, the rest are Outer Training Folds
Results: Performance_1, Performance_2, Performance_3, Performance_4, Performance_5
$$\text{Performance} = \frac{1}{10} \sum_{i=1}^{10} \text{Performance}_i$$
Inner Loop: Inner Training Fold, Inner Validation Fold → Performance_{5,1}, Performance_{5,2}
$$\frac{1}{2} \sum_{j=1}^{2} \text{Performance}_{5,j}$$
Inner loop selects: Best Learning Algorithm, Best Hyperparameter Values
This work by Sebastian Raschka is licensed under a Creative Commons Attribution 4.0 International License.

Slide 61

Slide 61 text

Afternoon Session (1:30 PM - 5:30 PM)
Current section: 14 Model complexity and grid search for adjusting hyperparameters [S]

Slide 62

Slide 62 text

Learning Curves
Image source: https://github.com/rasbt/python-machine-learning-book/blob/master/code/ch06/images/06_04.png

Slide 63

Slide 63 text

Model Complexity

Slide 64

Slide 64 text

Grid Search
[Figure axes: gamma parameter, C parameter]
Source: http://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html
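A grid over `C` and `gamma` for an RBF-kernel SVM can be searched with `GridSearchCV`. The sketch below assumes the Iris data and a small, arbitrary grid; the values are not taken from the linked example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Every (C, gamma) combination is scored with 5-fold cross-validation.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
grid = GridSearchCV(SVC(kernel="rbf"), param_grid=param_grid, cv=5)
grid.fit(X, y)

print(grid.best_params_)  # combination with the best mean validation score
print(grid.best_score_)
```

Because the best score was selected on these same folds, it tends to be optimistic; a separate test set gives a fairer estimate.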

Slide 65

Slide 65 text

Afternoon Session (1:30 PM - 5:30 PM)
Current section: 15 Scikit-learn Pipelines [A]

Slide 66

Slide 66 text

Pipelines
pipe = make_pipeline(T1(), T2(), Classifier())
pipe.fit(X, y):
• T1.fit(X, y), then X1 = T1.transform(X)
• T2.fit(X1, y), then X2 = T2.transform(X1)
• Classifier.fit(X2, y)
pipe.predict(X'):
• X'1 = T1.transform(X')
• X'2 = T2.transform(X'1)
• y' = Classifier.predict(X'2)
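A runnable version of the pattern, with concrete stand-ins for the placeholders: `StandardScaler` as T1, `PCA` as T2, and `LogisticRegression` as the classifier. These choices and the Iris data are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y)

pipe = make_pipeline(StandardScaler(), PCA(n_components=2), LogisticRegression())

pipe.fit(X_train, y_train)            # fits and applies each transformer, then fits the classifier
y_pred = pipe.predict(X_test)         # applies the fitted transformers, then predicts
accuracy = pipe.score(X_test, y_test)

print(accuracy)
```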

Slide 67

Slide 67 text

Pipelines & Cross Validation
[Diagram elements: Training Data, Training Labels, Feature Extraction, Scaling, Feature Selection, Model, Cross Validation]

Slide 68

Slide 68 text

Afternoon Session (1:30 PM - 5:30 PM)
Current section: 16 Supervised learning: Performance metrics for classification [A]

Slide 69

Slide 69 text

Confusion Matrix
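The slide content was an image, so here is a from-scratch sketch of a binary confusion matrix and a few standard metrics derived from it. The label lists are made up, and which metrics the following slides cover is not recoverable from the text.

```python
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print([[tn, fp], [fn, tp]])  # [[5, 1], [1, 3]]
print(accuracy, precision, recall, f1)
```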

Slide 70

Slide 70 text

Classification Metrics I

Slide 71

Slide 71 text

Classification Metrics II

Slide 72

Slide 72 text

Classification Metrics III

Slide 73

Slide 73 text

Receiver Operating Characteristic
Image source: http://scikit-learn.org/stable/_images/plot_roc_001.png
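A ROC curve is built from predicted scores rather than hard labels. A small sketch with made-up labels and scores, using `roc_curve` and `roc_auc_score` from `sklearn.metrics`.

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]  # e.g. predicted probability of class 1

# False positive rate and true positive rate at each decision threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)  # area under the ROC curve

print(fpr, tpr)
print(auc)
```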

Slide 74

Slide 74 text

Multi-Class

Slide 75

Slide 75 text

Afternoon Session (1:30 PM - 5:30 PM)
Current section: 17 Supervised learning: Linear Models [A]

Slide 76

Slide 76 text

Linear models for regression
y_pred = x_test[0] * coef_[0] + ... + x_test[n_features-1] * coef_[n_features-1] + intercept_
[Figure: y plotted against x[0]; slope (= Δy/Δx)]

Slide 77

Slide 77 text

Afternoon Session (1:30 PM - 5:30 PM)
Current section: 18 Supervised learning: Support Vector Machines [A]

Slide 78

Slide 78 text

Support Vector Machines
Image source: https://github.com/rasbt/python-machine-learning-book/blob/master/code/ch03/images/03_07.png

Slide 79

Slide 79 text

Kernel Trick
Image source: https://github.com/rasbt/python-machine-learning-book/blob/master/code/ch03/images/03_11.png
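To illustrate why a kernel helps, the sketch below compares a linear and an RBF-kernel SVM on concentric circles, which no straight line can separate. The dataset and its parameters are assumptions chosen for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two classes arranged as an inner and an outer ring.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.3, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

print(linear_acc, rbf_acc)  # training accuracy of each model
```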

Slide 80

Slide 80 text

Afternoon Session (1:30 PM - 5:30 PM)
Current section: 19 Supervised learning: Decision trees and random forests, ensemble methods [S]

Slide 81

Slide 81 text

Decision Trees

Slide 82

Slide 82 text

Classification w. Continuous Features

Slide 83

Slide 83 text

Impurity measures
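The slide's formulas were not extracted. As an assumption, the sketch below implements three impurity measures commonly used for decision trees (Gini impurity, entropy, and misclassification error), each taking the class proportions in a node.

```python
from math import log2


def gini(p):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(pi ** 2 for pi in p)


def entropy(p):
    """Entropy in bits; classes with zero proportion contribute nothing."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)


def misclassification_error(p):
    """One minus the proportion of the majority class."""
    return 1.0 - max(p)


mixed = [0.5, 0.5]  # maximally impure two-class node
pure = [1.0, 0.0]   # node containing a single class

print(gini(mixed), entropy(mixed), misclassification_error(mixed))  # 0.5 1.0 0.5
print(gini(pure), misclassification_error(pure))                    # 0.0 0.0
```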

Slide 84

Slide 84 text

Ensemble Methods
• Bagging
• Random Forests
• Boosting
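Each of the three families has a ready-made classifier in `sklearn.ensemble`. The sketch below compares them with 5-fold cross-validation on the Iris data; `AdaBoostClassifier` is used as one example of boosting, and all parameter values are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

models = {
    "bagging": BaggingClassifier(n_estimators=50, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "boosting (AdaBoost)": AdaBoostClassifier(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validation accuracy per ensemble.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in models.items()}

for name, score in scores.items():
    print(name, round(score, 3))
```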

Slide 85

Slide 85 text

Bagging
Image source: https://github.com/rasbt/python-machine-learning-book/blob/master/code/ch07/images/07_07.png

Slide 86

Slide 86 text

Bagging Example
Image source: https://github.com/rasbt/python-machine-learning-book/blob/master/code/ch07/images/07_08.png

Slide 87

Slide 87 text

Boosting
Image source: https://github.com/rasbt/python-machine-learning-book/blob/master/code/ch07/images/07_09.png

Slide 88

Slide 88 text

Afternoon Session (1:30 PM - 5:30 PM)
Current section: 20 Supervised learning: Feature selection [S]

Slide 89

Slide 89 text

Dimensionality Reduction
• Feature Selection
• Feature Extraction

Slide 90

Slide 90 text

Afternoon Session (1:30 PM - 5:30 PM)
Current section: 21 Unsupervised learning: Hierarchical and density-based clustering algorithms [S]

Slide 91

Slide 91 text

Hierarchical Clustering
Image source: https://en.wikipedia.org/wiki/Hierarchical_clustering#/media/File:Hierarchical_clustering_simple_diagram.svg [CC BY-SA 3.0]

Slide 92

Slide 92 text

DBSCAN
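Both clustering approaches from this section can be tried on the same data. The sketch below assumes a two-moons toy dataset; the `eps` and `min_samples` values are illustrative and usually need tuning per dataset.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# Density-based: clusters are dense regions; the label -1 marks noise points.
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Hierarchical (agglomerative): merge clusters bottom-up until 2 remain.
ac_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)

n_db_clusters = len(set(db_labels.tolist()) - {-1})
print(n_db_clusters)
print(sorted(set(ac_labels.tolist())))
```

Unlike k-means and agglomerative clustering, DBSCAN does not take the number of clusters as an input.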

Slide 93

Slide 93 text

Afternoon Session (1:30 PM - 5:30 PM)
Current section: 22 Unsupervised learning: Non-linear dimensionality reduction [S]

Slide 94

Slide 94 text

PCA
[Figure panels: 2D, 1D, ?]

Slide 95

Slide 95 text

PCA
[Figure panels: 2D, 1D]

Slide 96

Slide 96 text

Kernel PCA
[Figure panels: 2D, 1D]

Slide 97

Slide 97 text

Afternoon Session (1:30 PM - 5:30 PM)
Current section: 23 Supervised learning: Out-of-core learning [S]