Slide 1

Slide 1 text

NOW WHAT?! AN “INTRODUCTION” TO SCIKIT-LEARN
THIS PROBLEM WOULD BEST BE ADDRESSED WITH MACHINE LEARNING…
@earino #dsla

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

What is Machine Learning, and why does the learning curve feel like a Himalayan hike?

Slide 5

Slide 5 text

Machine learning is the science of getting computers to act without being explicitly programmed.
Machine learning should be a technique of last resort.

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

1 2 3 4

Slide 8

Slide 8 text

1 2 3 4

Slide 9

Slide 9 text

Dimensionality Reduction - Unsupervised Technique
The process of reducing the number of random variables under consideration.
Feature selection / feature extraction
PCA - Principal Component Analysis
from sklearn import decomposition
Transform a bunch of variables into a smaller set of uncorrelated variables.
iris example
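A minimal sketch of the iris example named on this slide, assuming the bundled iris dataset and a reduction to two components (the component count is an illustrative choice, not something the slide specifies):

```python
import numpy as np
from sklearn import datasets, decomposition

# Load the iris measurements: 150 flowers, 4 numeric features each.
iris = datasets.load_iris()
X = iris.data

# Assumption: keep 2 principal components; the slide does not give a number.
pca = decomposition.PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of the original variance captured by each kept component.
explained = pca.explained_variance_ratio_

# Correlation between the two new variables; PCA makes them uncorrelated.
component_corr = np.corrcoef(X_reduced.T)[0, 1]

print(X_reduced.shape)
print(explained)
print(component_corr)
```

The two output columns replace the four original measurements, and the printed correlation should be zero up to floating-point error.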

Slide 10

Slide 10 text

1 2 3 4

Slide 11

Slide 11 text

Clustering - Unsupervised Technique
So you have a bunch of data, with no labels…
You ask yourself, is there any structure in this data?
Congratulations, you may be asking about clustering.
Clustering is a great way to “explore” data…
from sklearn.cluster import KMeans
handwritten digits example
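A rough sketch of the handwritten digits example, assuming the bundled digits dataset and ten clusters (one per digit, which is an assumption made here; in a real unlabeled problem the number of clusters is usually unknown):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

# 1797 images of handwritten digits, each flattened to 64 pixel values.
digits = load_digits()
X = digits.data

# Assumption: 10 clusters, since we happen to know there are 10 digits.
# The labels in digits.target are never shown to the algorithm.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Each cluster center is itself a 64-pixel "average image".
centers = kmeans.cluster_centers_

print(cluster_ids[:20])
print(centers.shape)
```

Cluster ids are arbitrary: cluster 3 is not "the digit 3". Reshaping a row of `centers` to 8x8 and plotting it is one way to see what each cluster captured.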

Slide 12

Slide 12 text

1 2 3 4

Slide 13

Slide 13 text

Predicting a Category - Supervised Technique
This is often what people think of when referring to Machine Learning…
You have “labeled” data… examples:
• fraudulent/valid
• woman/man
Doesn’t have to be binary labels… is_car/is_truck/is_bicycle
You want to predict the category of new data…
classifiers comparison example

Slide 14

Slide 14 text

from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
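A hedged sketch of a classifier comparison using the families listed on this slide. The synthetic two-moons data, the 70/30 split, and the default hyperparameters are all assumptions for illustration. In recent scikit-learn releases LDA is imported from `sklearn.discriminant_analysis`, which is what this sketch uses.

```python
from sklearn.datasets import make_moons
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: a small synthetic binary-labeled dataset stands in for real data.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

classifiers = {
    "k-nearest neighbors": KNeighborsClassifier(),
    "SVM": SVC(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "naive Bayes": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
}

# Every estimator shares the same fit / score interface,
# so comparing them is a simple loop.
scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # accuracy on held-out data
    print(f"{name}: {scores[name]:.2f}")
```

The point is less the specific accuracies than the uniform API: swapping one classifier for another changes a single line.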

Slide 15

Slide 15 text

1 2 3 4

Slide 16

Slide 16 text

Predicting a Quantity - Supervised Technique
This is the thing everyone wishes they could do.
If you can predict a quantity you can:
• predict the stock market
• predict currency fluctuations
• be rich beyond your wildest dreams
Cut it out. You can’t (really) do that.
You can…
• predict click-through probability
• predict housing sales prices
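A minimal regression sketch. The slide names no estimator or dataset, so this assumes plain linear regression on synthetic data from `make_regression`, standing in for something like housing prices:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Assumption: synthetic features and a noisy numeric target,
# used here in place of a real housing dataset.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# Predictions are continuous numbers rather than categories.
predictions = model.predict(X_test)

# R^2 on held-out data: 1.0 is a perfect fit, 0.0 is no better than the mean.
r2 = model.score(X_test, y_test)
print(predictions[:5])
print(r2)
```

The score is high here only because the data was generated from a linear model; real quantities such as sale prices are far noisier.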

Slide 17

Slide 17 text

Build Pipelines
Allow for all the steps of a data science / machine learning process to be “grouped together” logically.
pipeline for grid search example
FANTASTIC for NLP
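A sketch of a pipeline used inside a grid search for a small text problem. The tiny corpus, the labels, the step names `tfidf` and `clf`, and the parameter grid are all made up for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: 1 = positive review, 0 = negative review.
texts = [
    "great movie loved it", "wonderful acting great plot",
    "loved the great soundtrack", "great fun wonderful cast",
    "terrible movie hated it", "awful acting terrible plot",
    "hated the awful soundtrack", "terrible boring awful cast",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Group text vectorization and classification into one estimator.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Parameters of any step are addressed as <step name>__<parameter>.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# cv=2 only because the toy corpus is tiny.
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(texts, labels)

best_params = search.best_params_
prediction = search.predict(["what a great wonderful movie"])
print(best_params)
print(prediction)
```

Because the vectorizer sits inside the pipeline, it is refit on each training fold during the search, so the held-out fold never leaks into the vocabulary. That is a large part of why pipelines suit NLP work.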

Slide 18

Slide 18 text

CONNECT WITH US ON DATASCIENCE.LA