Data Science LA
November 07, 2014

Eduardo Arino de la Rubia - scikit-learn - PyDSLA meetup - Nov 2014

Transcript

  1. NOW WHAT?! An “Introduction” to scikit-learn

    This problem would best be addressed with machine learning… @earino #dsla
  2. Machine learning is the science of getting computers to act without being explicitly programmed.

    Machine learning should be a technique of last resort.
  3. Dimensionality Reduction - Unsupervised Technique

    The process of reducing the number of random variables under consideration, through feature selection or feature extraction.

    PCA - Principal Component Analysis: from sklearn import decomposition

    Transforms a set of variables into a smaller set of uncorrelated variables.

    Example: iris
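The PCA step on this slide might look roughly like the sketch below. It assumes the "iris example" refers to the iris dataset bundled with scikit-learn; the choice of two components is an illustrative assumption, not something the slide specifies.

```python
import numpy as np
from sklearn import datasets, decomposition

# Assumption: the bundled iris dataset (150 samples, 4 features).
iris = datasets.load_iris()
X = iris.data

# Project the 4 original features onto 2 principal components
# (2 is an arbitrary choice for illustration).
pca = decomposition.PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of the total variance captured by each component.
print(pca.explained_variance_ratio_)

# The new variables are linearly uncorrelated: the off-diagonal
# entry of their correlation matrix is numerically zero.
corr = np.corrcoef(X_reduced, rowvar=False)
print(corr[0, 1])
```

Plotting the two columns of `X_reduced`, colored by `iris.target`, is a common way to inspect the result visually.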
  4. Clustering - Unsupervised Technique

    So you have a bunch of data, with no labels… You ask yourself, is there any structure in this data? Congratulations, you may be asking about clustering.

    Clustering is a great way to “explore” data…

    from sklearn.cluster import KMeans

    Example: handwritten digits
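A minimal sketch of the clustering idea above, assuming the "handwritten digits example" uses scikit-learn's bundled digits dataset. The cluster count of 10 (one per digit) and the fixed `random_state` are assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

# Assumption: the bundled 8x8 handwritten digits (1797 samples, 64 features).
digits = load_digits()
X = digits.data

# Ask for 10 clusters without ever showing the labels to the model.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Each cluster center is itself a 64-pixel "average image".
print(kmeans.cluster_centers_.shape)
print(cluster_ids[:20])
```

Cluster ids are arbitrary: cluster 3 need not correspond to the digit 3, so comparing against `digits.target` requires matching clusters to labels first.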
  5. Predicting a Category - Supervised Technique

    This is, often, what people think of when referring to Machine Learning… You have “labeled” data… examples:
    • fraudulent/valid
    • woman/man

    Doesn’t have to be binary labels… is_car/is_truck/is_bicycle

    You want to predict the category of new data…

    Example: classifiers comparison
  6. from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.lda import LDA  # current scikit-learn: from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
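One way to compare the classifiers imported on this slide is sketched below. The iris dataset, the 70/30 split, and the default hyperparameters are assumptions for illustration and are not the setup of the "classifiers comparison example" itself. The sketch uses `LinearDiscriminantAnalysis`, since `sklearn.lda` is no longer available in current releases.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: a small labeled dataset with three categories.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Every scikit-learn classifier exposes the same fit/score interface,
# which is what makes a side-by-side comparison this short.
classifiers = {
    "k-nearest neighbors": KNeighborsClassifier(),
    "SVM": SVC(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "naive Bayes": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # accuracy on held-out data
    print(f"{name}: {scores[name]:.3f}")
```

Accuracy on a single split is a rough guide only; cross-validation gives a steadier comparison.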
  7. Predicting a Quantity - Supervised Technique

    This is the thing everyone wishes they could do. If you can predict a quantity you can:
    • predict the stock market
    • predict currency fluctuations
    • be rich beyond your wildest dreams

    Cut it out. You can’t (really) do that. You can…
    • predict click through probability
    • predict housing sales prices
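Predicting a quantity is regression. The sketch below uses synthetic data from `make_regression` as a stand-in for something like housing sale prices; the data, the feature count, and the choice of `LinearRegression` are all assumptions for illustration, since the slide names no specific estimator.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 500 "houses", 5 numeric features each,
# with a noisy linear relationship to the target "price".
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

# For regressors, score() returns R^2 on the held-out data.
r2 = model.score(X_test, y_test)
print(f"R^2 on held-out data: {r2:.3f}")

# Predict the quantity for unseen rows.
predicted = model.predict(X_test[:3])
print(predicted)
```

Real price data is rarely this well behaved, so the score here says nothing about what to expect on an actual housing dataset.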
  8. Build Pipelines

    Pipelines allow all the steps of a data science / machine learning process to be “grouped together” logically.

    Example: pipeline for grid search

    FANTASTIC for NLP
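A pipeline combined with a grid search might be sketched as follows for a small NLP task. The toy corpus, the step names `tfidf` and `clf`, and the parameter grid are hypothetical choices for illustration, not the contents of the example the slide refers to.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: 4 "spam" and 4 "ham" messages.
texts = [
    "win a free prize now", "free money click here",
    "claim your free reward", "cheap pills win big",
    "meeting moved to friday", "see you at lunch tomorrow",
    "please review the attached report", "notes from the team meeting",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# Group text vectorization and classification into one estimator.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])

# Parameters of any step are addressed as <step name>__<parameter>,
# so the search tunes the vectorizer and the classifier together.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(texts, labels)

print(search.best_params_)
predictions = search.predict(["free prize inside", "team lunch on friday"])
print(predictions)
```

Because the vectorizer sits inside the pipeline, it is refit on each training fold, so vocabulary from the held-out fold does not leak into training.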