
Eduardo Arino de la Rubia - scikit-learn - PyDSLA meetup - Nov 2014

Data Science LA
November 07, 2014

  1. Now What?! An “Introduction” to scikit-learn

    This problem would best be addressed with machine learning… @earino #dsla
  4. What is machine learning, and why does the learning curve feel like a Himalayan hike?
  5. Machine learning is the science of getting computers to act without being explicitly programmed.

    Machine learning should be a technique of last resort.
  9. Dimensionality Reduction (Unsupervised Technique)

    The process of reducing the number of random variables under consideration, through feature selection or feature extraction. PCA (Principal Component Analysis), from sklearn import decomposition: transforms a set of (possibly correlated) variables into a smaller set of uncorrelated variables, the principal components. Iris example.
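A minimal sketch of the iris example, assuming a current scikit-learn release (the slide does not show its code). PCA projects the four iris measurements onto two principal components; n_components=2 is an illustrative choice, not a value taken from the talk. The last lines check that the new variables really are uncorrelated.

```python
import numpy as np
from sklearn import datasets, decomposition

# Load the 150 x 4 iris measurements (sepal/petal length and width)
iris = datasets.load_iris()
X = iris.data

# Project onto the first two principal components (illustrative choice)
pca = decomposition.PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (150, 2)
print(pca.explained_variance_ratio_)  # share of variance kept by each component

# The new variables are (numerically) uncorrelated
corr = np.corrcoef(X_reduced, rowvar=False)[0, 1]
print(f"correlation between components: {corr:.2e}")
```

The first component keeps the largest share of variance, so most of the structure in the four original columns survives in two.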
  11. Clustering (Unsupervised Technique)

    So you have a bunch of data with no labels, and you ask yourself: is there any structure in this data? Congratulations, you may be asking about clustering. Clustering is a great way to “explore” data. from sklearn.cluster import KMeans. Handwritten digits example.
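A minimal sketch of the handwritten digits example, assuming a current scikit-learn release and n_clusters=10 (one cluster per digit; the slide does not give the parameters). KMeans never sees the labels. They are used only afterwards, through the adjusted Rand index, to check how well the discovered clusters line up with the true digits.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import adjusted_rand_score

# 1797 images of handwritten digits, each flattened to 64 pixel values
digits = load_digits()
X = digits.data

# Fit KMeans on the pixels only; the true labels are not used for fitting
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Afterwards, compare the discovered clusters with the true digits
# (1.0 = perfect agreement, about 0.0 = random assignment)
ari = adjusted_rand_score(digits.target, cluster_ids)
print(f"adjusted Rand index: {ari:.2f}")
```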
  13. Predicting a Category (Supervised Technique)

    This is often what people think of when referring to machine learning. You have “labeled” data, for example: • fraudulent/valid • woman/man. The labels don’t have to be binary: is_car/is_truck/is_bicycle. You want to predict the category of new data. Classifiers comparison example.
  14. from sklearn.neighbors import KNeighborsClassifier
      from sklearn.svm import SVC
      from sklearn.tree import DecisionTreeClassifier
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.naive_bayes import GaussianNB
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
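A minimal sketch in the spirit of the classifiers comparison example: every classifier from the import list is fitted on the same data and scored on the same held-out split. The toy dataset (make_moons), the hyperparameters, and the split are assumptions for illustration, not the settings used in the talk. In current scikit-learn releases LDA is imported from sklearn.discriminant_analysis; the older sklearn.lda module no longer exists.

```python
from sklearn.datasets import make_moons
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Two interleaving half-moons: a small, non-linear, two-class toy problem
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Illustrative hyperparameters, one instance per imported classifier
classifiers = {
    "Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
    "RBF SVM": SVC(gamma=2, C=1),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Naive Bayes": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
}

# Same fit/score interface for every estimator
scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # mean accuracy on held-out data
    print(f"{name:>17}: {scores[name]:.2f}")
```

Because every estimator shares fit/score, swapping one classifier for another is a one-line change.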
  16. Predicting a Quantity (Supervised Technique)

    This is the thing everyone wishes they could do. If you can predict a quantity, you can: • predict the stock market • predict currency fluctuations • be rich beyond your wildest dreams. Cut it out. You can’t (really) do that. You can: • predict click-through probability • predict housing sale prices
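A minimal sketch of predicting a quantity (a housing sale price) with linear regression. The data is synthetic and hypothetical: the pricing rule (50,000 + 150 per square foot + 10,000 per bedroom, plus noise) is invented purely so the fitted coefficients can be checked against known values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Hypothetical houses: floor area (sq ft) and number of bedrooms
sqft = rng.uniform(500, 3500, n)
bedrooms = rng.integers(1, 6, n)  # 1..5 inclusive

# Hypothetical pricing rule plus Gaussian noise (sd 20,000)
price = 50_000 + 150 * sqft + 10_000 * bedrooms + rng.normal(0, 20_000, n)

X = np.column_stack([sqft, bedrooms])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
print("coefficients:", model.coef_)  # should land near [150, 10000]
print("R^2 on held-out houses:", model.score(X_test, y_test))

# Predict the price of a new (hypothetical) 2,000 sq ft, 3-bedroom house
new_price = model.predict(np.array([[2000, 3]]))[0]
print(f"predicted price: {new_price:,.0f}")
```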
  17. Build Pipelines

    Pipelines allow all the steps of a data science / machine learning process to be “grouped together” logically. Pipeline for grid search example. FANTASTIC for NLP.
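A minimal sketch of a pipeline used inside a grid search for a small text classification (NLP) task. The spam/not-spam corpus, the step names "tfidf" and "clf", and the parameter grid are hypothetical. The point it illustrates: GridSearchCV tunes parameters of every step at once, addressed as <step>__<parameter>, and refits the vectorizer inside each cross-validation fold, so the vocabulary is never learned from the held-out fold.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: 1 = spam, 0 = not spam
texts = [
    "free money win prize now", "win a free vacation prize",
    "claim your free prize money", "cheap pills win big now",
    "you won free cash prize", "limited offer win money fast",
    "meeting at noon tomorrow", "please review the attached report",
    "lunch with the team today", "project deadline moved to friday",
    "notes from the weekly meeting", "can you review my code today",
]
labels = [1] * 6 + [0] * 6

# Group the text vectorizer and the classifier into one estimator
pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])

# Parameters of any step are addressed as <step name>__<parameter>
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# Each CV fold refits the whole pipeline, vectorizer included
grid = GridSearchCV(pipe, param_grid, cv=3)
grid.fit(texts, labels)

print("best parameters:", grid.best_params_)
print("predictions:", grid.predict(["win free money", "review the meeting notes"]))
```

Once tuned, the grid object behaves like a single estimator that accepts raw strings, which is what makes pipelines convenient for NLP work.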