A Short Introduction to scikit-learn

Short overview of the scikit-learn project for the Data Tuesday meetup of Feb 26 2013 in Paris.

Olivier Grisel

February 24, 2013

Transcript

  1. Possible Applications
     • Text Classification / Sequence Tagging NLP
     • Spam Filtering, Sentiment Analysis...
     • Computer Vision / Speech Recognition
     • Learning To Rank - IR and advertisement
     • Science: Statistical Analysis of the Brain, Astronomy, Biology, Social Sciences...
     Monday, February 25, 2013
  2. • Library for Machine Learning
     • Open Source (BSD)
     • Simple fit / predict / transform API
     • Python / NumPy / SciPy / Cython
     • Model Assessment, Selection & Ensembles
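The "fit / predict / transform" convention on slide 2 can be illustrated with a minimal sketch. The dataset (`load_iris`) and the two estimators (`StandardScaler`, `SVC`) are choices made here for illustration and do not come from the slides; any scikit-learn transformer and classifier would follow the same pattern.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small bundled dataset, used only for illustration (an assumption,
# not something shown in the talk).
iris = load_iris()
X, y = iris.data, iris.target

# Transformers learn their parameters with fit() and apply them
# with transform().
scaler = StandardScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)

# Predictors learn with fit(X, y) and produce outputs with predict().
clf = SVC(kernel="rbf")
clf.fit(X_scaled, y)
predictions = clf.predict(X_scaled)

print(X_scaled.shape, predictions.shape)
```

Because every estimator exposes the same small set of methods, swapping `SVC` for another classifier should only require changing the line that constructs it.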
  3. Total dataset size: n_samples: 1288, n_features: 1850, n_classes: 7
     Extracting the top 150 eigenfaces from 966 faces
     done in 0.466s
     Projecting the input data on the eigenfaces orthonormal basis
     done in 0.056s
     Fitting the SVM classifier to the training set
     done in 18.549s
     Predicting people's names on the test set
     done in 0.062s

                        precision    recall  f1-score   support

          Ariel Sharon      0.90      0.75      0.82        12
          Colin Powell      0.78      0.94      0.85        62
       Donald Rumsfeld      0.86      0.72      0.78        25
         George W Bush      0.89      0.96      0.92       141
     Gerhard Schroeder      0.92      0.74      0.82        31
           Hugo Chavez      0.90      0.53      0.67        17
            Tony Blair      0.81      0.74      0.77        34

           avg / total      0.86      0.86      0.86       322
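The steps logged on slide 3 (extract a PCA basis, project the data onto it, fit an SVM, print a per-class report) can be sketched roughly as follows. This is not the script that produced the output above: to stay self-contained it substitutes the bundled `load_digits` images for the face dataset, picks `n_components=30` arbitrarily, and assumes a recent scikit-learn where `train_test_split` lives in `sklearn.model_selection`.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data (assumption): 8x8 digit images instead of face images.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# Learn an orthonormal basis from the training images only, then
# project both splits onto it (the "eigenfaces" step on the slide).
n_components = 30  # illustrative value; the slide reports 150
pca = PCA(n_components=n_components, whiten=True).fit(X_train)
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)

# Fit the SVM classifier on the projected training data.
clf = SVC(kernel="rbf", class_weight="balanced")
clf.fit(X_train_pca, y_train)

# Predict on the held-out set and print precision / recall / f1 / support.
y_pred = clf.predict(X_test_pca)
report = classification_report(y_test, y_pred)
print(report)
```

The basis is fitted on the training split only, so the test images play no part in choosing the projection.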
  4. scikit-learn contributors
     • GitHub-centric contribution workflow
     • each pull request needs 2 x [+1] reviews
     • code + tests + doc + example
     • 92% test coverage / Continuous Integration
     • 4 major releases per year + 4 bugfix releases
     • 66 contributors for release 0.13
  5. scikit-learn users
     • We support users on & ML
     • 200+ questions tagged with [scikit-learn]
     • Many competitors + benchmarks
     • 500+ answers on ongoing user survey
     • 60% academics / 40% from industry
     • Some data-driven Startups use sklearn
  6. Caveat Emptor
     • Domain specific tooling kept to a minimum
     • Some feature extraction for Bag of Words Text Analysis
     • Some functions for extracting image patches
     • Domain integration is the responsibility of the user or 3rd party libraries
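As a rough illustration of the two domain helpers mentioned on slide 6, the sketch below uses `CountVectorizer` for Bag of Words counts and `extract_patches_2d` for image patches. The three toy documents and the 4x4 array are made up for this example and are not from the talk.

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.feature_extraction.text import CountVectorizer

# Bag of Words: each document becomes a row of token counts.
# The documents are hypothetical examples.
docs = ["free money now", "meeting at noon", "free meeting"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)  # sparse matrix, one row per document
print(sorted(vectorizer.vocabulary_))    # the learned vocabulary
print(counts.toarray())

# Image patches: slide a 2x2 window over a small grayscale "image".
image = np.arange(16).reshape(4, 4)
patches = extract_patches_2d(image, (2, 2))
print(patches.shape)
```

Both helpers only turn raw text or pixels into numeric arrays; anything more domain specific (tokenization rules, image loading, and so on) is left to the user or to other libraries, as the slide notes.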