Machine Learning Libraries You'd Wish You'd Known About

ianozsvald
October 30, 2017

Transcript

  1. Introductions (@IanOzsvald, PyConUK 2017, ianozsvald.com)
     • I’m an engineering data scientist
     • Consulting in AI + Data Science for 15+ years
     • Blog: IanOzsvald.com
  2. Goals today
     • Is my regression working?
     • Why did it make that decision?
     • Can I calculate on Pandas in parallel?
     • Can I automate my machine learning?
     • GitHub for examples:
     • Last year: my introduction to Random Forests as a worked process with examples and graphs
  3. Why explain?
     • Check that our model works as we’d expect in the real world: are the “important features” really important, or are they noise?
     • Help colleagues gain confidence in the model
     • Diagnose whether certain examples are poorly understood
  4. Boston housing data
     • Regress the median value (MEDV) from the other features
     • LSTAT: % lower-status population
     • RM: average number of rooms per dwelling
     • 13 features overall
     • 506 rows
  5. Yellowbrick
     • Lots of visualisations that plug into sklearn
     • Classification: class balance, confusion matrix
     • Regression: y vs ŷ, residual errors
     • Presented at PyDataLondon 2017
     • http://www.scikit-yb.org/en/latest/
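
Yellowbrick's diagnostic plots are drawn from a few simple quantities. As a library-free sketch (this is not Yellowbrick's API; `class_balance`, `confusion_matrix` and `residuals` are hypothetical helper names), these are the numbers behind the class-balance, confusion-matrix and residuals views:

```python
from collections import Counter


def class_balance(y_true):
    """Count of examples per class, as a class-balance bar chart would show."""
    return dict(Counter(y_true))


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes and columns are predicted classes, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def residuals(y_true, y_pred):
    """Actual minus predicted; a residuals plot shows these against the predictions."""
    return [t - p for t, p in zip(y_true, y_pred)]


if __name__ == "__main__":
    labels = ["cheap", "expensive"]
    y_true = ["cheap", "cheap", "expensive", "expensive", "cheap"]
    y_pred = ["cheap", "expensive", "expensive", "expensive", "cheap"]
    print(class_balance(y_true))                     # {'cheap': 3, 'expensive': 2}
    print(confusion_matrix(y_true, y_pred, labels))  # [[2, 1], [0, 2]]
    print(residuals([24.0, 22.0, 35.0], [25.0, 20.0, 30.0]))  # [-1.0, 2.0, 5.0]
```

Large off-diagonal counts, or residuals that trend with ŷ, are the kind of signal these plots make easy to spot.
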
  6. ELI5
     • “Explain it like I’m 5!”
     • Feature importance via permutation importance
     • Prediction explanations, including for text
     • Supports sklearn, XGBoost, LightGBM
     • http://eli5.readthedocs.io/en/latest/
  7. ELI5: Permutation Importance
     • Model agnostic, and hopefully less skewed than a model’s built-in importances
     • Useful with both RF (Random Forest) and linear models
     • For comparison, RandomForest's own feature importances:
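
Permutation importance can be sketched without any library: score the fitted model, shuffle one feature column to break its link with the target, re-score, and record the drop. This is a minimal sketch, assuming a "model" is any Python callable mapping a row to a prediction and the score is negative mean squared error; `toy_model` and the data are hypothetical. ELI5's `PermutationImportance` follows the same idea with more machinery around it.

```python
import random
from statistics import mean


def neg_mse(model, X, y):
    """Score where higher is better: negative mean squared error."""
    return -mean((model(row) - target) ** 2 for row, target in zip(X, y))


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in score when each feature column is shuffled on its own.

    `model` is any callable mapping a row (list of floats) to a prediction.
    """
    rng = random.Random(seed)
    baseline = neg_mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # breaks the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - neg_mse(model, X_perm, y))
        importances.append(mean(drops))
    return importances


if __name__ == "__main__":
    # Hypothetical fitted "model": uses feature 0 and ignores feature 1.
    def toy_model(row):
        return 3.0 * row[0]

    rng = random.Random(42)
    X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(200)]
    y = [toy_model(row) for row in X]
    print(permutation_importance(toy_model, X, y))
```

Feature 1 scores exactly 0.0 here because the toy model never reads it. On real data the importances are usually computed on held-out rows, so they reflect generalisation rather than memorisation.
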
  8. Explaining the regression
     • ELI5 & LIME can explain single examples
     • Expensive house: many rooms, low LSTAT %, good pupil/teacher ratio
     • Cheap house: high LSTAT %, few rooms, maybe high nitric oxide pollution and a worse pupil/teacher ratio
     • These per-example interpretations differ from the global feature importances
     • Also see Kat Jarmul’s keynote: https://blog.kjamistan.com/towards-interpretable-reliable-models/
     • Michał Łopuszyński @ PyDataWarsaw: https://www.slideshare.net/lopusz/debugging-machinelearning
  9. ELI5 explanation
     • Model specific
     • Explains the prediction “46.8” (an expensive property)
     • Driven mainly by RM & LSTAT, with some contribution from PTRATIO
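
For a linear model, a per-prediction breakdown like the one above can be sketched exactly: each feature contributes weight × value, and those contributions plus the bias (intercept) sum to the prediction. ELI5 uses a different, decision-path based decomposition for tree ensembles, so this sketch covers only the linear case; the weights, bias and house below are made-up numbers, not the talk's fitted model.

```python
def explain_linear_prediction(weights, bias, row):
    """Return (prediction, contributions) where contributions[name] = weight * value.

    The prediction always equals bias + sum(contributions.values()).
    """
    contributions = {name: weights[name] * row[name] for name in weights}
    prediction = bias + sum(contributions.values())
    return prediction, contributions


if __name__ == "__main__":
    # Hypothetical weights and one hypothetical house (not fitted to real data).
    weights = {"RM": 5.0, "LSTAT": -0.5, "PTRATIO": -1.0}
    bias = 20.0
    house = {"RM": 8.0, "LSTAT": 4.0, "PTRATIO": 15.0}
    pred, contribs = explain_linear_prediction(weights, bias, house)
    # List contributions largest-magnitude first, like an explanation table.
    for name, value in sorted(contribs.items(), key=lambda kv: -abs(kv[1])):
        print(f"{name:>8}: {value:+.1f}")
    print(f"{'bias':>8}: {bias:+.1f}")
    print(f"predicted: {pred:.1f}")  # 43.0
```
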
  10. LIME (circa 2016)
      • Locally linear models built around the single data point you want to explain
      • Model agnostic, even for images & text!
      • https://github.com/marcotcr/lime
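
The core LIME idea for tabular regression can be sketched in plain Python: perturb the instance, query the black-box model on the perturbed points, weight each sample by its proximity to the instance, and fit a weighted linear model whose coefficients act as the local explanation. This is a simplified sketch under stated assumptions (Gaussian perturbations, an exponential kernel on squared distance, no feature selection or discretisation); it is not the `lime` package's API, which adds more on top, and `black_box`, `lime_explain` and `kernel_width` are hypothetical names.

```python
import math
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lime_explain(black_box, instance, n_samples=500, scale=1.0, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around `instance`.

    Returns (intercept, coefficients); each coefficient is one feature's local effect.
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [v + rng.gauss(0.0, scale) for v in instance]  # perturb the instance
        dist2 = sum((a - b) ** 2 for a, b in zip(z, instance))
        rows.append([1.0] + z)  # leading 1.0 gives the surrogate an intercept
        targets.append(black_box(z))
        weights.append(math.exp(-dist2 / kernel_width ** 2))  # nearer samples count more
    # Weighted least squares via the normal equations: (Z^T W Z) beta = Z^T W y
    p = len(instance) + 1
    A = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(p)] for i in range(p)]
    b = [sum(w * r[i] * t for r, w, t in zip(rows, weights, targets)) for i in range(p)]
    beta = solve(A, b)
    return beta[0], beta[1:]


if __name__ == "__main__":
    # Hypothetical non-linear black box; its true local slopes at (5, 4) are -2.0 and 4.0.
    def black_box(z):
        return 30.0 - 2.0 * z[0] + 0.5 * z[1] ** 2

    intercept, coefs = lime_explain(black_box, [5.0, 4.0])
    print(intercept, coefs)
```

If the black box is itself linear, the surrogate recovers its coefficients; for a curved model the coefficients approximate the slopes near the chosen point only, which is why LIME's explanations are local.
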
  11. LIME
      • 10.99 predicted (cheap property)
      • Strong negative influences (relative to the mean price): LSTAT, RM, NOX, ...
      • Caveats: http://eli5.readthedocs.io/en/latest/blackbox/lime.html
  12. Dask for Medium Data Tasks
      • Pandas-compatible parallel processing
      • Runs on many cores and machines
      • Also see: Automated data exploration by Víctor Zabalza (Friday’s talk)
      • http://ianozsvald.com/2017/06/07/kaggles-quora-question-paris-competition/
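
Dask's Pandas-compatible API hides a split-apply-combine pattern: the table is cut into partitions, each partition is processed concurrently, and the partial results are combined. A stdlib-only sketch of that pattern for a grouped sum, assuming records are (key, value) tuples; `partition_sum` and `parallel_groupby_sum` are hypothetical names, not Dask functions. With Dask itself the equivalent would typically be `dask.dataframe.from_pandas(df, npartitions=4)`, a normal Pandas-style `groupby(...).sum()`, and a final `.compute()`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition_sum(records):
    """Apply step: grouped sum within one partition of (key, value) records."""
    totals = Counter()
    for key, value in records:
        totals[key] += value
    return totals


def parallel_groupby_sum(records, n_partitions=4):
    """Split records into partitions, sum each concurrently, then combine the partial sums."""
    size = max(1, -(-len(records) // n_partitions))  # ceiling division
    partitions = [records[i:i + size] for i in range(0, len(records), size)]
    combined = Counter()
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        for partial in pool.map(partition_sum, partitions):
            combined.update(partial)  # combine step: Counter.update adds values
    return dict(combined)


if __name__ == "__main__":
    # Hypothetical (group, value) rows standing in for two DataFrame columns.
    rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
    print(parallel_groupby_sum(rows, n_partitions=2))  # {'a': 4, 'b': 7, 'c': 4}
```

A thread pool mirrors Dask's default local scheduler for dataframes; for pure-Python work like this sketch the GIL limits any real speed-up, and `ProcessPoolExecutor` would be the stdlib alternative.
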
  13. TPOT
      • Used on the Kaggle Mercedes competition (6-week competition, 5 days of my effort)
      • Top-50% result with little more than TPOT and a few days’ work
      • Ensembled 3 estimators (2 from TPOT)
      • http://ianozsvald.com/2017/07/01/kaggles-mercedes-benz-greener-manufacturing/
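
TPOT, in the versions around this talk, builds scikit-learn pipelines with genetic programming and scores them by cross-validation; a typical run creates a `TPOTRegressor` with settings such as `generations` and `population_size`, calls `fit(X, y)`, and can `export` the winning pipeline as a Python script. The sketch below deliberately swaps in something much simpler, an exhaustive search over a tiny hypothetical candidate list scored by k-fold cross-validated MSE, just to show the "let the machine pick the model" loop. It is not TPOT's algorithm or API, and every name in it is hypothetical.

```python
import random
from statistics import mean


def fit_mean(X, y):
    """Baseline candidate: always predict the training mean."""
    m = mean(y)
    return lambda row: m


def fit_line(X, y):
    """Candidate: least-squares straight line on the first feature only."""
    xs = [row[0] for row in X]
    x_bar, y_bar = mean(xs), mean(y)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (t - y_bar) for x, t in zip(xs, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_bar - slope * x_bar
    return lambda row: intercept + slope * row[0]


def make_knn(k):
    """Candidate factory: k-nearest-neighbours regressor (squared Euclidean distance)."""
    def fit(X, y):
        train = list(zip(X, y))

        def predict(row):
            nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], row)))[:k]
            return mean(t for _, t in nearest)

        return predict
    return fit


def cv_mse(fit, X, y, k_folds=5, seed=0):
    """Mean squared error averaged over k shuffled folds."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    errors = []
    for f in range(k_folds):
        test = set(indices[f::k_folds])
        train = [i for i in indices if i not in test]
        model = fit([X[i] for i in train], [y[i] for i in train])
        errors.append(mean((model(X[i]) - y[i]) ** 2 for i in test))
    return mean(errors)


def auto_select(candidates, X, y):
    """Score every candidate by cross-validated MSE; return (best_name, all_scores)."""
    scores = {name: cv_mse(fit, X, y) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical noisy data with a roughly linear trend.
    rng = random.Random(1)
    X = [[float(i)] for i in range(40)]
    y = [2.0 * row[0] + 1.0 + rng.gauss(0.0, 1.0) for row in X]
    candidates = {"mean": fit_mean, "line": fit_line, "knn3": make_knn(3), "knn7": make_knn(7)}
    best, scores = auto_select(candidates, X, y)
    print(best, scores)
```

TPOT searches a vastly larger space (preprocessing steps, feature selectors, model hyperparameters) and evolves it rather than enumerating it, but the cross-validated scoring loop is the same basic guard against picking a model that only fits its training data.
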
  14. Closing...
      • Diagnose your ML just like you debug your code: explain how it works to your colleagues
      • Write-up: http://ianozsvald.com/
      • Training next year: what do you need?
      • Questions in exchange for beer :-)
      • Please send me a postcard if this is useful
      • See my longer diagnosis Notebook on GitHub: