ianozsvald
January 24, 2018

Machine Learning Libraries You'd Wish You'd Known About

Given to London Python in 2018-01.


Transcript

  1. Machine learning libraries you'd wish you'd known about
     London Python 2018-01
     Ian Ozsvald, @IanOzsvald, ModelInsight.io
  2. [email protected] @IanOzsvald[.com] London Python 2018-01
     Introductions
     • I’m an engineering data scientist
     • Consulting in AI + Data Science for 15+ years
     Blog->IanOzsvald.com
  3. Goals today
     • Can I calculate on Pandas in parallel?
     • Can I automate my machine learning?
     • Is my regression working?
     • Why did it make that decision?
     • Github for examples:
     Builds on PyConUK 2016 – my introduction to Random Forests as a worked process with examples and graphs
  4. Dask for Medium Data Tasks
     • Pandas-compatible parallel processor
     • Also see: Automated data exploration by Víctor Zabalza at PyConUK 2017
     • http://ianozsvald.com/2017/06/07/kaggles-quora-question-paris-competition/
     • In top 40% in < 6 days of effort
  5. TPOT
     • Used on Kaggle Mercedes (6 week competition, 5 days of my effort)
     • In top 40% result with little more than TPOT and a few days
     • Ensembled 3 estimators (2 from TPOT)
     • http://ianozsvald.com/2017/07/01/kaggles-mercedes-benz-greener-manufacturing
  6. Why explain our models?
     • Check that our model works as we’d expect in the real world – are the “important features” really important? Are they noise?
     • Help colleagues gain confidence in the model
     • Diagnose if certain examples are poorly understood
  7. Boston housing data
     • Regress median-value (MEDV) from other features
     • LSTAT - ‘low status %’
     • RM - ‘average rooms per dwelling’
     • 13 features overall
     • 506 rows
  8. Yellowbrick
     • Lots of visualisations that plug into sklearn
     • Classification – class balance, confusion matrix
     • Regression – y vs ŷ, residual errors
     • Presented at PyDataLondon 2017
     • http://www.scikit-yb.org/en/latest/
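The numbers behind a "y vs ŷ" plot and a residual plot can be sketched with NumPy alone. This is not Yellowbrick's API; the synthetic data and the straight-line fit are assumptions standing in for a real dataset and regressor.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical synthetic data: y depends linearly on x plus noise.
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=200)

# Fit a straight line by least squares (a stand-in for any regressor).
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# Residuals: the part of y that the model failed to explain, per row.
residuals = y - y_hat

# R^2: fraction of the variance in y explained by the predictions.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"R^2 = {r2:.3f}")
print(f"mean residual = {residuals.mean():.3e}")
print(f"corr(residual, y_hat) = {np.corrcoef(residuals, y_hat)[0, 1]:.3e}")
```

A regression that is working tends to show residuals centred on zero with no trend against ŷ; a visible curve or funnel shape in the residual plot suggests the model is missing structure.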
  9. ELI5
     • “Explain it like I’m 5!”
     • Feature Importance via Permutation Importance
     • Prediction explanations including text
     • Sklearn, XGBoost, LightGBM
     • http://eli5.readthedocs.io/en/latest/
  10. ELI5 - Permutation Importance
      • Model agnostic, hopefully not skewed
      • Useful with both RF and linear models
      RandomForest's feature importances:
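The idea behind permutation importance can be sketched by hand: shuffle one column at a time and measure how much the score drops. This is a from-scratch illustration, not ELI5's implementation; the function names, the synthetic data and the least-squares stand-in model are all assumptions. Any object with a predict function (a RandomForest, for instance) could be passed in instead.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when each column is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = r2_score(y, predict(X))
    importances = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling breaks the link between this feature and the target.
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            drops.append(baseline - r2_score(y, predict(X_perm)))
        importances[col] = np.mean(drops)
    return importances

# Hypothetical data: feature 0 matters a lot, feature 1 a little, feature 2 is noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=500)

# Stand-in model: ordinary least squares with an intercept column.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(data):
    return np.column_stack([data, np.ones(len(data))]) @ coef

importances = permutation_importance(predict, X, y)
for name, score in zip(["strong", "weak", "noise"], importances):
    print(f"{name:>6}: {score:.3f}")
```

For brevity the sketch scores on the training rows; in practice the shuffling is usually done on held-out data so that the importances reflect generalisation rather than memorisation.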
  11. Explaining the regression
      • ELI5 & LIME can explain single examples
      • Expensive house – many rooms, low LSTAT %, good pupil/teacher ratio
      • Cheap house – high LSTAT %, few rooms, maybe high nitric oxide pollution and lower pupil/teacher ratio
      • These interpretations are different to the global feature importances
      • Also see Kat Jarmul’s keynote @ PyDataWarsaw 2017: https://blog.kjamistan.com/towards-interpretable-reliable-models
      • Michał Łopuszyński @ PyDataWarsaw https://www.slideshare.net/lopusz/debugging-machinelearning
  12. ELI5 explanation
      • Model specific
      • Explain “46.8”
      • Expensive property
      • RM & LSTAT
      • Some PTRATIO
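A per-prediction explanation splits one predicted value into a bias plus one contribution per feature. The simplest case is a linear model, where each contribution is weight × value; tree ensembles need a different, model-specific decomposition that is not shown here. The weights, bias and example row below are invented for illustration and are not fitted to the Boston data.

```python
import numpy as np

# Hypothetical weights for a linear model; these are NOT fitted to the Boston data.
feature_names = ["RM", "LSTAT", "PTRATIO"]
weights = np.array([4.0, -0.6, -0.9])
bias = 20.0

def explain_linear(x):
    """Return (prediction, {feature: contribution}) for one row."""
    contributions = weights * x
    prediction = bias + contributions.sum()
    return prediction, dict(zip(feature_names, contributions))

# One invented row: many rooms, low LSTAT %, low pupil/teacher ratio.
row = np.array([8.0, 3.0, 13.0])
prediction, contributions = explain_linear(row)

# List the bias first, then features by the size of their contribution.
print(f"bias: {bias:+.1f}")
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.1f}")
print(f"prediction: {prediction:.1f}")
```

The bias and contributions always sum to the prediction, which is what makes this kind of table readable by colleagues.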
  13. ELI5 explain many examples
      Few rooms, close to employment centres, lower LSTAT%
      Many rooms (big houses!)
  14. LIME (circa 2016)
      • Locally linear classifiers built around the 1 data point you want to explain
      • Model agnostic, even images & text!
      • https://github.com/marcotcr/lime
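The "locally linear model around one data point" idea can be sketched in a few lines of NumPy: perturb the point, weight the samples by closeness, and fit a weighted linear model. This is a simplified illustration, not the lime package's API or its exact algorithm; the black-box function, the Gaussian kernel and all parameter values are assumptions.

```python
import numpy as np

def black_box(X):
    """Hypothetical non-linear model that we want to explain."""
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def local_linear_explanation(predict, x0, n_samples=2000, scale=0.1, seed=0):
    """Fit a proximity-weighted linear model around x0."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the point of interest with small Gaussian noise.
    X = x0 + rng.normal(scale=scale, size=(n_samples, len(x0)))
    y = predict(X)
    # 2. Weight each sample by its closeness to x0 (Gaussian kernel).
    dist2 = np.sum((X - x0) ** 2, axis=1)
    w = np.exp(-dist2 / (2 * scale ** 2))
    # 3. Weighted least squares: scale rows by sqrt(weight), then solve.
    A = np.column_stack([X - x0, np.ones(n_samples)])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[:-1], coef[-1]  # local slopes, local intercept

x0 = np.array([0.0, 1.0])
slopes, intercept = local_linear_explanation(black_box, x0)
print("local slopes:", np.round(slopes, 2))
print("local intercept:", round(float(intercept), 2))
```

The slopes only describe the model's behaviour near x0; a different point can give a very different explanation, which is why these local interpretations can disagree with global feature importances.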
  15. LIME
      • 10.99 predicted (cheap property)
      • Strong negative influences (from the mean price) – LSTAT, RM, NOX, ...
      Caveats: http://eli5.readthedocs.io/en/latest/blackbox/lime.html
  16. Closing...
      • Diagnose your ML just like you debug your code – explain its working to colleagues
      • Write-up: http://ianozsvald.com/
      • Data science team coaching – can I help?
      • Questions in exchange for beer :-)
      • Learn something? Please send me a postcard!
      • See my longer diagnosis Notebook on github: