Machine Learning Libraries You'd Wish You'd Known About

Machine learning libraries you'd wish you'd known about London Python
2018-01 Ian Ozsvald @IanOzsvald ModelInsight.io

[email protected] @IanOzsvald[.com] London Python 2018-01
Introductions
• I’m an engineering data scientist
• Consulting in AI + Data Science for 15+ years
Blog -> IanOzsvald.com

Goals today
• Can I calculate on Pandas in parallel?
• Can I automate my machine learning?
• Is my regression working?
• Why did it make that decision?
• GitHub for examples:
Builds on PyConUK 2016 – my introduction to Random Forests as a worked process with examples and graphs

watermark for reproducibility
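As a rough illustration of what a reproducibility watermark records, the sketch below collects the interpreter, platform and package versions using only the standard library. It is not the watermark package itself; `environment_report` is a hypothetical helper name and the package list is an assumption chosen for the example.

```python
import platform
from importlib import metadata


def environment_report(packages):
    """Return a dict describing the interpreter, OS and package versions."""
    report = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence rather than failing, so the report is always complete.
            report[name] = "not installed"
    return report


# In a notebook, the watermark extension prints similar information with:
#   %load_ext watermark
#   %watermark -v -m -p numpy,pandas
report = environment_report(["numpy", "pandas"])
for key, value in report.items():
    print(f"{key:15s} {value}")
```

Pasting output like this at the top of a Notebook makes it easier to tell later which versions produced a given result.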

Dask for Medium Data Tasks
• Pandas-compatible parallel processor
• Also see: Automated data exploration by Víctor Zabalza at PyConUK 2017
• http://ianozsvald.com/2017/06/07/kaggles-quora-question-paris-competition/
• In top 40% in < 6 days of effort

Dask – 3.5x speedup

TPOT – automated ML

TPOT
• Used on Kaggle Mercedes (6-week competition, 5 days of my effort)
• Top 40% result with little more than TPOT and a few days
• Ensembled 3 estimators (2 from TPOT)
• http://ianozsvald.com/2017/07/01/kaggles-mercedes-benz-greener-manufacturing

pandas_profiling for EDA

Why explain our models?
• Check that our model works as we’d expect in the real world – are the “important features” really important? Are they noise?
• Help colleagues gain confidence in the model
• Diagnose if certain examples are poorly understood

Boston housing data
• Regress median-value (MEDV) from other features
• LSTAT - ‘low status %’
• RM - ‘average rooms per dwelling’
• 13 features overall
• 506 rows

Yellowbrick
• Lots of visualisations that plug into sklearn
• Classification – class balance, confusion matrix
• Regression – y vs ŷ, residual errors
• Presented at PyDataLondon 2017
• http://www.scikit-yb.org/en/latest/
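To make "y vs ŷ" and "residual errors" concrete, here is a small NumPy-only sketch that computes the quantities those regression plots display. It does not use Yellowbrick (whose `PredictionError` and `ResidualsPlot` visualizers draw them for you); the data is synthetic and the least-squares model is an assumption standing in for whatever regressor you are diagnosing.

```python
import numpy as np

# Synthetic regression data (illustrative only, not the Boston data).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
true_weights = np.array([2.0, -1.0, 0.5])
y = X @ true_weights + rng.normal(scale=0.1, size=200)

# Fit ordinary least squares with an intercept column.
X_design = np.column_stack([np.ones(len(X)), X])
coefs, *_ = np.linalg.lstsq(X_design, y, rcond=None)

y_hat = X_design @ coefs   # the "ŷ" axis of a prediction-error plot
residuals = y - y_hat      # the vertical axis of a residuals plot

# R^2 summarises how tightly y tracks y_hat.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
print(f"R^2 = {r_squared:.3f}, mean residual = {residuals.mean():.2e}")
```

A healthy regression shows residuals scattered around zero with no visible trend against ŷ; a curve or funnel shape suggests the model is missing structure.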

Yellowbrick

ELI5
• “Explain it like I’m 5!”
• Feature Importance via Permutation Importance
• Prediction explanations including text
• Sklearn, XGBoost, LightGBM
• http://eli5.readthedocs.io/en/latest/

ELI5 - Permutation Importance
• Model agnostic, hopefully not skewed
• Useful with both RF and linear models
[Figure: RandomForest's feature importances]
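Permutation importance itself fits in a few lines, which helps show why it is model agnostic: it only needs a `predict` function. The sketch below is a NumPy-only illustration, not ELI5's implementation (ELI5 provides `PermutationImportance` in `eli5.sklearn`). The function name, the synthetic data and the least-squares stand-in model are all assumptions made for the example.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((y - predict(X)) ** 2)
    importances = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_shuffled = X.copy()
            # Shuffling breaks the link between this feature and the target.
            X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
            shuffled_mse = np.mean((y - predict(X_shuffled)) ** 2)
            increases.append(shuffled_mse - baseline)
        importances[col] = np.mean(increases)
    return importances


# Synthetic data: feature 0 matters a lot, feature 1 a little, feature 2 is noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

# Any fitted model works; here a least-squares fit stands in for it.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)
importances = permutation_importance(lambda data: data @ weights, X, y)
print(importances)
```

In practice the importances are usually measured on held-out data rather than the training set, so that features the model has merely memorised do not look important.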

Explaining the regression
• ELI5 & LIME can explain single examples
• Expensive house – many rooms, low LSTAT %, good pupil/teacher ratio
• Cheap house – high LSTAT %, few rooms, maybe high nitric oxide pollution and lower pupil/teacher ratio
• These interpretations are different to the global feature importances
• Also see Kat Jarmul’s keynote @ PyDataWarsaw 2017: https://blog.kjamistan.com/towards-interpretable-reliable-models
• Michał Łopuszyński @ PyDataWarsaw https://www.slideshare.net/lopusz/debugging-machinelearning

ELI5 explanation
• Model specific
• Explain “46.8”
• Expensive property
• RM & LSTAT
• Some PTRATIO

ELI5 explain many examples

ELI5 explain many examples
• Few rooms, close to employment centres, lower LSTAT%
• Many rooms (big houses!)

LIME (circa 2016)
• Locally linear classifiers built around the 1 data point you want to explain
• Model agnostic, even images & text!
• https://github.com/marcotcr/lime
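The "locally linear" idea can be sketched without the lime package: perturb the point of interest, weight the perturbed samples by closeness, and fit a weighted linear model to the black box's predictions. This is a simplified illustration under assumed choices (Gaussian perturbations, Gaussian kernel, a regression rather than classification target); `local_linear_explanation` and `black_box` are hypothetical names, and the real library differs in sampling and feature handling.

```python
import numpy as np


def local_linear_explanation(predict, x0, n_samples=2000, scale=0.1, seed=0):
    """Fit a distance-weighted linear surrogate to `predict` around x0."""
    rng = np.random.default_rng(seed)
    # Perturb the point of interest with Gaussian noise.
    samples = x0 + rng.normal(scale=scale, size=(n_samples, len(x0)))
    targets = predict(samples)
    # Closer samples get larger weights (Gaussian kernel on distance).
    distances = np.linalg.norm(samples - x0, axis=1)
    kernel_weights = np.exp(-(distances ** 2) / (2 * scale ** 2))
    # Weighted least squares via rescaling rows by sqrt(weight).
    design = np.column_stack([np.ones(n_samples), samples - x0])
    root_w = np.sqrt(kernel_weights)
    coefs, *_ = np.linalg.lstsq(
        design * root_w[:, None], targets * root_w, rcond=None
    )
    return coefs[0], coefs[1:]  # local intercept, per-feature slopes


def black_box(data):
    """Hypothetical non-linear model standing in for a real regressor."""
    return data[:, 0] ** 2 + 3.0 * data[:, 1]


intercept, slopes = local_linear_explanation(black_box, np.array([2.0, 0.0]))
print(intercept, slopes)
```

The slopes describe the model only near the chosen point, which is why a local explanation can disagree with global feature importances.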

LIME
• 10.99 predicted (cheap property)
• Strong negative influences (from the mean price) – LSTAT, RM, NOX, ...
Caveats: http://eli5.readthedocs.io/en/latest/blackbox/lime.html

Closing...
• Diagnose your ML just like you debug your code – explain its working to colleagues
• Write-up: http://ianozsvald.com/
• Data science team coaching – can I help?
• Questions in exchange for beer :-)
• Learn something? Please send me a postcard!
• See my longer diagnosis Notebook on GitHub:

Appendix: Dask – 3.5x speedup
https://twitter.com/ianozsvald/status/870643737097056259
