"Creating correct and capable classifiers" at PyDataLondon 2018

ianozsvald
April 28, 2018

Iteratively building a classifier requires a mix of skill, diagnostic ability and guesswork. I'll lay out a framework that helps you build reliable classifiers with greater confidence and less random guesswork. Tools demonstrated will include sklearn, YellowBrick, ELI5, pandas_profiling and skopt.

Blog: http://ianozsvald.com/2018/04/30/pydatalondon-2018-and-creating-correct-and-capable-classifiers/

Transcript

  1. [email protected] @IanOzsvald[.com] PyDataLondon 2018 Introductions
    • I’m an engineering data scientist
    • Consulting in AI + Data Science for 15+ years
    Blog->IanOzsvald.com
  2. NumFOCUS
    • Have you thanked a speaker, a volunteer and a NumFOCUS organiser yet? Lots of volunteered time – please say thanks
    • Leah can’t make it due to illness – please Tweet “@numfocus Leah get well soon from London!”
    • Book signing (High Performance Python) at lunch
  3. Goals today
    • Get a baseline model
    • Visualise errors & diagnose problem areas
    • Explain decisions
    • Github for examples:
  4. TSNE by features
    Features for this cluster - lots of imputed ages!
    We’ve filtered by x, y region on the TSNE plot
  5. Last mentions
    • skopt’s BayesSearchCV perhaps “beats” RandomizedSearchCV & GridSearchCV
    • New iteration of this talk for PyDataAmsterdam 2018 in 1 month (with SHAP replacing ELI5 + other tools)
    • If you can’t reliably explain why a prediction happens – do you really understand your model?
  6. Closing...
    • Diagnose your ML just like you debug your code – explain its working to colleagues
    • Do you want training on topics like this?
    • Write-up + more: http://ianozsvald.com/
    • Questions in exchange for beer :-)
    • Learnt something? Please send me a postcard!
    • See my longer diagnosis Notebook on github:
  7. Appendix
    • Ian’s “Machine Learning Libraries You’d Wish You’d Known About” @ PyConUK 2017
    • Ian’s “Using Machine Learning to solve a classification problem with scikit-learn” @ PyConUK 2016
    • Gael Varoquaux’s tutorial “Understanding and diagnosing your machine-learning models” @ PyDataLondon 2018 http://gael-varoquaux.info/interpreting_ml_tuto/
    • Also see Kat Jarmul’s keynote @ PyDataWarsaw 2017: https://blog.kjamistan.com/towards-interpretable-reliable-models
    • Michał Łopuszyński @ PyDataWarsaw https://www.slideshare.net/lopusz/debugging-machinelearning
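
The "Get a baseline model" goal from slide 3 might be sketched roughly as follows. This is a minimal illustration rather than the code from the talk: the synthetic dataset, the choice of `DummyClassifier` as the baseline and `LogisticRegression` as the candidate are all assumptions made here for the example.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic, imbalanced stand-in data (assumption: the talk's dataset is not reproduced here).
X, y = make_classification(
    n_samples=500,
    n_features=8,
    n_informative=4,
    n_clusters_per_class=1,
    class_sep=1.5,
    weights=[0.7, 0.3],
    random_state=0,
)

# Baseline: always predict the most frequent class.
baseline = DummyClassifier(strategy="most_frequent")
baseline_scores = cross_val_score(baseline, X, y, cv=5, scoring="accuracy")

# Candidate: a simple linear model, scored in exactly the same way.
candidate = LogisticRegression(max_iter=1000)
candidate_scores = cross_val_score(candidate, X, y, cv=5, scoring="accuracy")

print(f"baseline  accuracy: {baseline_scores.mean():.3f}")
print(f"candidate accuracy: {candidate_scores.mean():.3f}")
```

Scoring both models with the same cross-validation setup means any later model has a fixed number to beat, which is usually the point of starting from a baseline.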
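
Slide 4 describes filtering rows by an x, y region on a TSNE plot and then inspecting their features. A hedged sketch of that idea is below. The data is synthetic, the `age_imputed` flag is a hypothetical stand-in for the "imputed ages" mentioned on the slide (it is random here), and the region bounds are arbitrary, chosen only so the example runs on any embedding.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.manifold import TSNE

# Synthetic stand-in features (assumption: not the dataset used in the talk).
X, _ = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)

# Hypothetical flag marking rows whose age was imputed; random, for illustration only.
rng = np.random.default_rng(0)
age_imputed = rng.random(len(X)) < 0.2

# Project the features to 2D with TSNE.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
tsne_x, tsne_y = embedding[:, 0], embedding[:, 1]

# Select a rectangular region of the plot (arbitrary bounds: below both medians).
in_region = (tsne_x < np.median(tsne_x)) & (tsne_y < np.median(tsne_y))

# Compare the flag's frequency inside the region with its overall frequency.
print(f"rows in region: {in_region.sum()}")
print(f"imputed overall: {age_imputed.mean():.2f}")
if in_region.any():
    print(f"imputed in region: {age_imputed[in_region].mean():.2f}")
```

With real data, a large gap between the in-region and overall rates is the kind of signal the slide points at: a cluster dominated by rows that share a data-quality issue.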
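
Slide 5 compares skopt's `BayesSearchCV` with scikit-learn's `RandomizedSearchCV` and `GridSearchCV`. The sketch below shows only the two scikit-learn searchers on synthetic data, so it does not test the slide's claim. The model, the parameter range for `C` and the budget of 4 candidates are assumptions picked for the example. `BayesSearchCV` is not imported here, to keep the sketch free of extra dependencies.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in data (assumption: not the dataset used in the talk).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)
model = LogisticRegression(max_iter=1000)

# Grid search: tries every value listed, here 4 candidates.
grid = GridSearchCV(model, param_grid={"C": [0.01, 0.1, 1.0, 10.0]}, cv=3)
grid.fit(X, y)

# Randomized search: samples 4 candidates from a continuous log-uniform range.
random_search = RandomizedSearchCV(
    model,
    param_distributions={"C": loguniform(1e-2, 1e1)},
    n_iter=4,
    cv=3,
    random_state=0,
)
random_search.fit(X, y)

print("grid   best:", grid.best_params_, round(grid.best_score_, 3))
print("random best:", random_search.best_params_, round(random_search.best_score_, 3))
```

Giving both searchers the same number of candidates keeps the comparison fair. A Bayesian searcher would be judged the same way, by the best cross-validated score reached within an equal budget.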