ianozsvald
April 28, 2018

"Creating correct and capable classifiers" at PyDataLondon 2018

Iteratively building a classifier requires a mix of skill, diagnostic ability and guesswork. I'll lay out a framework that helps you build reliable classifiers with greater confidence and less random guesswork. Tools demonstrated will include sklearn, YellowBrick, ELI5, pandas_profiling and skopt.
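
As a rough illustration of the first steps in such a framework (get a baseline, then look at where a real model goes wrong), here is a minimal sketch using only sklearn. The synthetic dataset, the choice of `RandomForestClassifier` and all parameter values are assumptions made for this example; they are not taken from the talk.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict, cross_val_score

# Synthetic, mildly imbalanced stand-in data (an assumption, not the talk's dataset).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.7, 0.3], random_state=0,
)

# Baseline: always predict the most frequent class.
baseline = DummyClassifier(strategy="most_frequent")
# Candidate model (hypothetical choice for this sketch).
model = RandomForestClassifier(n_estimators=100, random_state=0)

# Compare cross-validated accuracy of the baseline and the candidate.
baseline_acc = cross_val_score(baseline, X, y, cv=5, scoring="accuracy").mean()
model_acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

# Out-of-fold predictions give a confusion matrix that shows which
# kind of error (false positive or false negative) dominates.
y_pred = cross_val_predict(model, X, y, cv=5)
cm = confusion_matrix(y, y_pred)

print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"model accuracy:    {model_acc:.3f}")
print(cm)
```

If the candidate model cannot clearly beat the dummy baseline, that is usually a sign to revisit the features or the data before tuning anything.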

Blog: http://ianozsvald.com/2018/04/30/pydatalondon-2018-and-creating-correct-and-capable-classifiers/

Transcript

  1. [email protected] @IanOzsvald[.com] PyDataLondon 2018
     Introductions
     • I’m an engineering data scientist
     • Consulting in AI + Data Science for 15+ years
     Blog->IanOzsvald.com
  2. NumFOCUS
     • Have you thanked a speaker, a volunteer and a NumFOCUS organiser yet? Lots of volunteered time – please say thanks
     • Leah can’t make it due to illness – please Tweet “@numfocus Leah get well soon from London!”
     • Book signing (High Performance Python) at lunch
  3. Goals today
     • Get a baseline model
     • Visualise errors & diagnose problem areas
     • Explain decisions
     • Github for examples:
  4. TSNE by features
     Features for this cluster - lots of imputed ages!
     We’ve filtered by x, y region on the TSNE plot
  5. Last mentions
     • skopt’s BayesSearchCV perhaps “beats” RandomizedSearchCV & GridSearchCV
     • New iteration of this talk for PyDataAmsterdam 2018 in 1 month (with SHAP replacing ELI5 + other tools)
     • If you can’t reliably explain why a prediction happens – do you really understand your model?
  6. Closing...
     • Diagnose your ML just like you debug your code – explain its working to colleagues
     • Do you want training on topics like this?
     • Write-up + more: http://ianozsvald.com/
     • Questions in exchange for beer :-)
     • Learnt something? Please send me a postcard!
     • See my longer diagnosis Notebook on github:
  7. Appendix
     • Ian’s “Machine Learning Libraries You’d Wish You’d Known About” @ PyConUK 2017
     • Ian’s “Using Machine Learning to solve a classification problem with scikit-learn” @ PyConUK 2016
     • Gael Varoquaux’s tutorial “Understanding and diagnosing your machine-learning models” @ PyDataLondon 2018 http://gael-varoquaux.info/interpreting_ml_tuto/
     • Also see Kat Jarmul’s keynote @ PyDataWarsaw 2017: https://blog.kjamistan.com/towards-interpretable-reliable-models
     • Michał Łopuszyński @ PyDataWarsaw https://www.slideshare.net/lopusz/debugging-machinelearning
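
The search strategies compared on slide 5 can be put side by side in a small sketch. This one runs only sklearn's `GridSearchCV` and `RandomizedSearchCV` on synthetic data, giving each the same budget of nine candidates. The dataset, estimator and parameter ranges are assumptions chosen for illustration. skopt's `BayesSearchCV` follows the same fit-then-inspect pattern, but it is not run here so that the example needs nothing beyond sklearn and scipy.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in data (an assumption, not the talk's dataset).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
estimator = RandomForestClassifier(n_estimators=30, random_state=0)

# Grid search: evaluates every combination, here 3 x 3 = 9 candidates.
grid = GridSearchCV(
    estimator,
    param_grid={"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5, 10]},
    cv=3,
)
grid.fit(X, y)

# Randomized search: samples 9 candidates from integer distributions
# that cover the same ranges (randint's upper bound is exclusive).
rand = RandomizedSearchCV(
    estimator,
    param_distributions={
        "max_depth": randint(2, 9),
        "min_samples_leaf": randint(1, 11),
    },
    n_iter=9,
    cv=3,
    random_state=0,
)
rand.fit(X, y)

print("grid:  ", grid.best_params_, round(grid.best_score_, 3))
print("random:", rand.best_params_, round(rand.best_score_, 3))
```

Which strategy wins at a fixed budget depends on the problem and the search space, which is presumably why the slide hedges with "perhaps".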