Creating Correct Classifiers
PyDataAmsterdam 2018
Ian Ozsvald @IanOzsvald ModelInsight.io
Ian.Ozsvald@ModelInsight.io @IanOzsvald[.com]
Introductions
● I’m an engineering data scientist
● Consulting in AI + Data Science for 15+ years
Blog->IanOzsvald.com
NumFOCUS
● Have you thanked a speaker, a volunteer and a NumFOCUS organiser yet? Lots of volunteered time – please say thanks
● Thank contributors too!
Goals today
● Get a baseline model
● Visualise errors & diagnose problem areas
● Explain decisions
● Github for examples:
ROC Curve (YellowBrick)
LogisticRegression classifier to show a contrast with lower AUC
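To make the AUC contrast concrete, here is a minimal sketch of what the number measures: the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The labels and scores are made up for illustration and are not from the talk. It uses only the standard library; in practice scikit-learn's `roc_auc_score` or YellowBrick's ROCAUC visualiser would normally be used instead.

```python
def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities from two classifiers
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
strong_scores = [0.9, 0.2, 0.8, 0.7, 0.3, 0.1, 0.6, 0.4]  # separates classes well
weak_scores = [0.6, 0.5, 0.4, 0.7, 0.3, 0.6, 0.5, 0.2]    # classes overlap

auc_strong = roc_auc(y_true, strong_scores)
auc_weak = roc_auc(y_true, weak_scores)
print(f"strong AUC={auc_strong:.2f}, weak AUC={auc_weak:.2f}")
```

A lower AUC for the second model means more positive/negative pairs are ranked the wrong way round, which is what the flatter ROC curve shows visually.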
Worst Errors by Row
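One way to find the worst errors by row is to rank rows by the gap between the true label and the predicted probability, then read the top few by eye. The sketch below assumes Titanic-style rows with hypothetical column names (`sex`, `pclass`, `age`) and made-up predictions; the helper `worst_errors` is illustrative, not the author's code.

```python
def worst_errors(rows, y_true, y_prob, n=3):
    """Return the n rows whose predicted probability is furthest from the truth."""
    ranked = sorted(
        zip(rows, y_true, y_prob),
        key=lambda item: abs(item[1] - item[2]),
        reverse=True,
    )
    return [
        {**row, "actual": y, "predicted_prob": p, "error": abs(y - p)}
        for row, y, p in ranked[:n]
    ]


# Hypothetical feature rows, true labels and predicted probabilities
rows = [
    {"sex": "female", "pclass": 1, "age": 29.0},
    {"sex": "male", "pclass": 3, "age": 25.0},
    {"sex": "male", "pclass": 1, "age": 40.0},
    {"sex": "female", "pclass": 3, "age": 18.0},
]
y_true = [1, 1, 0, 0]
y_prob = [0.95, 0.10, 0.80, 0.45]

worst = worst_errors(rows, y_true, y_prob, n=2)
for item in worst:
    print(item)
```

Confidently wrong rows (large error) are usually more informative than borderline ones, since they point at mislabelled data or a missing feature.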
Errors by Major Feature
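Breaking errors down by a major feature can be sketched as a grouped misclassification rate. The feature (`sex`), labels and predictions below are invented for illustration, and `error_rate_by_feature` is a hypothetical helper name.

```python
from collections import defaultdict


def error_rate_by_feature(feature_values, y_true, y_pred):
    """Misclassification rate for each distinct value of one feature."""
    counts = defaultdict(lambda: [0, 0])  # value -> [errors, total]
    for value, actual, predicted in zip(feature_values, y_true, y_pred):
        counts[value][0] += int(actual != predicted)
        counts[value][1] += 1
    return {value: errors / total for value, (errors, total) in counts.items()}


# Hypothetical feature column, true labels and hard predictions
sex = ["male", "male", "male", "female", "female", "female", "male", "female"]
y_true = [0, 1, 0, 1, 1, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 1]

rates = error_rate_by_feature(sex, y_true, y_pred)
print(rates)
```

A large gap between groups suggests the model is systematically weaker on one slice of the data, which is worth examining before tuning anything globally.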
TSNE by features
Oddly similar cluster?
Conflicted?
TSNE by features
Features for this cluster - lots of imputed ages!
We’ve filtered by x, y region on the TSNE plot
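Filtering by an x, y region can be sketched as a bounding-box selection over the 2D embedding, followed by a summary of the selected rows. The sketch assumes the embedding has already been computed (for example with scikit-learn's `sklearn.manifold.TSNE`); the coordinates, the `age_imputed` flag and the helper names are all hypothetical.

```python
def rows_in_region(embedding, rows, x_range, y_range):
    """Select the rows whose 2D embedding point falls inside a bounding box."""
    x_lo, x_hi = x_range
    y_lo, y_hi = y_range
    return [
        row
        for (x, y), row in zip(embedding, rows)
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi
    ]


def share_flagged(rows, key):
    """Fraction of rows where a boolean column is True."""
    return sum(bool(row[key]) for row in rows) / len(rows)


# Hypothetical t-SNE coordinates, one (x, y) pair per original row
embedding = [(-12.0, 3.5), (10.2, 8.1), (11.0, 7.4), (9.5, 9.0), (-3.0, -6.2), (10.8, 8.8)]
rows = [
    {"age_imputed": False},
    {"age_imputed": True},
    {"age_imputed": True},
    {"age_imputed": True},
    {"age_imputed": False},
    {"age_imputed": False},
]

# Bounding box read off the plot around the odd cluster (assumed values)
cluster = rows_in_region(embedding, rows, x_range=(9.0, 12.0), y_range=(7.0, 10.0))
cluster_share = share_flagged(cluster, "age_imputed")
overall_share = share_flagged(rows, "age_imputed")
print(f"imputed ages: cluster={cluster_share:.2f}, overall={overall_share:.2f}")
```

Comparing the cluster's share against the overall share is what shows whether the cluster is unusual, rather than just large.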
Examine conflicted area
Oddly similar cluster?
Closing...
● Diagnose your ML just like you debug your code – explain its working to colleagues
● Do you want training on topics like this?
● Write-up + more: http://ianozsvald.com/
● Questions in exchange for beer :-)
● Learnt something? Please send me a postcard!
● See my longer diagnosis Notebook on github:
Appendix
● Ian’s “Machine Learning Libraries You’d Wish You’d Known About” @ PyConUK 2017
● Ian’s “Using Machine Learning to solve a classification problem with scikit-learn” @ PyConUK 2016
● Gael Varoquaux’s tutorial “Understanding and diagnosing your machine-learning models” @ PyDataLondon 2018
http://gael-varoquaux.info/interpreting_ml_tuto/
● Also see Kat Jarmul’s keynote @ PyDataWarsaw 2017:
https://blog.kjamistan.com/towards-interpretable-reliable-models
● Michał Łopuszyński @ PyDataWarsaw
https://www.slideshare.net/lopusz/debugging-machinelearning