On the Diagramatic Diagnosis of Data (BudapestBI 2018)

Diagramatic Diagnosis of Data BudapestBI 2018 Ian Ozsvald @IanOzsvald ModelInsight.io

[email protected] @IanOzsvald[.com] BudapestBI 2018

Introductions
• I’m an engineering data scientist
• Consulting in AI + Data Science for 15+ years
• Blog->IanOzsvald.com

Community Announcement!
• Have you thanked a speaker, an organiser or Bence yet? Lots of volunteered time, so please say thanks
• Thank contributors too!
• Did I take a photo?

Goals today
• How long since you had brand new data?
• Univariate investigations
• Show relationships with seaborn
• discover_feature_relationships: my new tool (feedback please!)
• Data stories

pandas_profiling (Titanic)

pandas_profiling

Describing Titanic relationships

Seaborn (Titanic data)
Non-formatted default pivot result
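As a minimal sketch of the kind of pivot this slide refers to, the block below builds a survival-rate table with pandas. The rows are hand-made for illustration and are not the real Titanic data; the column names `sex`, `pclass` and `survived` are assumptions. A table like this could then be passed to a seaborn heatmap for formatting.

```python
import pandas as pd

# Hand-made illustrative rows, NOT the real Titanic data.
passengers = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "female", "male", "male", "female"],
    "pclass": [1, 3, 1, 3, 1, 3, 1, 3],
    "survived": [1, 1, 0, 0, 1, 0, 1, 0],
})

# The mean of a 0/1 column is the survival rate in each (sex, pclass) cell.
survival = passengers.pivot_table(
    index="sex", columns="pclass", values="survived", aggfunc="mean"
)
print(survival)
```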

Seaborn (Titanic data)

Discovering relationships
• Project is on GitHub
• Shows correlations and machine-learned relationships for all feature pairs
• RandomForest + cross validation + some assumptions
• Categories encoded as labels
• Feedback please!
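The idea in the bullets above can be sketched roughly as follows. This is not the discover_feature_relationships API: the helper `pairwise_scores`, its parameters and the synthetic columns are hypothetical, and the sketch handles numeric columns only. For each ordered pair it fits a RandomForest on one feature to predict the other and records the cross-validated R², clipped at zero.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def pairwise_scores(df, n_estimators=50, cv=3, random_state=0):
    """Hypothetical helper: cross-validated R^2 of each feature predicting each target."""
    cols = list(df.columns)
    # Rows are targets, columns are the single feature used to predict them.
    scores = pd.DataFrame(np.nan, index=cols, columns=cols)
    for target in cols:
        for feature in cols:
            if feature == target:
                continue
            model = RandomForestRegressor(
                n_estimators=n_estimators, random_state=random_state
            )
            r2 = cross_val_score(
                model, df[[feature]], df[target], cv=cv, scoring="r2"
            ).mean()
            # Negative R^2 means "worse than predicting the mean"; treat as no signal.
            scores.loc[target, feature] = max(r2, 0.0)
    return scores


# Synthetic demo data: y depends on x, but x cannot be recovered from y (sign is lost).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 300)
demo = pd.DataFrame({
    "x": x,
    "y": x ** 2 + rng.normal(0, 0.02, 300),
    "noise": rng.normal(size=300),
})
scores = pairwise_scores(demo)
print(scores.round(2))
```

The resulting matrix is not symmetric: `x` predicts `y` well while `y` predicts `x` poorly, which a correlation matrix cannot express.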

Spearman feature correlations
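A small sketch of why Spearman is a useful default here, assuming synthetic data: Spearman correlates ranks, so any strictly monotonic relationship scores 1.0 even when it is far from linear, whereas Pearson is lower.

```python
import numpy as np
import pandas as pd

# Synthetic data: y is a strictly increasing but strongly non-linear function of x.
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 200)
df = pd.DataFrame({"x": x, "y": np.exp(x)})

pearson = df.corr(method="pearson")    # measures linear association
spearman = df.corr(method="spearman")  # measures monotonic (rank) association

print(pearson.loc["x", "y"], spearman.loc["x", "y"])
```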

Discovering relationships

Pandas scatter LSTAT vs MEDV

Seaborn JointGrid with alpha

Seaborn hex jointplot

Diff the upper and lower triangles
CRIM vs RAD might be interesting as both sides had "some" predictive power
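One way to diff the triangles, sketched with made-up numbers: subtract the transpose of the score matrix from itself and keep only the upper triangle, so each unordered pair appears once with a signed asymmetry. The feature names `a`, `b`, `c` and the scores are hypothetical and are not results from the talk.

```python
import numpy as np
import pandas as pd

names = ["a", "b", "c"]
# Hypothetical scores: rows are targets, columns are the predicting feature.
scores = pd.DataFrame(
    [[np.nan, 0.6, 0.1],
     [0.2, np.nan, 0.1],
     [0.8, 0.4, np.nan]],
    index=names, columns=names,
)

# diff.loc[i, j] = (j predicts i) minus (i predicts j).
diff = scores - scores.T

# Keep the strict upper triangle so each pair is listed once.
upper = np.triu(np.ones(diff.shape, dtype=bool), k=1)
asymmetry = (
    diff.where(upper).stack().dropna().sort_values(key=abs, ascending=False)
)
print(asymmetry)
```

Pairs with a large absolute value at the top of the list are the most one-sided relationships and are candidates for a closer look.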

A non-symmetric relationship
CRIM can predict RAD but RAD poorly predicts CRIM
So maybe we need better data?

NetworkX to show relationships
Who predicts MEDV directly and indirectly?
What new data might we try to get, given these relationships?
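A rough sketch of how the "directly and indirectly" question can be answered with a directed graph: add an edge feature -> target whenever the predictive score passes a threshold, then read off predecessors (direct) and the remaining ancestors (indirect). The feature names, scores and threshold below are hypothetical.

```python
import networkx as nx

# Hypothetical scores: (feature, target) -> how well feature predicts target.
scores = {
    ("rooms", "price"): 0.6,
    ("income", "price"): 0.5,
    ("district", "income"): 0.4,
    ("district", "rooms"): 0.1,
    ("noise", "price"): 0.02,
}
threshold = 0.3  # assumed cut-off for "predicts well enough to draw"

graph = nx.DiGraph()
for (feature, target), score in scores.items():
    if score >= threshold:
        graph.add_edge(feature, target, weight=score)

# Direct predictors have an edge into the target; indirect ones reach it via others.
direct = set(graph.predecessors("price"))
indirect = nx.ancestors(graph, "price") - direct
print(direct, indirect)
```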

Data Stories (@bertil_hatt)
• A short report describing the data and proposing things we could do with it
• Stuff we trust (or don’t)
• Interesting or unexpected relationships
• Missing data (e.g. missingno)
• Propose experiments that we might run on this data which generate a benefit
• Document, don’t forget!
https://medium.com/@bertil_hatt/what-does-bad-data-look-like-91dc2a7bcb7a
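For the missing-data part of a data story, a plain pandas summary is often enough to start with; missingno presents similar information graphically. The frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame with gaps; the values are made up.
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, np.nan],
    "fare": [7.25, 71.28, np.nan, 8.05],
    "sex": ["male", "female", "female", "male"],
})

# Fraction of missing values per column, worst first.
missing_fraction = df.isna().mean().sort_values(ascending=False)
print(missing_fraction)
```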

Conclusions
• Visualise and communicate all of your data relationships
• Visit PyDataLondon 2019 :-)
• I’d love a postcard if you learned something?
• See more examples: https://github.com/ianozsvald
