Slide 1

Slide 1 text

Data Science Project Patterns that Work
Ian Ozsvald (@IanOzsvald, ianozsvald.com)
PyDataGlobal 2022

Slide 2

Slide 2 text

Patterns for Success
•For teams who have trouble shipping valuable results
•I show common situations that make failures more likely – and how to fix these with success patterns
•Using these ideas I’ve helped clients make $M
•Take these back to your team+boss for discussion
By [ian]@ianozsvald[.com] Ian Ozsvald

Slide 3

Slide 3 text

Design Patterns
https://en.wikipedia.org/wiki/Design_pattern

Slide 4

Slide 4 text

Anti-Patterns
https://en.wikipedia.org/wiki/Anti-pattern

Slide 5

Slide 5 text

Introductions
•Interim Chief Data Scientist
•20+ years experience
•Strategy, CTO coaching & public courses
 –I’m sharing from my Successful Data Science Projects course
2nd Edition!

Slide 6

Slide 6 text

Which Project to Choose?
•We’re automating an existing process
•We need a high chance of positive results, accepting high uncertainty
•Let’s sell a big outcome, ignore the risks and hope! (NO!)
•“Walking from the Box to the database”

Slide 7

Slide 7 text

PATTERN Choosing Good Projects
•Problem – Many uncertainties
•Solution – Derisk – Who uses it? How do they use it? What’s valuable to them? What’s the right metric? Why should they trust it? Do stakeholders have time?
•Antipattern – “push on, don’t worry about the risks”

Slide 8

Slide 8 text

Choosing Good Projects
What’s the $value?

Slide 9

Slide 9 text

What Does the Data Mean?
•“The data’s in the DB, silly!” (did anyone ever look at it?)
•Does the data represent a useful reality? Do outsourcers fill in the same data each year? “New CRM next year…”
•We need to know if the data is sane and stable

Slide 10

Slide 10 text

Stable Data
Sane Data
PLEASE POST YOUR EDA/CHECKING LIBRARIES INTO DISCORD

Slide 11

Slide 11 text

PATTERN Derisking Data
•Problem – GBs of data but no business understanding
•Solution – Business Collaboration – How does the business operate? Why does it make this data? Draw diagrams. Is it stable? Do stakeholders trust their data?
•Antipattern – “just exploit the data, that’s the truth”
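A minimal sketch of a "sane and stable" check to bring to the business conversation, using only the Python standard library. The records, column names (`year`, `channel`, `amount`) and helper functions are hypothetical; in practice an EDA library would likely replace most of this.

```python
from collections import Counter, defaultdict


def missing_rate(rows, column):
    """Sanity: fraction of rows where `column` is None or an empty string."""
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if r.get(column) in (None, ""))
    return missing / len(rows)


def category_shares_by_year(rows, year_col, cat_col):
    """Stability: per-year share of each category value."""
    counts = defaultdict(Counter)
    for r in rows:
        counts[r[year_col]][r[cat_col]] += 1
    return {
        year: {cat: n / sum(c.values()) for cat, n in c.items()}
        for year, c in counts.items()
    }


def max_share_shift(shares, year_a, year_b):
    """Largest absolute change in any category's share between two years."""
    cats = set(shares[year_a]) | set(shares[year_b])
    return max(
        abs(shares[year_a].get(c, 0.0) - shares[year_b].get(c, 0.0)) for c in cats
    )


# Hypothetical records: the "phone" channel disappears in 2021 (new CRM?).
rows = [
    {"year": 2020, "channel": "phone", "amount": 10.0},
    {"year": 2020, "channel": "phone", "amount": None},
    {"year": 2020, "channel": "web", "amount": 12.0},
    {"year": 2020, "channel": "web", "amount": 9.0},
    {"year": 2021, "channel": "web", "amount": 11.0},
    {"year": 2021, "channel": "web", "amount": 8.0},
    {"year": 2021, "channel": "web", "amount": 7.5},
    {"year": 2021, "channel": "app", "amount": 14.0},
]

amount_missing = missing_rate(rows, "amount")
shares = category_shares_by_year(rows, "year", "channel")
shift = max_share_shift(shares, 2020, 2021)
print(f"missing amount: {amount_missing:.1%}, largest channel shift: {shift:.0%}")
```

A large shift does not say the data is wrong; it is a prompt to ask stakeholders what changed in the process that year.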

Slide 12

Slide 12 text

Researching
•We need to use our time efficiently to show progress
•I’ve burned a lot of time making poor decisions (solution – go for a walk)
•What next best decision gets you closer to shipping?
•“I’ve got 10 ideas, I’ll try them first and diagnose later” – don’t be that person
PLEASE POST YOUR DIAGNOSTIC LIBRARIES INTO DISCORD

Slide 13

Slide 13 text

Productive Research
•What does it get most wrong? Why?
•Brainstorm solutions, iterate
•Have a boring, predictable process
https://marcotcr.github.io/lime/tutorials/Lime%20-%20multiclass.html
Domains help differentiate 20 newsgroup challenge? That won’t generalize! Great score though...
Does expert like the negative weighting?
https://github.com/slundberg/shap/

Slide 14

Slide 14 text

PATTERN Productive Research
•Problem – lots of data, lots of ideas, what to do first?
•Solution – Repeatable process – benchmarks+baseline models, diagnosis, iteration
•Antipattern – “Hack away at lots of ideas! Ignore git! Keep it all in the same folder! Copy/paste between Notebooks”
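One possible shape for the "baseline, diagnosis, iteration" loop, sketched with the standard library only. The labels, segments and candidate predictions below are made up for illustration; the point is to compare any candidate against a trivial baseline first, then ask where it is most wrong.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Baseline 'model': always predict the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def error_rate_by_group(groups, y_true, y_pred):
    """Diagnosis: error rate per group, worst group first."""
    totals, errors = Counter(), Counter()
    for g, t, p in zip(groups, y_true, y_pred):
        totals[g] += 1
        errors[g] += int(t != p)
    return sorted(((errors[g] / totals[g], g) for g in totals), reverse=True)


# Hypothetical data: churn labels, a customer segment, and one candidate's output.
y_train = ["stay", "stay", "stay", "churn"]
y_test = ["stay", "churn", "stay", "churn", "stay", "stay"]
segments = ["small", "small", "large", "small", "large", "large"]
candidate_pred = ["stay", "stay", "stay", "churn", "stay", "stay"]

baseline_label = majority_baseline(y_train)
baseline_acc = accuracy(y_test, [baseline_label] * len(y_test))
candidate_acc = accuracy(y_test, candidate_pred)
worst_groups = error_rate_by_group(segments, y_test, candidate_pred)

print(f"baseline {baseline_acc:.2f} vs candidate {candidate_acc:.2f}")
print("worst segment first:", worst_groups)
```

Here the candidate beats the baseline, and all of its errors sit in one segment, which is the natural next thing to investigate before trying another idea.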

Slide 15

Slide 15 text

When do I make a delivery?
•A demo on your laptop != a useful deployed system “I’m sure it’ll all work when we tie the Notebooks together...”
•“IT need 6 months, schema please?”
•Plan for updates? Feedback?
•“if/then/else” rules totally cool as a v1
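As an illustration of an “if/then/else” v1, here is a hedged sketch of a rules-only scorer. The field names and thresholds are invented for the example; real ones would come from talking to the people who run the process.

```python
def churn_risk_v1(customer):
    """Hypothetical v1 'model': hand-written rules, no ML, easy to explain."""
    if customer["days_since_last_order"] > 180:
        return "high"
    elif customer["support_tickets_90d"] >= 3:
        return "medium"
    else:
        return "low"


# Made-up customers to show the three branches.
customers = [
    {"id": "c1", "days_since_last_order": 200, "support_tickets_90d": 0},
    {"id": "c2", "days_since_last_order": 30, "support_tickets_90d": 4},
    {"id": "c3", "days_since_last_order": 10, "support_tickets_90d": 0},
]
risks = {c["id"]: churn_risk_v1(c) for c in customers}
print(risks)
```

A rule set like this can ship early, gives users something to react to, and doubles as a baseline for any later model.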

Slide 16

Slide 16 text

PATTERN Deliver Value Early
•Problem – Need to get results to the users
•Solution – Iterative deployment – quickly ship something basic, keep shipping better systems. CSV/Excel lists? Integration to app? New tags in db?
•Antipattern – “deployment is someone else’s problem”
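The “CSV/Excel lists” option can be very small. A sketch, assuming the scores already exist in memory; the column names, scores and `write_ranked_csv` helper are hypothetical, and an in-memory buffer stands in for a real file.

```python
import csv
import io


def write_ranked_csv(scored, out, top_n=3):
    """Write the top_n highest-scoring rows to `out` as CSV; return rows written."""
    ranked = sorted(scored, key=lambda r: r["score"], reverse=True)[:top_n]
    writer = csv.DictWriter(
        out,
        fieldnames=["customer_id", "score", "reason"],
        extrasaction="ignore",  # silently drop any extra keys on a row
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(ranked)
    return len(ranked)


# Made-up scores; in practice these come from the rules or model being shipped.
scored = [
    {"customer_id": "c1", "score": 0.2, "reason": "recent order"},
    {"customer_id": "c2", "score": 0.9, "reason": "no order in 180+ days"},
    {"customer_id": "c3", "score": 0.5, "reason": "3+ support tickets"},
]

buffer = io.StringIO()  # swap for open("ranked.csv", "w", newline="") to write a file
n_written = write_ranked_csv(scored, buffer, top_n=2)
lines = buffer.getvalue().splitlines()
print("\n".join(lines))
```

Including a plain-language `reason` column gives the user something to agree or disagree with, which feeds the next iteration.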

Slide 17

Slide 17 text

“Build it and they (won’t) come”
•Now we want to impact our user – do they trust us?
•Use those diagnostics! Tailor from ML to the user’s need “we could explain RF or we could use a SHAP diagram”
•When your human says “actually the colleague is wrong and the machine is right” – you’ve made it

Slide 18

Slide 18 text

PATTERN Creating Change
•Problem – Trying your solution is a risk for the user
•Solution – Build trust – iteration and progress builds trust and you learn critical lessons about the data, process and value
•Antipattern – “focus on the ML metrics, ignore the client”

Slide 19

Slide 19 text

PATTERNS
•Choose Good Projects
•Derisk (the meaning of your) Data
•(Do) Productive Research
•Deliver Value Early
•Create Change

Slide 20

Slide 20 text

3-5pm UTC today, highly participatory, with Lauren Oldja and special guest Douglas Squirrel
You should probably join this

Slide 21

Slide 21 text

Summary
•Newsletter:
•See blog for my classes + many past talks
•I’d love a postcard if you learned something new!
3-5pm today

Slide 22

Slide 22 text

Final tip
•Keep asking “are we asking good questions?”