Data Science Projects Patterns that Work

ianozsvald
December 01, 2022

Given at PyDataGlobal 2022, this talk outlines five patterns for taking an idea through to a useful, deployed end project that offers value to its intended audience. These patterns are based on my experience giving strategic support to teams who have trouble shipping data science solutions.

* https://global2022.pydata.org/cfp/talk/9GYEJB/
* https://ianozsvald.com/

Transcript

  1. Data Science Project Patterns that Work
     @IanOzsvald – ianozsvald.com
     Ian Ozsvald, PyDataGlobal 2022
  2. Patterns for Success
     • For teams who have trouble shipping valuable results
     • I show common situations that make failures more likely – and how to fix them with success patterns
     • Using these ideas I’ve helped clients make $M
     • Take these back to your team and boss for discussion
     By [ian]@ianozsvald[.com] Ian Ozsvald
  3. Design Patterns
     https://en.wikipedia.org/wiki/Design_pattern
  4. Anti-Patterns
     https://en.wikipedia.org/wiki/Anti-pattern
  5. Introductions
     • Interim Chief Data Scientist
     • 20+ years of experience
     • Strategy, CTO coaching & public courses
     • I’m sharing from my Successful Data Science Projects course (2nd Edition!)
  6. Which Project to Choose?
     • We’re automating an existing process
     • We need a high chance of positive results, accepting high uncertainty
     • Let’s sell a big outcome, ignore the risks and hope! (NO!)
     • “Walking from the Box to the database”
  7. PATTERN: Choosing Good Projects
     • Problem – Many uncertainties
     • Solution – Derisk: Who uses it? How do they use it? What’s valuable to them? What’s the right metric? Why should they trust it? Do stakeholders have time?
     • Antipattern – “push on, don’t worry about the risks”
  8. Choosing Good Projects
     • What’s the $value?
  9. What Does the Data Mean?
     • “The data’s in the DB, silly!” (did anyone ever look at it?)
     • Does the data represent a useful reality? Do outsourcers fill in the same data each year? “New CRM next year…”
     • We need to know if the data is sane and stable
  10. Stable Data, Sane Data
      PLEASE POST YOUR EDA/CHECKING LIBRARIES INTO DISCORD
  11. PATTERN: Derisking Data
      • Problem – GBs of data but no business understanding
      • Solution – Business Collaboration: How does the business operate? Why does it make this data? Draw diagrams. Is it stable? Do stakeholders trust their data?
      • Antipattern – “just exploit the data, that’s the truth”
  12. Researching
      • We need to use our time efficiently to show progress
      • I’ve burned a lot of time making poor decisions (solution – go for a walk)
      • What is the next best decision to get you closer to shipping?
      • “I’ve got 10 ideas, I’ll try them first and diagnose later” – don’t be that person
      PLEASE POST YOUR DIAGNOSTIC LIBRARIES INTO DISCORD
  13. Productive Research
      • What does it get most wrong? Why?
      • Brainstorm solutions, iterate
      • Have a boring, predictable process
      Callouts on the LIME and SHAP examples: “Domains help differentiate”, “20 newsgroup challenge? That won’t generalize! Great score though...”, “Does the expert like the negative weighting?”
      https://marcotcr.github.io/lime/tutorials/Lime%20-%20multiclass.html
      https://github.com/slundberg/shap/
  14. PATTERN: Productive Research
      • Problem – lots of data, lots of ideas, what to do first?
      • Solution – Repeatable process: benchmarks + baseline models, diagnosis, iteration
      • Antipattern – “Hack away at lots of ideas! Ignore git! Keep it all in the same folder! Copy/paste between Notebooks”
  15. When do I make a delivery?
      • A demo on your laptop != a useful deployed system
      • “I’m sure it’ll all work when we tie the Notebooks together...”
      • “IT need 6 months, schema please?”
      • Plan for updates? Feedback?
      • “if/then/else” rules are totally cool as a v1
  16. PATTERN: Deliver Value Early
      • Problem – Need to get results to the users
      • Solution – Iterative deployment: quickly ship something basic, keep shipping better systems. CSV/Excel lists? Integration into the app? New tags in the db?
      • Antipattern – “deployment is someone else’s problem”
  17. “Build it and they (won’t) come”
      • Now we want to impact our users – do they trust us?
      • Use those diagnostics! Tailor them from the ML view to the user’s need: “we could explain RF (random forest) or we could use a SHAP diagram”
      • When your human says “actually the colleague is wrong and the machine is right” – you’ve made it
  18. PATTERN: Creating Change
      • Problem – Trying your solution is a risk for the user
      • Solution – Build trust: iteration and progress build trust, and you learn critical lessons about the data, process and value
      • Antipattern – “focus on the ML metrics, ignore the client”
  19. PATTERNS
      • Choose Good Projects
      • Derisk (the meaning of your) Data
      • (Do) Productive Research
      • Deliver Value Early
      • Create Change
  20. You should probably join this
      3-5pm UTC today, highly participatory, with Lauren Oldja and special guest Douglas Squirrel
  21. Summary
      • Newsletter:
      • See blog for my classes + many past talks
      • I’d love a postcard if you learned something new!
      • 3-5pm today
  22. Final tip
      • Keep asking “are we asking good questions?”
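
The slides on what the data means and on stable, sane data (slides 9 to 11) ask whether a field means the same thing from one extract to the next (“Do outsourcers fill in the same data each year?”). A minimal sketch of two such checks in plain Python, assuming each extract is a list of dicts: the missing rate of a column, and which categories changed their share by more than a threshold between two extracts. The function names, the `channel` column and the 10% threshold are illustrative assumptions, not part of the talk.

```python
from collections import Counter


def missing_rate(rows, column):
    """Fraction of rows where `column` is absent or empty."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows)


def category_shares(rows, column):
    """Share of each category among the non-missing values of `column`."""
    counts = Counter(row[column] for row in rows if row.get(column) not in (None, ""))
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()} if total else {}


def drifted_categories(old_rows, new_rows, column, threshold=0.10):
    """Categories whose share moved by more than `threshold` between two extracts."""
    old = category_shares(old_rows, column)
    new = category_shares(new_rows, column)
    drifted = {}
    for category in sorted(set(old) | set(new)):
        change = new.get(category, 0.0) - old.get(category, 0.0)
        if abs(change) > threshold:
            drifted[category] = round(change, 3)
    return drifted


# Toy extracts for illustration only: a hypothetical "channel" field in two years.
rows_2021 = [{"channel": "web"}, {"channel": "web"}, {"channel": "phone"}, {"channel": "phone"}]
rows_2022 = [{"channel": "web"}, {"channel": "web"}, {"channel": "web"}, {"channel": ""}]
print(missing_rate(rows_2022, "channel"))  # 0.25
print(drifted_categories(rows_2021, rows_2022, "channel"))  # {'phone': -0.5, 'web': 0.5}
```

With pandas the same checks are roughly `value_counts(normalize=True)` and `isna().mean()` on a column; the talk asks the audience to share their own EDA/checking libraries rather than recommending one, and a flagged shift is a question for the business, not proof of a bug.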
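
Slide 13 opens with “What does it get most wrong? Why?”. Two boring, repeatable diagnostics are a reasonable starting point: the error rate per slice of the data, and the wrong predictions the model was most confident about. This is a minimal sketch assuming each record is a dict with hypothetical `predicted`, `actual` and `confidence` keys plus whatever field you slice on; LIME and SHAP, linked on the slide, go further by explaining individual predictions.

```python
from collections import defaultdict


def error_rate_by_slice(records, slice_key):
    """Return (slice, error_rate, count) tuples, worst slice first."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for record in records:
        key = record[slice_key]
        totals[key] += 1
        if record["predicted"] != record["actual"]:
            errors[key] += 1
    rates = [(key, errors[key] / totals[key], totals[key]) for key in totals]
    # Highest error rate first; on ties, bigger slices first so larger problems surface.
    rates.sort(key=lambda item: (-item[1], -item[2]))
    return rates


def most_confident_mistakes(records, top_n=5):
    """Wrong predictions the model was most sure about: good cases to review with an expert."""
    wrong = [record for record in records if record["predicted"] != record["actual"]]
    return sorted(wrong, key=lambda record: record["confidence"], reverse=True)[:top_n]


# Toy predictions for illustration; "source" is a hypothetical slice.
records = [
    {"source": "email", "predicted": 1, "actual": 1, "confidence": 0.90},
    {"source": "email", "predicted": 0, "actual": 1, "confidence": 0.80},
    {"source": "phone", "predicted": 1, "actual": 0, "confidence": 0.95},
    {"source": "phone", "predicted": 1, "actual": 0, "confidence": 0.60},
]
print(error_rate_by_slice(records, "source"))  # [('phone', 1.0, 2), ('email', 0.5, 2)]
```

Showing the most confident mistakes to a domain expert is one way to get at the “Why?” half of the question.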
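
Slide 14’s “benchmarks + baseline models” can start as simply as scoring a model against a majority-class baseline on the same test set. The sketch below assumes any model exposed as a plain `predict(features)` function; the function names and the `margin` parameter are illustrative. If you already use scikit-learn, `DummyClassifier(strategy="most_frequent")` provides the same baseline.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: most_common_label


def accuracy(predict, features, labels):
    """Fraction of examples where predict(x) matches the label."""
    correct = sum(1 for x, y in zip(features, labels) if predict(x) == y)
    return correct / len(labels)


def compare_to_baseline(model_predict, train_labels, test_features, test_labels, margin=0.0):
    """Score a model and the majority baseline on the same test set."""
    baseline_score = accuracy(majority_baseline(train_labels), test_features, test_labels)
    model_score = accuracy(model_predict, test_features, test_labels)
    return {
        "baseline": baseline_score,
        "model": model_score,
        # The model only earns its complexity if it beats the baseline by `margin`.
        "beats_baseline": model_score > baseline_score + margin,
    }


# Toy data: a hypothetical threshold "model" on a single numeric feature.
train_labels = ["no", "no", "no", "yes"]
test_features = [0, 1, 2, 3]
test_labels = ["no", "no", "yes", "yes"]
print(compare_to_baseline(lambda x: "yes" if x >= 2 else "no",
                          train_labels, test_features, test_labels))
# {'baseline': 0.5, 'model': 1.0, 'beats_baseline': True}
```

Keeping this comparison in one function that every experiment calls is a small step towards the “boring, predictable process” on slide 13.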
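
Slide 15 notes that “if/then/else” rules are fine as a v1. A hypothetical sketch of what such a v1 might look like: the field names and thresholds below are invented for illustration and would really come from conversations with the business. Returning a reason alongside the label lets users see why a record was flagged.

```python
def triage_v1(record):
    """Rules-only v1 (no ML yet): return a (label, reason) pair.

    The fields and thresholds are hypothetical placeholders; real ones would be
    agreed with the people who will use the output.
    """
    amount = record.get("amount")
    if amount is None:
        return "review", "amount missing"
    if amount > 10_000:
        return "review", "amount above 10,000"
    # A missing customer age defaults to 0, so it is treated as a new customer.
    if record.get("customer_age_days", 0) < 30:
        return "review", "new customer (under 30 days)"
    return "ok", "no rule fired"


for example in [
    {"amount": 50_000, "customer_age_days": 400},
    {"amount": 120, "customer_age_days": 5},
    {"amount": 120, "customer_age_days": 400},
]:
    print(triage_v1(example))
```

Keeping the same shape as a later ML model (a record in, a label and reason out) is one way to let a model replace the rules without changing the delivery around it; this is a suggestion, not something the talk prescribes.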
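
Slide 16 lists a CSV/Excel list as one way to get results to users early. A minimal sketch using only the standard library, assuming scored rows are dicts with hypothetical `id`, `score` and optional `reason` keys:

```python
import csv
import io


def ranked_csv(scored_rows, top_n=100):
    """Return the top-N rows by score as CSV text that opens directly in Excel."""
    ranked = sorted(scored_rows, key=lambda row: row["score"], reverse=True)[:top_n]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["rank", "id", "score", "reason"])
    writer.writeheader()
    for rank, row in enumerate(ranked, start=1):
        writer.writerow({
            "rank": rank,
            "id": row["id"],
            "score": f"{row['score']:.2f}",  # rounded for people, not for machines
            "reason": row.get("reason", ""),
        })
    return buffer.getvalue()


# Toy scored rows for illustration.
rows = [
    {"id": "A17", "score": 0.42},
    {"id": "B03", "score": 0.91, "reason": "large repeat order"},
    {"id": "C88", "score": 0.77, "reason": "recent complaint"},
]
print(ranked_csv(rows, top_n=2))
```

To save the list, write the returned text to a file opened with `open(path, "w", newline="", encoding="utf-8")`; `newline=""` stops Python from translating the `\r\n` line endings the csv module already writes.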
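
Slide 17 contrasts explaining a random forest with showing a SHAP diagram. For a linear model with independent features, a feature’s SHAP value reduces to its weight times how far the row’s value sits from that feature’s mean, so a plain-language explanation for a linear model can be sketched without extra libraries. The weights, means and feature names below are hypothetical; for tree models such as random forests you would reach for the shap library instead.

```python
def linear_contributions(weights, feature_means, row):
    """Per-feature contributions of a linear model, largest effect first.

    With independent features, w_i * (x_i - mean_i) is the SHAP value of feature i,
    and the contributions sum to score(row) - score(mean row).
    """
    contributions = {
        name: weight * (row[name] - feature_means[name])
        for name, weight in weights.items()
    }
    return sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)


def explain_for_user(weights, feature_means, row, top_n=3):
    """Turn the biggest contributions into sentences a non-ML user can check."""
    sentences = []
    for name, value in linear_contributions(weights, feature_means, row)[:top_n]:
        if value == 0:
            continue  # this feature is at its average, so it did not move the score
        direction = "raised" if value > 0 else "lowered"
        sentences.append(f"{name} {direction} the score by {abs(value):.2f}")
    return sentences


# Hypothetical weights and feature averages, for illustration only.
weights = {"visits": 0.5, "complaints": -2.0, "tenure_years": 0.1}
means = {"visits": 4.0, "complaints": 1.0, "tenure_years": 3.0}
row = {"visits": 10.0, "complaints": 3.0, "tenure_years": 3.0}
for sentence in explain_for_user(weights, means, row):
    print(sentence)
```

Sentences like these give the user something concrete to agree or disagree with, in the spirit of slide 13’s “Does the expert like the negative weighting?”.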