Data Science Project Patterns that Work

ianozsvald
December 01, 2022

Given at PyDataGlobal 2022, this talk outlines five patterns for taking an idea through to a useful, deployed project that delivers value to its intended audience. The patterns are based on my experience giving strategic support to teams who have trouble shipping data science solutions.

* https://global2022.pydata.org/cfp/talk/9GYEJB/
* https://ianozsvald.com/

Transcript

  1. Data Science Project Patterns that Work
    @IanOzsvald – ianozsvald.com
    Ian Ozsvald
    PyDataGlobal 2022

  2. Patterns for Success
    • For teams who have trouble shipping valuable results
    • I show common situations that make failures more likely, and how to fix these with success patterns
    • Using these ideas I’ve helped clients make $M
    • Take these back to your team and boss for discussion
    By [ian]@ianozsvald[.com] Ian Ozsvald

  3. Design Patterns
    https://en.wikipedia.org/wiki/Design_pattern

  4. Anti-Patterns
    https://en.wikipedia.org/wiki/Anti-pattern


  5. Introductions
    Interim Chief Data Scientist
    20+ years experience
    Strategy, CTO coaching & public courses
    – I’m sharing from my Successful Data Science Projects course
    2nd Edition!

  6. Which Project to Choose?
    • We’re automating an existing process
    • We need a high chance of positive results, accepting high uncertainty
    • Let’s sell a big outcome, ignore the risks and hope! (NO!)
    • “Walking from the Box to the database”

  7. PATTERN Choosing Good Projects
    • Problem – Many uncertainties
    • Solution – Derisk – Who uses it? How do they use it? What’s valuable to them? What’s the right metric? Why should they trust it? Do stakeholders have time?
    • Antipattern – “push on, don’t worry about the risks”

  8. Choosing Good Projects
    What’s the $value?

  9. What Does the Data Mean?
    • “The data’s in the DB, silly!” (did anyone ever look at it?)
    • Does the data represent a useful reality? Do outsourcers fill in the same data each year? “New CRM next year…”
    • We need to know if the data is sane and stable

  10. Stable Data
    Sane Data
    PLEASE POST YOUR EDA/CHECKING LIBRARIES INTO DISCORD

  11. PATTERN Derisking Data
    • Problem – GBs of data but no business understanding
    • Solution – Business Collaboration – How does the business operate? Why does it make this data? Draw diagrams. Is it stable? Do stakeholders trust their data?
    • Antipattern – “just exploit the data, that’s the truth”
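
One way to start on the "sane and stable" questions from the Derisking Data pattern is a pair of small checks run before any modelling. The sketch below is only an illustration and is not from the talk: the column names (`age`, `channel`), the valid range and the two yearly record sets are made up, and it uses only the Python standard library rather than any specific EDA package.

```python
from collections import Counter

def sanity_report(rows, column, valid_range):
    """Count missing and out-of-range values for one numeric column."""
    low, high = valid_range
    values = [row.get(column) for row in rows]
    missing = sum(v is None for v in values)
    out_of_range = sum(v is not None and not (low <= v <= high) for v in values)
    return {"rows": len(values), "missing": missing, "out_of_range": out_of_range}

def category_shares(rows, column):
    """Share of rows per category value, ignoring missing entries."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}

def stability_drift(old_rows, new_rows, column):
    """Largest absolute change in category share between two periods."""
    old = category_shares(old_rows, column)
    new = category_shares(new_rows, column)
    return max(abs(old.get(k, 0.0) - new.get(k, 0.0)) for k in set(old) | set(new))

# Hypothetical records for two consecutive years (illustrative only).
year_2021 = [
    {"age": 34, "channel": "web"},
    {"age": 51, "channel": "phone"},
    {"age": None, "channel": "web"},
    {"age": 29, "channel": "web"},
]
year_2022 = [
    {"age": 41, "channel": "app"},
    {"age": 230, "channel": "app"},  # implausible value worth asking the business about
    {"age": 38, "channel": "web"},
    {"age": 45, "channel": "app"},
]

report_2021 = sanity_report(year_2021, "age", (0, 120))
report_2022 = sanity_report(year_2022, "age", (0, 120))
drift = stability_drift(year_2021, year_2022, "channel")

print(report_2021, report_2022, drift)
```

A large drift value does not say the data is wrong; it is a prompt to ask stakeholders what changed in the business between the two periods (a new CRM, a new outsourcer, a new sales channel).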

  12. Researching
    • We need to use our time efficiently to show progress
    • I’ve burned a lot of time making poor decisions (solution – go for a walk)
    • Which next decision gets you closest to shipping?
    • “I’ve got 10 ideas, I’ll try them first and diagnose later” – don’t be that person
    PLEASE POST YOUR DIAGNOSTIC LIBRARIES INTO DISCORD

  13. Productive Research
    • What does it get most wrong? Why?
    • Brainstorm solutions, iterate
    • Have a boring, predictable process
    https://marcotcr.github.io/lime/tutorials/Lime%20-%20multiclass.html
    Domains help differentiate the 20 newsgroups challenge? That won’t generalize! Great score though...
    Does the expert like the negative weighting?
    https://github.com/slundberg/shap/

  14. PATTERN Productive Research
    • Problem – lots of data, lots of ideas, what to do first?
    • Solution – Repeatable process – benchmarks+baseline models, diagnosis, iteration
    • Antipattern – “Hack away at lots of ideas! Ignore git! Keep it all in the same folder! Copy/paste between Notebooks”
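
The "benchmarks+baseline models, diagnosis, iteration" loop can be sketched in a few lines. This is a minimal illustration under assumed data, not the talk's own code: the labels (`keep`, `refund`, `escalate`), the `amount` feature and the threshold rule standing in for a candidate model are all hypothetical.

```python
from collections import Counter

def majority_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda features: most_common

def accuracy(predict, examples):
    """Fraction of (features, label) pairs predicted correctly."""
    return sum(predict(f) == label for f, label in examples) / len(examples)

def worst_classes(predict, examples):
    """Error count per true label, sorted so the worst class comes first."""
    errors = Counter(label for f, label in examples if predict(f) != label)
    return errors.most_common()

def candidate(features):
    """Hypothetical candidate model: a single threshold on the amount."""
    return "refund" if features["amount"] > 100 else "keep"

# Hypothetical training labels and held-out examples (illustrative only).
train_labels = ["keep", "keep", "keep", "refund", "refund", "escalate"]
held_out = [
    ({"amount": 20}, "keep"),
    ({"amount": 150}, "refund"),
    ({"amount": 90}, "keep"),
    ({"amount": 300}, "escalate"),
    ({"amount": 250}, "escalate"),
    ({"amount": 120}, "refund"),
    ({"amount": 40}, "keep"),
    ({"amount": 60}, "refund"),
]

baseline = majority_baseline(train_labels)
baseline_acc = accuracy(baseline, held_out)    # the benchmark to beat
candidate_acc = accuracy(candidate, held_out)  # the current idea
diagnosis = worst_classes(candidate, held_out) # what does it get most wrong?

print(baseline_acc, candidate_acc, diagnosis)
```

Here the diagnosis shows the candidate never predicts `escalate`, which suggests the next iteration more clearly than trying ten unrelated ideas would.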

  15. When do I make a delivery?
    • A demo on your laptop != a useful deployed system. “I’m sure it’ll all work when we tie the Notebooks together...”
    • “IT need 6 months, schema please?”
    • Plan for updates? Feedback?
    • “if/then/else” rules are totally cool as a v1

  16. PATTERN Deliver Value Early
    • Problem – Need to get results to the users
    • Solution – Iterative deployment – quickly ship something basic, keep shipping better systems. CSV/Excel lists? Integration to app? New tags in db?
    • Antipattern – “deployment is someone else’s problem”
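
As an illustration of shipping "if/then/else" rules as a v1 and delivering them as a CSV list, here is a minimal sketch. Everything specific in it is an assumption for the example: the customer fields, the thresholds (90 days, 10 orders) and the tag names would need to be agreed with the business.

```python
import csv
import io

def score_customer(customer):
    """Hand-written v1 rules; thresholds are placeholders, not recommendations."""
    if customer["days_since_last_order"] > 90:
        return "at_risk"
    elif customer["orders_last_year"] >= 10:
        return "loyal"
    else:
        return "normal"

def write_tagged_csv(customers, out_file):
    """Write one row per customer with the rule-based tag."""
    writer = csv.DictWriter(out_file, fieldnames=["customer_id", "tag"])
    writer.writeheader()
    for customer in customers:
        writer.writerow(
            {"customer_id": customer["customer_id"], "tag": score_customer(customer)}
        )

# Hypothetical customers (illustrative only).
customers = [
    {"customer_id": 1, "days_since_last_order": 120, "orders_last_year": 12},
    {"customer_id": 2, "days_since_last_order": 10, "orders_last_year": 15},
    {"customer_id": 3, "days_since_last_order": 30, "orders_last_year": 2},
]

# An in-memory buffer stands in for a real file or shared drive location.
buffer = io.StringIO()
write_tagged_csv(customers, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Because `score_customer` is a plain function, a later version can swap in a trained model behind the same interface while users keep receiving the same CSV format.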

  17. “Build it and they (won’t) come”
    • Now we want to impact our user – do they trust us?
    • Use those diagnostics! Tailor from ML to the user’s need: “we could explain RF or we could use a SHAP diagram”
    • When your human says “actually the colleague is wrong and the machine is right” – you’ve made it

  18. PATTERN Creating Change
    • Problem – Trying your solution is a risk for the user
    • Solution – Build trust – iteration and progress build trust, and you learn critical lessons about the data, process and value
    • Antipattern – “focus on the ML metrics, ignore the client”
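
To make "tailor from ML to the user's need" concrete, the sketch below turns a model's output into a short per-feature explanation a user can challenge. It does not use the SHAP library mentioned on the earlier slide; it swaps in the simplest case, a linear model, where a feature's contribution relative to an average row is weight × (value − average). The weights, averages and feature names are hypothetical.

```python
def predict(weights, intercept, row):
    """Score from a plain linear model."""
    return intercept + sum(weights[name] * row[name] for name in weights)

def linear_contributions(weights, baseline, row):
    """Per-feature contribution relative to a baseline (average) row."""
    return {name: weights[name] * (row[name] - baseline[name]) for name in weights}

def explain(contributions, top_n=2):
    """Plain-language lines for the largest contributions by magnitude."""
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    lines = []
    for name, value in ranked[:top_n]:
        direction = "raises" if value >= 0 else "lowers"
        lines.append(f"{name} {direction} the score by {abs(value):.2f}")
    return lines

# Hypothetical model and data (illustrative only).
weights = {"late_payments": 0.5, "tenure_years": -0.25, "support_calls": 0.125}
intercept = 0.25
average_row = {"late_payments": 1.0, "tenure_years": 4.0, "support_calls": 2.0}
customer_row = {"late_payments": 3.0, "tenure_years": 2.0, "support_calls": 2.0}

contributions = linear_contributions(weights, average_row, customer_row)
explanation = explain(contributions)

# The contributions add up to the gap between this score and the average score.
gap = predict(weights, intercept, customer_row) - predict(weights, intercept, average_row)
print(explanation, gap)
```

Showing an expert these lines invites the useful question from the research slide: do they agree with the direction and size of each weighting?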

  19. PATTERNS
    • Choose Good Projects
    • Derisk (the meaning of your) Data
    • (Do) Productive Research
    • Deliver Value Early
    • Create Change

  20. You should probably join this
    3-5pm UTC today, highly participatory, with Lauren Oldja and special guest Douglas Squirrel


  21. Summary
    Newsletter:
    See blog for my classes + many past talks
    I’d love a postcard if you learned something new!
    3-5pm today

  22. Final tip
    • Keep asking “are we asking good questions?”