On the Delivery of Data Science Projects

ianozsvald
February 25, 2019

Talk at the Business, Analytics and Data Science meetup (2019-02), based on my training course, covering the topics to focus on to improve the deliverability and impact of your data science projects: https://www.meetup.com/Business-Analytics-and-Data-Science/events/258531525/

Transcript

  1. On the Delivery of Data Science Projects
    @IanOzsvald – ianozsvald.com
    Ian Ozsvald
    Business, Analytics and Data Science meetup 2019-02


  2. Introductions

    Interim Chief Data Scientist

    19+ years experience

    Quickly build strategic data science plans

    Team coaching & public courses

    By [ian]@ianozsvald[.com] Ian Ozsvald


  3. Data Science shows value when...

    Numerate management ask good data-driven questions

    You have suitable data

    Achievable outcomes are well defined

    Change is enabled by these projects


  4. Common delivery problems

    “Make us more [money/…]” - give me magic!

    Desire over need – vanity projects!

    Lack of technical leadership – poor/missing specs

    Bad data – lies, mistakes and confusion

    Lack of client buy-in – no burning need


  5. What problems have you seen?

    Audience – your observations?


  6. You need a Project Specification

    States a clearly defined problem

    Guesses at unknowns (and project torpedoes!)

    Proposed milestones and Gold Standard/metrics

    Clear “definition of done”

    Story from 10 years back


  7. “Data Story”

    Do you understand your data?
    – What’s good and bad?
    – What relationships exist?

    Build an exportable Notebook as an HTML artefact

    Read Bertil’s piece on Medium
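
The two questions on this slide can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the tooling used in the talk: the `ROWS` records, the column names and both helper functions are hypothetical, and a real Data Story would normally live in a Notebook.

```python
import math

# Hypothetical toy records (not from the talk); None marks a missing value.
ROWS = [
    {"age": 25, "income": 30000},
    {"age": 35, "income": 50000},
    {"age": 45, "income": None},
    {"age": None, "income": 40000},
    {"age": 55, "income": 80000},
]


def missing_counts(rows):
    """What's good and bad? Count missing values per column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (1 if value is None else 0)
    return counts


def pearson(rows, x, y):
    """What relationships exist? Pearson correlation over complete rows only."""
    pairs = [(r[x], r[y]) for r in rows if r[x] is not None and r[y] is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_x * var_y)


print(missing_counts(ROWS))
print(round(pearson(ROWS, "age", "income"), 3))
```

Dropping incomplete rows before computing the correlation is itself an assumption worth stating in the Data Story, since it can hide a pattern in what is missing.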


  8. Standardised Approaches

    Reduce mental load for common decisions
    – Cookiecutter data-science
    – Watermark
    – Pandas-profiling
    – Anaconda

  9. Code quality
    Attrib: https://devrant.com/rants/347670/code-quality-as-measured-in-wtfs-minute


  10. Continuously improving code quality

    Encode assumptions using asserts

    Refactor to modules

    Add unit-tests

    Visual reports with analyst interpretations

    Diagnostics e.g. yellowbrick for sklearn
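
As a minimal sketch of the first three points, the snippet below writes assumptions about the data down as asserts inside a small function and covers it with unit tests. The function `conversion_rate` and its two assumptions are hypothetical examples, not code from the talk.

```python
import unittest


def conversion_rate(purchases, visits):
    """Return purchases / visits, with data assumptions encoded as asserts."""
    # Assumption: every record comes from at least one visit.
    assert visits > 0, "expected at least one visit"
    # Assumption: purchases can never exceed visits.
    assert 0 <= purchases <= visits, "purchases must be between 0 and visits"
    return purchases / visits


class TestConversionRate(unittest.TestCase):
    def test_typical_value(self):
        self.assertAlmostEqual(conversion_rate(5, 20), 0.25)

    def test_bad_data_is_rejected(self):
        # Data that violates an assumption should fail loudly, not silently.
        with self.assertRaises(AssertionError):
            conversion_rate(30, 20)
```

If the function is moved out of the Notebook into a module, the tests can be run with `python -m unittest`. Note that Python skips `assert` statements when started with `-O`, so asserts suit development-time checks rather than production validation.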


  11. Continuously improving project quality

    Code review (with a check-list & PEP8)

    nbdime for diffs

    “Data Defences” - regular critiques by colleagues on your project


  12. Contributing to Open Source gets you

    Exposure to new processes

    Enforced clear communication

    Balanced consumption & contribution

    You’re more visible & valuable


  13. Continuous delivery to clients

    Easy first deliveries – reports

    Get to a minimal working delivery as soon as possible

    Consider papermill for deployable Notebooks
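
One way papermill might be used for repeatable report deliveries is sketched below, assuming papermill is installed and that a template Notebook exists. The file name `report.ipynb`, the `client` parameter and both helper functions are hypothetical; only `papermill.execute_notebook(input_path, output_path, parameters=...)` is the library's own call.

```python
from pathlib import Path


def report_jobs(template, clients, out_dir="reports"):
    """Build one (input_path, output_path, parameters) job per client."""
    stem = Path(template).stem
    return [
        (template, str(Path(out_dir) / f"{stem}_{client}.ipynb"), {"client": client})
        for client in clients
    ]


def run_jobs(jobs):
    """Execute each job with papermill (third-party, assumed installed)."""
    import papermill as pm  # imported here so the planning step has no dependency

    for input_path, output_path, parameters in jobs:
        Path(output_path).parent.mkdir(parents=True, exist_ok=True)
        # Runs the template Notebook with the given parameters injected.
        pm.execute_notebook(input_path, output_path, parameters=parameters)


# Hypothetical usage: one executed Notebook per client.
jobs = report_jobs("report.ipynb", ["acme", "globex"])
# run_jobs(jobs)  # uncomment once report.ipynb exists
```

Keeping the job planning separate from execution makes it easy to check which reports will be produced before anything runs.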

  14. Resources

    My “Successfully Delivering Data Science Projects” course
    – sold out
    – join my training list via ianozsvald.com


  15. Summary

    Derisk early and often

    Communicate visually, all the time

    Honesty throughout your work

    Strive for continuous improvement

    Consider speaking at PyDataLondon 2019 (July 12-14)