Deftly Delivering Data Science Projects

ianozsvald
November 11, 2018

Battle-tested observations on ways to improve the likelihood that your data science project goes smoothly and gets delivered correctly. Given at the inaugural PyDataPrague.

Transcript

  1. Deftly Delivering Data Science Projects
    PyDataPrague 2018-10
    Ian Ozsvald @IanOzsvald ModelInsight.io

  2. [email protected] @IanOzsvald[.com]
    PyDataPrague 2018-10
    Introductions

    I’m an engineering data scientist

    15+ years experience

    Team coaching

    Strategic planning

    Training

  3. Problems delivering DS projects

    What are your experiences?

  4. Problems delivering DS projects

    “Make us more [money|signups|...]” – desire for magic

    Desire over actual need – vanity projects

    Lack of technical leadership – poor specs

    Bad data – lies, mistakes, confusion

    Lack of client buy-in

  5. Good DS projects

    Numerate management asking good data-driven questions

    Suitable data

    Well-defined outcomes that are agreed to be achievable

  6. Learning and applying at...

  7. Project Specification

    You need a clearly defined problem

    Where are the unknowns?

    Known unknowns

    What might kill the project?

    Propose milestones

    Where’s your Gold Standard data set?

    What’s your “definition of done”?

    Minimal results and great results

    Appropriate metrics to communicate results
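
One way to make a "definition of done" concrete is to write the agreed targets down next to the metric and score every model against the same Gold Standard set. The sketch below is only an illustration: the `DefinitionOfDone` class, the accuracy metric, the thresholds and the toy labels are all hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefinitionOfDone:
    """Agreed targets for one metric (the numbers used below are hypothetical)."""

    minimal: float  # below this the milestone is not met
    great: float    # at or above this the result counts as "great"

    def judge(self, score: float) -> str:
        if score >= self.great:
            return "great"
        if score >= self.minimal:
            return "minimal"
        return "not done"


def accuracy(gold_labels, predictions):
    """Fraction of predictions that match the Gold Standard labels."""
    assert len(gold_labels) == len(predictions), "length mismatch"
    assert gold_labels, "Gold Standard set must not be empty"
    matches = sum(g == p for g, p in zip(gold_labels, predictions))
    return matches / len(gold_labels)


# Toy data standing in for a hand-checked Gold Standard set.
gold = ["spam", "ham", "spam", "ham", "ham"]
predicted = ["spam", "ham", "ham", "ham", "ham"]

done = DefinitionOfDone(minimal=0.7, great=0.9)
score = accuracy(gold, predicted)
print(f"accuracy={score:.2f} -> {done.judge(score)}")  # accuracy=0.80 -> minimal
```

Accuracy is used here only because it is short to write; the metric that is appropriate to communicate depends on the project and should be agreed with the client.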

  8. “Data story”

    Do you understand your data?

    Explain your data – what does it say?

    What’s good and what’s bad?

    What are the relationships?

    Where is the signal in the data?

    Export your Notebook as an HTML artefact

    Data Story proposed by Bertil (Medium)

  9. Standardised approaches

    Reduce the mental load for common decisions

    Cookiecutter (folders)

    pandas-profiling

    watermark

    Anaconda
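
Tools such as watermark record which interpreter and package versions produced a Notebook's results. The sketch below shows the same idea using only the standard library; it is not watermark's API, and the `environment_stamp` function and the package names passed to it are assumptions made for illustration.

```python
import platform
from datetime import datetime, timezone
from importlib import metadata


def environment_stamp(packages=()):
    """Return a dict describing the interpreter, OS and package versions."""
    stamp = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            stamp[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence rather than failing, so the stamp is always written.
            stamp[name] = "not installed"
    return stamp


# Hypothetical package list; use whatever your project actually depends on.
stamp = environment_stamp(packages=["pandas", "scikit-learn"])
for key, value in stamp.items():
    print(f"{key:>15}: {value}")
```

Printing a stamp like this in the first cell of every Notebook is one low-effort convention that means a colleague does not have to guess the environment later.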

  10. Improving code quality

    Encode assumptions with asserts

    Refactor to modules

    Add unit-tests

    Visual reports with analyst interpretations

    Diagnostics e.g. yellowbrick for sklearn
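
The first three points can be sketched together: a function lifted out of a Notebook, with its data assumptions written as asserts, plus a small unit test. The function name, the record layout and the specific assumptions are hypothetical, chosen only to show the pattern.

```python
import unittest


def mean_order_value(orders):
    """Average order value for a list of {'order_id', 'amount'} dicts."""
    # Encode assumptions about the data so that violations fail loudly.
    assert orders, "expected at least one order"
    ids = [o["order_id"] for o in orders]
    assert len(ids) == len(set(ids)), "order_id must be unique"
    assert all(o["amount"] >= 0 for o in orders), "amounts must be non-negative"
    return sum(o["amount"] for o in orders) / len(orders)


class MeanOrderValueTest(unittest.TestCase):
    def test_typical_orders(self):
        orders = [{"order_id": 1, "amount": 10.0}, {"order_id": 2, "amount": 30.0}]
        self.assertEqual(mean_order_value(orders), 20.0)

    def test_duplicate_ids_are_rejected(self):
        orders = [{"order_id": 1, "amount": 10.0}, {"order_id": 1, "amount": 30.0}]
        with self.assertRaises(AssertionError):
            mean_order_value(orders)


# Run the tests in-process (this also works from a Notebook cell).
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MeanOrderValueTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Note that Python skips `assert` statements when run with the `-O` flag, so asserts suit development-time sanity checks; validation that must always run in production is better written as explicit exceptions.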

  11. Improving project quality

    Code reviews (with a check-list, PEP8)

    nbdime for diffs

    “Data Defences” – regular critiques by colleagues on your project

  12. Continuous delivery to client

    Early deliveries – reports

    Get to a minimal working delivery as soon as possible (UI? App? Reports?)

    Consider papermill for deployable Notebooks

  13. Summary

    Honesty throughout your work

    Strive to keep improving your technique

    Keep communicating your results
