On the Delivery of Data Science Projects

February 25, 2019

Talk at the Business, Analytics and Data Science meetup (2019-02), based on my training course, covering the topics to focus on to improve the deliverability and impact of your data science projects: https://www.meetup.com/Business-Analytics-and-Data-Science/events/258531525/



  1. On the Delivery of Data Science Projects @IanOzsvald – ianozsvald.com

    Ian Ozsvald Business, Analytics and Data Science meetup 2019-02
  2. Introductions

    Interim Chief Data Scientist
    19+ years experience
    Quickly build strategic data science plans
    Team coaching & public courses
    By [ian]@ianozsvald[.com] Ian Ozsvald
  3. Data Science shows value when...

    Numerate management ask good data-driven questions
    You have suitable data
    Well-defined, achievable outcomes are set
    Change is enabled by these projects
  4. Common delivery problems

    “Make us more [money/…]” - give me magic!
    Desire over need – vanity projects!
    Lack of technical leadership – poor/missing specs
    Bad data – lies, mistakes and confusion
    Lack of client buy-in – no burning need
  5. You need a Project Specification

    States a clearly defined problem
    Guesses at unknowns (and project torpedoes!)
    Proposed milestones and Gold Standard/metrics
    Clear “definition of done”
    Story from 10 years back
  6. Do you understand your data?

      – What’s good and bad?
      – What relationships exist?
    Build exportable Notebook as HTML artefact
    Read Bertil’s piece on Medium “Data Story”
  7. Standardised Approaches

    Reduce mental load for common decisions
      – Cookiecutter data-science
      – Watermark
      – Pandas-profiling
      – Anaconda
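As an illustration of why a tool like Watermark reduces mental load, the sketch below records the interpreter and selected package versions so that a report can state the environment it was produced in. It does not use Watermark itself: it is a standard-library stand-in, and the function name `environment_report` and the package names passed to it are assumptions made for this example.

```python
import platform
from importlib import metadata


def environment_report(packages):
    """Return a dict describing the interpreter and selected package versions.

    Hypothetical helper: a standard-library stand-in for the kind of
    version stamp that the Watermark extension adds to a Notebook.
    """
    report = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap instead of failing, so the report is always produced.
            report[name] = "not installed"
    return report


# Example package names only; list whatever your project depends on.
for key, value in environment_report(["numpy", "pandas", "scikit-learn"]).items():
    print(f"{key}: {value}")
```

Pasting this output at the top of every delivered report makes it easier to reproduce a result later without having to remember which versions were in use.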
  8. Continuously improving code quality

    Encode assumptions using asserts
    Refactor to modules
    Add unit-tests
    Visual reports with analyst interpretations
    Diagnostics e.g. yellowbrick for sklearn
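A minimal sketch of "encode assumptions using asserts" and "add unit-tests", using only the standard library. The function `clean_orders`, the record keys (`order_id`, `quantity`, `price`) and the rule that zero-quantity rows are dropped are hypothetical, invented for illustration; they are not taken from the talk.

```python
import unittest


def clean_orders(rows):
    """Drop zero-quantity orders, failing loudly if a data assumption breaks.

    `rows` is a list of dicts with the hypothetical keys
    order_id, quantity and price.
    """
    # Assumption: the extract is never empty.
    assert len(rows) > 0, "expected at least one order"
    # Assumption: order ids are unique.
    ids = [row["order_id"] for row in rows]
    assert len(ids) == len(set(ids)), "duplicate order_id found"
    # Assumption: prices are never negative.
    assert all(row["price"] >= 0 for row in rows), "negative price found"

    # Hypothetical business rule: zero-quantity rows are dropped.
    cleaned = [row for row in rows if row["quantity"] > 0]

    # Post-condition: cleaning can only remove rows, never add them.
    assert len(cleaned) <= len(rows)
    return cleaned


class TestCleanOrders(unittest.TestCase):
    def test_drops_zero_quantity_rows(self):
        rows = [
            {"order_id": 1, "quantity": 2, "price": 9.5},
            {"order_id": 2, "quantity": 0, "price": 3.0},
        ]
        self.assertEqual(clean_orders(rows), [rows[0]])

    def test_duplicate_ids_are_rejected(self):
        rows = [
            {"order_id": 1, "quantity": 1, "price": 1.0},
            {"order_id": 1, "quantity": 1, "price": 2.0},
        ]
        with self.assertRaises(AssertionError):
            clean_orders(rows)
```

Once the function is refactored out of a Notebook into a module, the tests can be run with `python -m unittest`. Note that Python skips `assert` statements when run with the `-O` flag, so asserts suit analysis-time checks; validation that must always run is better written as explicit exceptions.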
  9. Continuously improving project quality

    Code review (with a check-list & PEP8)
    nbdime for diffs
    “Data Defences” - regular critiques by colleagues on your project
  10. Contributing to Open Source gets you

    Exposure to new processes
    Enforced clear communication
    Balanced consumption & contribution
    You’re more visible & valuable
  11. Continuous delivery to clients

    Easy first deliveries – reports
    Get to a minimal working delivery as soon as possible
    Consider papermill for deployable Notebooks
  12. Resources

    My “Successfully Delivering Data Science Projects” course – sold out
      – join my training list via ianozsvald.com
  13. Summary

    Derisk early and often
    Communicate visually, all the time
    Honesty throughout your work
    Strive for continuous improvement
    Consider speaking at PyDataLondon 2019 July 12-14