On the Delivery of Data Science Projects

ianozsvald
February 25, 2019

Talk at the Business, Analytics and Data Science meetup (2019-02), based on my training course. It covers the topics you should focus on to improve the deliverability and impact of your data science projects: https://www.meetup.com/Business-Analytics-and-Data-Science/events/258531525/

Transcript

  1. On the Delivery of Data Science Projects

    Ian Ozsvald (@IanOzsvald, ianozsvald.com)
    Business, Analytics and Data Science meetup, 2019-02
  2. Introductions

    - Interim Chief Data Scientist
    - 19+ years experience
    - Quickly build strategic data science plans
    - Team coaching & public courses

    By [ian]@ianozsvald[.com] Ian Ozsvald
  3. Data Science shows value when...

    - Numerate management ask good data-driven questions
    - You have suitable data
    - Well-defined, achievable outcomes are set
    - Change is enabled by these projects
  4. Common delivery problems

    - “Make us more [money/…]” - give me magic!
    - Desire over need – vanity projects!
    - Lack of technical leadership – poor/missing specs
    - Bad data – lies, mistakes and confusion
    - Lack of client buy-in – no burning need
  5. Audience – your observations?

    - What problems have you seen?
  6. You need a Project Specification

    - States a clearly defined problem
    - Guesses at unknowns (and project torpedoes!)
    - Proposed milestones and Gold Standard/metrics
    - Clear “definition of done”
    - Story from 10 years back
  7. “Data Story”

    - Do you understand your data?
      – What’s good and bad?
      – What relationships exist?
    - Build exportable Notebook as html artefact
    - Read Bertil’s piece on Medium
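
A rough first answer to "what's good and bad?" can be had by counting missing and distinct values per column. The sketch below is a minimal standard-library illustration, not the approach from the talk; the sample `rows` and the `summarise` helper are hypothetical, and `None` is assumed to mark a missing value.

```python
from collections import Counter

# Hypothetical sample records; None is assumed to mark a missing value.
rows = [
    {"age": 34, "city": "London"},
    {"age": None, "city": "London"},
    {"age": 51, "city": None},
    {"age": 29, "city": "Leeds"},
]


def summarise(rows):
    """Return per-column counts of missing and distinct values."""
    columns = sorted({key for row in rows for key in row})
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        summary[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(1),
        }
    return summary


summary = summarise(rows)
for col, stats in summary.items():
    print(col, stats)
```

A summary like this could sit at the top of the exported Notebook so that readers see the data's gaps before any modelling.
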
  8. Standardised Approaches

    - Reduce mental load for common decisions
      – Cookiecutter data-science
      – Watermark
      – Pandas-profiling
      – Anaconda
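
One standardised habit is recording the interpreter and package versions alongside every result, which is the kind of information Watermark prints in a Notebook. The sketch below shows the idea using only the standard library; it is not Watermark's API, and `environment_report` and the package names passed to it are hypothetical.

```python
import platform
from importlib import metadata


def environment_report(packages):
    """Collect interpreter, OS and package versions as a dict."""
    report = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap rather than failing, so the report is always produced.
            report[name] = "not installed"
    return report


# Hypothetical package list; the second name is deliberately absent.
report = environment_report(["pip", "a-package-that-does-not-exist"])
for key, value in report.items():
    print(f"{key}: {value}")
```

Printing this at the top of each Notebook makes it easier to reproduce a result months later.
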
  9. Code quality

    Attrib: https://devrant.com/rants/347670/code-quality-as-measured-in-wtfs-minute

  10. Continuously improving code quality

    - Encode assumptions using asserts
    - Refactor to modules
    - Add unit-tests
    - Visual reports with analyst interpretations
    - Diagnostics e.g. yellowbrick for sklearn
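
The first and third points can be illustrated together: assumptions about the data become `assert` statements inside a small function, and a unit test pins down the behaviour. This is a minimal sketch under stated assumptions, not code from the talk; `clean_prices` and its two assumptions (at most half the values missing, no negative prices) are hypothetical.

```python
import unittest


def clean_prices(prices):
    """Drop missing prices, asserting the assumptions the analysis relies on."""
    cleaned = [p for p in prices if p is not None]
    # Assumption (hypothetical): at most half of the values may be missing.
    assert len(cleaned) >= len(prices) / 2, "too many missing prices"
    # Assumption (hypothetical): prices are never negative.
    assert all(p >= 0 for p in cleaned), "negative price found"
    return cleaned


class TestCleanPrices(unittest.TestCase):
    def test_drops_missing_values(self):
        self.assertEqual(clean_prices([10.0, None, 12.5]), [10.0, 12.5])

    def test_rejects_negative_price(self):
        with self.assertRaises(AssertionError):
            clean_prices([10.0, -1.0])


# Run the tests in-process, which also works inside a Notebook cell.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCleanPrices)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Once a function like this is moved out of the Notebook into a module, the same tests can run on every change.
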
  11. Continuously improving project quality

    - Code review (with a check-list & PEP8)
    - nbdime for diffs
    - “Data Defences” - regular critiques by colleagues on your project
  12. Contributing to Open Source gets you

    - Exposure to new processes
    - Enforced clear communication
    - Balanced consumption & contribution
    - You’re more visible & valuable
  13. Continuous delivery to clients

    - Easy first deliveries – reports
    - Get to a minimal working delivery as soon as possible
    - Consider papermill for deployable Notebooks
  14. Resources

    - My “Successfully Delivering Data Science Projects” course – sold out
      – join my training list via ianozsvald.com
  15. Summary

    - Derisk early and often
    - Communicate visually, all the time
    - Honesty throughout your work
    - Strive for continuous improvement
    - Consider speaking at PyDataLondon 2019, July 12-14