On the Delivery of Data Science Projects

Talk at PyDataCambridge 2019-05 on the two halves of Data Science delivery: derisking the business side of a project and improving its software engineering side. It contains observations, common issues, new tools and process ideas you can take back to your teams.

ianozsvald

May 29, 2019

Transcript

  1. On the Delivery of Data Science Projects
    Ian Ozsvald (@IanOzsvald – ianozsvald.com), PyDataCambridge 2019-05
  2. Introductions
    • Interim Chief Data Scientist
    • 19+ years’ experience
    • Quickly build strategic data science plans
    • Team coaching & public courses
    By [ian]@ianozsvald[.com] Ian Ozsvald
  3. Data Science shows value when...
    • Numerate management ask good data-driven questions
    • You have suitable data
    • Achievable outcomes are well defined
    • Change is enabled by these projects
  4. Common failure points
    • Unclear true business need
    • No visibility into the data (and its quality)
    • Blind belief in 100% success
    • No project specification, so no shared agreement
  5. Checking business need
    • What’s the driver? Is there a fire under it?
    • Joonatan’s example from PyDataLT – OCR
    • Cost/benefit estimate accepting uncertainty
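
The “cost/benefit estimate accepting uncertainty” point can be made concrete by carrying low/high ranges and a probability of success instead of single numbers. This is a minimal sketch, not a method from the talk. `Estimate` and `expected_net_value` are hypothetical names, and the model is a deliberately crude all-or-nothing assumption: the benefit arrives only if the project succeeds, while the cost is paid regardless.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """Hypothetical low/high range for an uncertain money amount."""
    low: float
    high: float

    def midpoint(self) -> float:
        return (self.low + self.high) / 2


def expected_net_value(benefit: Estimate, cost: Estimate, p_success: float) -> dict:
    """Worst/central/best expected net value of a project.

    Assumption: the benefit is only realised if the project succeeds
    (probability p_success), but the cost is paid either way.
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    return {
        # Pessimistic corner: low benefit, high cost.
        "worst": p_success * benefit.low - cost.high,
        # Central case: midpoints of both ranges.
        "central": p_success * benefit.midpoint() - cost.midpoint(),
        # Optimistic corner: high benefit, low cost.
        "best": p_success * benefit.high - cost.low,
    }


# Illustrative numbers only: a 50% chance of success.
print(expected_net_value(Estimate(100_000, 300_000), Estimate(50_000, 80_000), 0.5))
# → {'worst': -30000.0, 'central': 35000.0, 'best': 100000.0}
```

A negative worst case next to a positive central case is the kind of uncertainty worth putting in front of the stakeholder before the project starts.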
  6. You need a Project Specification
    • States a clearly defined problem
    • Guesses at unknowns (and project torpedoes!)
    • Proposed milestones and Gold Standard/metrics
    • Clear “definition of done”
    • Story from 10 years back
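
A “definition of done” tied to a Gold Standard can be written down as an executable check. The sketch below assumes a binary task, a hand-labelled gold set and precision/recall targets agreed in the specification. The function names and the threshold values are hypothetical placeholders, not figures from the talk.

```python
from typing import Sequence

# Hypothetical targets, as they might be agreed in a project specification.
DEFINITION_OF_DONE = {"precision": 0.9, "recall": 0.7}


def precision_recall(gold: Sequence[bool], predicted: Sequence[bool]) -> tuple:
    """Precision and recall of binary predictions against a gold standard."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be the same length")
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum(p and not g for g, p in zip(gold, predicted))
    fn = sum(g and not p for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def is_done(gold: Sequence[bool], predicted: Sequence[bool],
            targets: dict = DEFINITION_OF_DONE) -> bool:
    """True only if every agreed metric meets its target."""
    precision, recall = precision_recall(gold, predicted)
    return precision >= targets["precision"] and recall >= targets["recall"]
```

Once the targets are agreed up front, anyone on the team can run the check to answer “are we finished?”.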
  7. Do you understand your data?
      – What’s good and bad?
      – What relationships exist?
    • Build an exportable Notebook as an HTML artefact
    • Read Bertil’s piece on Medium “Data Story”
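
A first pass at “what’s good and bad?” can be as simple as counting missing and distinct values per column. In practice pandas (`DataFrame.isna`, `DataFrame.describe`) or pandas-profiling does this far more thoroughly. This standard-library sketch only illustrates the idea, and the function name, the set of missing-value markers and the sample data are all assumptions.

```python
import csv
import io
from collections import Counter

# Assumed spellings of "missing"; adjust per dataset (beware: "NA" is also a country code).
MISSING_MARKERS = {"", "NA", "null"}


def profile_columns(rows: list) -> dict:
    """Per-column missing fraction, distinct count and most common value."""
    if not rows:
        return {}
    summary = {}
    for column in rows[0]:
        values = [row.get(column) for row in rows]
        # csv.DictReader fills short rows with None, so treat None as missing too.
        present = [v for v in values if v is not None and v.strip() not in MISSING_MARKERS]
        summary[column] = {
            "missing_fraction": 1 - len(present) / len(rows),
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(1),
        }
    return summary


# Tiny made-up sample, for illustration only.
raw = "customer_id,country,spend\n1,UK,10.5\n2,,7.0\n3,UK,NA\n4,FR,3.2\n"
rows = list(csv.DictReader(io.StringIO(raw)))
for column, stats in profile_columns(rows).items():
    print(column, stats)
```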
  8. Continuous delivery to clients
    • Easy first deliveries – reports
    • Get to a minimal working delivery as soon as possible
    • Two tracks? R&D and client integration?
  9. Standardised Approaches
    • Reduce mental load for common decisions
      – Cookiecutter data-science
      – Watermark
      – Pandas-profiling / edaviz
      – Anaconda
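
Watermark records which Python and library versions produced a result. In a notebook, the IPython extension is loaded with `%load_ext watermark` and then called through the `%watermark` magic. For code that runs outside a notebook, a rough standard-library equivalent might look like the sketch below. `environment_stamp` and its default package list are hypothetical.

```python
import platform
from datetime import datetime, timezone
from importlib import metadata


def environment_stamp(packages=("numpy", "pandas", "scikit-learn")) -> dict:
    """Record when and with which versions a result was produced."""
    stamp = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            # Distribution name as installed (e.g. "scikit-learn", not "sklearn").
            stamp[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            stamp[name] = "not installed"
    return stamp


print(environment_stamp())
```

If you save this dictionary next to each report or model artefact, you can later answer the question “which versions produced this?”.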
  10. Code quality
    [Image: code quality as measured in WTFs/minute. Attrib: https://devrant.com/rants/347670/code-quality-as-measured-in-wtfs-minute]

  11. Continuously improving code quality
    • Encode assumptions using asserts (example: yesterday’s client issue)
    • Refactor to modules
    • Add unit tests
    • Diagnostics, e.g. Yellowbrick for sklearn
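
The first three points fit together naturally. You pull a function out of a notebook into a module, encode its assumptions about the input as `assert` statements, and pin its behaviour with a unit test. The function, its field names and the data below are hypothetical. Note that Python skips `assert` statements when run with `-O`, so checks that must always fire in production are safer as explicit `raise` statements.

```python
import unittest


def monthly_revenue(orders: list) -> dict:
    """Total order value per calendar month ("YYYY-MM")."""
    totals = {}
    for order in orders:
        # Encoded assumptions: fail loudly instead of silently producing bad totals.
        assert order["value"] >= 0, f"negative order value: {order!r}"
        assert len(order["date"]) == 10 and order["date"][4] == "-", \
            f"unexpected date format: {order!r}"
        month = order["date"][:7]
        totals[month] = totals.get(month, 0.0) + order["value"]
    return totals


class TestMonthlyRevenue(unittest.TestCase):
    def test_sums_per_month(self):
        orders = [
            {"date": "2019-05-01", "value": 10.0},
            {"date": "2019-05-28", "value": 5.0},
            {"date": "2019-06-03", "value": 2.5},
        ]
        self.assertEqual(monthly_revenue(orders), {"2019-05": 15.0, "2019-06": 2.5})

    def test_negative_value_breaks_assumption(self):
        with self.assertRaises(AssertionError):
            monthly_revenue([{"date": "2019-05-01", "value": -1.0}])

    def test_unexpected_date_format_breaks_assumption(self):
        with self.assertRaises(AssertionError):
            monthly_revenue([{"date": "01/05/2019", "value": 1.0}])
```

In a real project the function would live in a module and the test class in a separate test file, run with `python -m unittest` (or pytest) on every change.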
  12. Contributing to Open Source gets you
    • Exposure to new processes
    • Enforced clear communication
    • Balanced consumption & contribution
    • You’re more visible & valuable
  13. High-performance teams
    • High test coverage
    • Easy roll-out & roll-back
    • Culture of constructive criticism
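
“Easy roll out & roll back” depends largely on keeping previous releases available. Below is a minimal in-memory sketch of the idea. `ModelRegistry` is a hypothetical class that stands in for whatever deployment or model-registry tooling a team actually uses, and the “models” are plain functions.

```python
from typing import Any, Callable, List, Optional, Tuple


class ModelRegistry:
    """Keeps every released model, so rolling back only moves the live pointer."""

    def __init__(self) -> None:
        self._releases: List[Tuple[str, Callable[[Any], Any]]] = []
        self._live: Optional[int] = None  # index of the live release

    def roll_out(self, version: str, model: Callable[[Any], Any]) -> None:
        """Register a new release and make it live."""
        if any(v == version for v, _ in self._releases):
            raise ValueError(f"version {version!r} was already released")
        self._releases.append((version, model))
        self._live = len(self._releases) - 1

    def roll_back(self) -> str:
        """Make the previous release live again and return its version."""
        if not self._live:  # None (nothing released) or 0 (first release)
            raise RuntimeError("no earlier release to roll back to")
        self._live -= 1
        return self._releases[self._live][0]

    def _current(self) -> Tuple[str, Callable[[Any], Any]]:
        if self._live is None:
            raise RuntimeError("nothing has been rolled out yet")
        return self._releases[self._live]

    @property
    def live_version(self) -> str:
        return self._current()[0]

    def predict(self, x: Any) -> Any:
        """Route a request through whichever release is currently live."""
        return self._current()[1](x)


registry = ModelRegistry()
registry.roll_out("v1", lambda x: x * 2)
registry.roll_out("v2", lambda x: x * 3)
registry.roll_back()  # v2 misbehaves in production, so return to v1
print(registry.live_version, registry.predict(10))  # → v1 20
```

A real system would keep releases in durable storage rather than in memory. The principle sketched here still applies: rolling back should be a cheap switch, not a rebuild.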
  14. Resources
    • “Successfully Delivering Data Science Projects” & “Software Engineering for Data Scientists”, early July
  15. Summary
    • Derisk early and often
    • Communicate visually, all the time
    • Strive for continuous improvement
    • Join my thoughts+jobs list for tips, and my training list
    • Attend PyDataLondon 2019, July 12-14?