Slide 1

Slide 1 text

On the Delivery of Data Science Projects
Ian Ozsvald (@IanOzsvald – ianozsvald.com)
Business, Analytics and Data Science meetup, 2019-02

Slide 2

Slide 2 text

Introductions
• Interim Chief Data Scientist
• 19+ years experience
• Quickly build strategic data science plans
• Team coaching & public courses
By [ian]@ianozsvald[.com] Ian Ozsvald

Slide 3

Slide 3 text

Data Science shows value when...
• Numerate management ask good data-driven questions
• You have suitable data
• Well-defined, achievable outcomes are agreed
• Change is enabled by these projects

Slide 4

Slide 4 text

Common delivery problems
• “Make us more [money/…]” - give me magic!
• Desire over need – vanity projects!
• Lack of technical leadership – poor/missing specs
• Bad data – lies, mistakes and confusion
• Lack of client buy-in – no burning need

Slide 5

Slide 5 text

What problems have you seen?
• Audience – your observations?

Slide 6

Slide 6 text

You need a Project Specification
• States a clearly defined problem
• Guesses at unknowns (and project torpedoes!)
• Proposed milestones and Gold Standard/metrics
• Clear “definition of done”
• Story from 10 years back

Slide 7

Slide 7 text

“Data Story”
• Do you understand your data?
  – What’s good and bad?
  – What relationships exist?
• Build exportable Notebook as html artefact
• Read Bertil’s piece on Medium
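One way to build the HTML artefact mentioned on this slide is nbconvert's Python API. The sketch below is a minimal illustration, not the speaker's own workflow: the function names and the choice to write the HTML file next to the notebook are assumptions. It renders the outputs already saved in the notebook; it does not re-execute it.

```python
from pathlib import Path


def html_path_for(notebook):
    """Place the HTML artefact next to the notebook, same name, .html suffix."""
    return Path(notebook).with_suffix(".html")


def export_notebook_to_html(notebook):
    """Render a saved notebook to an HTML file and return that file's path."""
    # Imported lazily so the path helper works without nbconvert installed.
    from nbconvert import HTMLExporter

    # from_filename returns the rendered HTML body plus a resources dict.
    body, _resources = HTMLExporter().from_filename(str(notebook))
    out = html_path_for(notebook)
    out.write_text(body, encoding="utf-8")
    return out
```

Calling `export_notebook_to_html("analysis/data_story.ipynb")` (a hypothetical path) would write `analysis/data_story.html`, which can be emailed or archived without the recipient needing Jupyter.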

Slide 8

Slide 8 text

Standardised Approaches
• Reduce mental load for common decisions
  – Cookiecutter data-science
  – Watermark
  – Pandas-profiling
  – Anaconda

Slide 9

Slide 9 text

Code quality
Attrib: https://devrant.com/rants/347670/code-quality-as-measured-in-wtfs-minute

Slide 10

Slide 10 text

Continuously improving code quality
• Encode assumptions using asserts
• Refactor to modules
• Add unit-tests
• Visual reports with analyst interpretations
• Diagnostics e.g. yellowbrick for sklearn
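The first three points on this slide can be sketched together: a small function that would live in a module, asserts that encode what the analysis assumes about the data, and unit tests that exercise both the happy path and a violated assumption. The `age` field, its 0-120 range, and the function name are hypothetical, chosen only for illustration.

```python
import unittest


def clean_ages(rows):
    """Return ages as ints, asserting the assumptions the analysis relies on."""
    # Assumption (hypothetical): every record carries an "age" field.
    assert all("age" in row for row in rows), "every row needs an 'age' field"
    ages = [int(row["age"]) for row in rows]
    # Assumption (hypothetical): ages are plausible human ages.
    assert all(0 <= age <= 120 for age in ages), "age outside 0-120"
    return ages


class TestCleanAges(unittest.TestCase):
    def test_valid_rows_pass(self):
        self.assertEqual(clean_ages([{"age": "34"}, {"age": 51}]), [34, 51])

    def test_bad_age_is_caught(self):
        with self.assertRaises(AssertionError):
            clean_ages([{"age": -3}])
```

Saved in a module, the tests run with `python -m unittest`. Note that asserts are skipped when Python runs with `-O`, so they suit development-time checks of assumptions rather than production input validation.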

Slide 11

Slide 11 text

Continuously improving project quality
• Code review (with a check-list & PEP8)
• nbdime for diffs
• “Data Defences” - regular critiques by colleagues on your project

Slide 12

Slide 12 text

Contributing to Open Source gets you
• Exposure to new processes
• Enforced clear communication
• Balanced consumption & contribution
• You’re more visible & valuable

Slide 13

Slide 13 text

Continuous delivery to clients
• Easy first deliveries – reports
• Get to a minimal working delivery as soon as possible
• Consider papermill for deployable Notebooks
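As a rough sketch of the papermill suggestion: a template notebook is executed once per client with different parameters, producing one output notebook per run. The per-client naming scheme, the function names, and the example parameters are assumptions for illustration; `papermill.execute_notebook` is the library call doing the work.

```python
from pathlib import Path


def output_path_for(template, client):
    """Name the executed notebook after the client (hypothetical naming scheme)."""
    template = Path(template)
    return template.with_name(f"{template.stem}_{client}.ipynb")


def run_report(template, client, parameters):
    """Execute a parameterised notebook with papermill and return the output path."""
    # Imported lazily so the helper above is usable without papermill installed.
    import papermill as pm

    out = output_path_for(template, client)
    # papermill injects these values after the cell tagged "parameters".
    pm.execute_notebook(str(template), str(out), parameters=parameters)
    return out
```

For example, `run_report("reports/monthly.ipynb", "acme", {"month": "2019-01"})` would write `reports/monthly_acme.ipynb`, assuming the template notebook exists and defines a `month` parameter.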

Slide 14

Slide 14 text

Resources
• My “Successfully Delivering Data Science Projects” course – sold out – join my training list via ianozsvald.com

Slide 15

Slide 15 text

Summary
• Derisk early and often
• Communicate visually, all the time
• Honesty throughout your work
• Strive for continuous improvement
• Consider speaking at PyDataLondon 2019, July 12-14