Slide 1

Slide 1 text

On the Delivery of Data Science Projects
Ian Ozsvald (@IanOzsvald, ianozsvald.com)
PyDataCambridge 2019-05

Slide 2

Slide 2 text

Introductions

• Interim Chief Data Scientist
• 19+ years experience
• Quickly build strategic data science plans
• Team coaching & public courses

By [ian]@ianozsvald[.com] Ian Ozsvald

Slide 3

Slide 3 text

Data Science shows value when...

• Numerate management ask good data-driven questions
• You have suitable data
• Achievable outcomes are well defined
• Change is enabled by these projects

Slide 4

Slide 4 text

Common failure points

• Unclear true business need
• No visibility on the data (and its quality)
• Blind belief in 100% success
• No project specification – lacking shared agreement

Slide 5

Slide 5 text

Checking business need

• What’s the driver? Is there a fire under it?
• Joonatan’s example from PyDataLT – OCR
• Cost/benefit estimate accepting uncertainty
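
One way to make a cost/benefit estimate that accepts uncertainty is a small simulation over ranges rather than single figures. The sketch below is illustrative only: the function name, the cost and benefit ranges, and the 60% chance of success are all made-up placeholders, not figures from the talk.

```python
import random
import statistics


def simulate_net_benefit(n_trials=10_000, seed=0):
    """Monte Carlo sketch of net benefit. All ranges are hypothetical."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_trials):
        # triangular(low, high, mode): pessimistic, optimistic, most likely
        cost = rng.triangular(20_000, 80_000, 40_000)
        benefit_if_success = rng.triangular(50_000, 300_000, 120_000)
        # Assumed chance that the project works at all
        succeeded = rng.random() < 0.6
        outcomes.append((benefit_if_success if succeeded else 0.0) - cost)
    return outcomes


outcomes = simulate_net_benefit()
deciles = statistics.quantiles(outcomes, n=10)
loss_fraction = sum(o < 0 for o in outcomes) / len(outcomes)

print(f"median net benefit: {statistics.median(outcomes):,.0f}")
print(f"10th to 90th percentile: {deciles[0]:,.0f} to {deciles[-1]:,.0f}")
print(f"chance of losing money: {loss_fraction:.0%}")
```

Reporting a range and a chance of loss, rather than one number, keeps the uncertainty visible when the estimate is discussed with the business.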

Slide 6

Slide 6 text

You need a Project Specification

• States a clearly defined problem
• Guesses at unknowns (and project torpedoes!)
• Proposed milestones and Gold Standard/metrics
• Clear “definition of done”
• Story from 10 years back
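
A Gold Standard and a "definition of done" can be written down as an executable check. The sketch below assumes a small hand-labelled gold standard and agreed precision/recall thresholds; the labels, the thresholds, and the names `precision_recall` and `DEFINITION_OF_DONE` are hypothetical and only illustrate the idea.

```python
def precision_recall(gold, predicted):
    """Precision and recall of boolean predictions against gold labels."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical hand-labelled gold standard and the model's predictions
gold = [True, True, False, True, False, False, True, False]
predicted = [True, False, False, True, True, False, True, False]

# Hypothetical thresholds agreed in the project specification
DEFINITION_OF_DONE = {"precision": 0.70, "recall": 0.80}

precision, recall = precision_recall(gold, predicted)
is_done = (
    precision >= DEFINITION_OF_DONE["precision"]
    and recall >= DEFINITION_OF_DONE["recall"]
)
print(f"precision={precision:.2f} recall={recall:.2f} done={is_done}")
```

Agreeing the thresholds before work starts gives both sides the same test for whether a milestone has been met.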

Slide 7

Slide 7 text

• Do you understand your data?
  – What’s good and bad?
  – What relationships exist?
• Build exportable Notebook as html artefact
• Read Bertil’s piece on Medium “Data Story”
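
The two questions above (what is good and bad, what relationships exist) can be answered first with very simple checks. The sketch below uses only the standard library on a made-up table; the column names and values are hypothetical, and in a Notebook the same checks would more likely be done with a dataframe library.

```python
import math

# Hypothetical records; None marks a missing value
rows = [
    {"age": 34, "income": 52_000},
    {"age": 45, "income": 61_000},
    {"age": None, "income": 48_000},
    {"age": 29, "income": None},
    {"age": 52, "income": 75_000},
]


def missing_counts(rows):
    """Count missing values per column ("what's bad?")."""
    return {col: sum(row[col] is None for row in rows) for col in rows[0]}


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


missing = missing_counts(rows)

# Only rows with no missing values are used for the relationship check
complete = [row for row in rows if None not in row.values()]
corr = pearson(
    [row["age"] for row in complete],
    [row["income"] for row in complete],
)
print(f"missing per column: {missing}")
print(f"age/income correlation on {len(complete)} complete rows: {corr:.2f}")
```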

Slide 8

Slide 8 text

Continuous delivery to clients

• Easy first deliveries – reports
• Get to a minimal working delivery as soon as possible
• Two tracks? R&D and client integration?

Slide 9

Slide 9 text

Standardised Approaches

• Reduce mental load for common decisions
  – Cookiecutter data-science
  – Watermark
  – Pandas-profiling / edaviz
  – Anaconda
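
The idea behind recording the environment at the top of a Notebook can be sketched with the standard library alone. This is not the Watermark package's API; `environment_stamp` and the package names passed to it are hypothetical and only show the kind of information worth capturing alongside results.

```python
import datetime
import importlib.metadata
import platform


def environment_stamp(packages):
    """Collect a timestamp, Python version, platform and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = "not installed"
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "timestamp": now.isoformat(timespec="seconds"),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# Package names here are only examples; list whatever the project depends on
stamp = environment_stamp(["pandas", "scikit-learn"])
for key, value in stamp.items():
    print(f"{key}: {value}")
```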

Slide 10

Slide 10 text

Code quality

Attrib: https://devrant.com/rants/347670/code-quality-as-measured-in-wtfs-minute

Slide 11

Slide 11 text

Continuously improving code quality

• Encode assumptions using asserts (example - yesterday’s client issue)
• Refactor to modules
• Add unit-tests
• Diagnostics e.g. yellowbrick for sklearn
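
The first and third points can be sketched together: assumptions about the data are written as asserts inside a small function, and a unit test pins down both the normal case and the violated assumption. The function `conversion_rate` and its inputs are hypothetical; this is not the client example mentioned on the slide.

```python
import unittest


def conversion_rate(purchases, visits):
    """Fraction of visits that ended in a purchase."""
    # Assumptions about the incoming data, stated so bad data fails loudly
    assert visits > 0, "expected at least one visit"
    assert 0 <= purchases <= visits, "purchases should never exceed visits"
    return purchases / visits


class TestConversionRate(unittest.TestCase):
    def test_typical_case(self):
        self.assertAlmostEqual(conversion_rate(5, 20), 0.25)

    def test_violated_assumption_is_caught(self):
        with self.assertRaises(AssertionError):
            conversion_rate(30, 20)


# Run the tests explicitly so this also works inside a Notebook cell
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestConversionRate)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Note that Python skips assert statements when run with the `-O` flag, so asserts suit development-time checks on assumptions rather than validation that must always run.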

Slide 12

Slide 12 text

Contributing to Open Source gets you

• Exposure to new processes
• Enforced clear communication
• Balanced consumption & contribution
• You’re more visible & valuable

Slide 13

Slide 13 text

High performance teams

• High test coverage
• Easy roll out & roll back
• Culture of constructive criticism

Slide 14

Slide 14 text

Resources

• “Successfully Delivering Data Science Projects” & “Software Engineering for Data Scientists” - early July

Slide 15

Slide 15 text

Summary

• Derisk early and often
• Communicate visually, all the time
• Strive for continuous improvement
• Join my thoughts+jobs list for tips and my training list
• Attend PyDataLondon 2019 July 12-14?