Deftly Delivering Data Science Projects

ianozsvald
November 11, 2018

Battle-tested observations on ways to improve the likelihood that your data science project goes smoothly and gets delivered correctly. Given at the inaugural PyDataPrague.

Transcript

  1. Deftly Delivering Data Science Projects
     PyDataPrague 2018-10
     Ian Ozsvald @IanOzsvald
     ModelInsight.io
  2. Introductions
     • I’m an engineering data scientist
     • 15+ years experience
     • Team coaching
     • Strategic planning
     • Training
     Ian.Ozsvald@ModelInsight.io @IanOzsvald[.com] PyDataPrague 2018-10
  3. Problems delivering DS projects
     • What are your experiences?
  4. Problems delivering DS projects
     • “Make us more [money|signups|...]” – desire for magic
     • Desire over actual need – vanity projects
     • Lack of technical leadership – poor specs
     • Bad data – lies, mistakes, confusion
     • Lack of client buy-in
  5. Good DS projects
     • Numerate management asking good data driven questions
     • Suitable data
     • Well defined outcomes that are agreed to be achievable
  6. Learning and applying at...
  7. Project Specification
     • You need a clearly defined problem
     • Where are the unknowns?
     • Known unknowns
     • What might kill the project?
     • Propose milestones
     • Where’s your Gold Standard data set?
     • What’s your “definition of done”
     • Minimal results and great results
     • Appropriate metrics to communicate results
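
One way to make the “definition of done” concrete is to write the agreed thresholds down as code. The sketch below is only an illustration of the minimal-versus-great idea from the slide: the class name, the metric and the threshold values are all hypothetical, and it assumes a single metric where higher is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefinitionOfDone:
    """Agreed thresholds for one metric where higher is better (hypothetical)."""

    metric_name: str
    minimal: float  # below this the milestone is not met
    great: float    # at or above this the result counts as great

    def assess(self, score: float) -> str:
        """Classify a score against the agreed thresholds."""
        if score >= self.great:
            return "great"
        if score >= self.minimal:
            return "minimal"
        return "not done"


# Illustrative numbers only; real thresholds come from the agreement with the client.
done = DefinitionOfDone(metric_name="recall", minimal=0.70, great=0.85)
for score in (0.65, 0.72, 0.90):
    print(f"{done.metric_name}={score:.2f} -> {done.assess(score)}")
```

Keeping the thresholds in one small object means every report can state the same outcome in the same words.
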
  8. “Data story”
     • Do you understand your data?
     • Explain your data – what does it say?
     • What’s good and what’s bad?
     • What are the relationships?
     • Where is the signal in the data?
     • Export your Notebook as html artefact
     • Data Story proposed by Bertil (Medium)
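
A first pass at “what’s good and what’s bad” can be as simple as counting missing values and checking ranges per column. The sketch below uses only the Python standard library on made-up sample data; the column names, the values and the `summarise` helper are assumptions for illustration, and in a real Notebook a data-frame library would usually do this work.

```python
import csv
import io
import statistics

# Made-up sample data standing in for a real extract.
RAW = """age,income
34,52000
,61000
29,
41,48000
"""


def summarise(rows, column):
    """Count missing values and describe the numeric values in one column."""
    values = [row[column] for row in rows]
    present = [float(v) for v in values if v.strip() != ""]
    return {
        "column": column,
        "n_rows": len(values),
        "n_missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "median": statistics.median(present),
    }


rows = list(csv.DictReader(io.StringIO(RAW)))
story = [summarise(rows, column) for column in ("age", "income")]
for entry in story:
    print(entry)
```

The helper assumes each column has at least one non-missing numeric value; a fully empty column would need its own handling.
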
  9. Standardised approaches
     • Reduce the mental load for common decisions
     • Cookiecutter (folders)
     • pandas-profiling
     • watermark
     • Anaconda
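
Part of a standardised approach is recording which versions produced a result. The sketch below is a standard-library stand-in for that idea, not the watermark tool's own interface; the function name `environment_record` and the package list are hypothetical.

```python
import platform
from importlib import metadata


def environment_record(packages):
    """Return interpreter, OS and package versions for a report footer."""
    record = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
    }
    for name in packages:
        try:
            record[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap rather than failing, so the report still renders.
            record[name] = "not installed"
    return record


# The package list is illustrative; list whatever the project depends on.
record = environment_record(["pandas", "scikit-learn", "no-such-package-xyz"])
for key, value in record.items():
    print(f"{key}: {value}")
```

Printing this record at the top or bottom of every Notebook makes it easier to reproduce a result later.
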
  10. Improving code quality
     • Encode assumptions with asserts
     • Refactor to modules
     • Add unit-tests
     • Visual reports with analyst interpretations
     • Diagnostics e.g. yellowbrick for sklearn
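
The first and third points can be sketched together: a function that encodes its assumptions with asserts, plus a unit test that exercises both the normal path and a violated assumption. The function `conversion_rate` and its inputs are hypothetical examples, not taken from the talk.

```python
import unittest


def conversion_rate(signups, visitors):
    """Return signups / visitors, checking assumptions about the inputs."""
    assert visitors > 0, "expected at least one visitor"
    assert 0 <= signups <= visitors, "signups must be between 0 and visitors"
    return signups / visitors


class ConversionRateTest(unittest.TestCase):
    def test_typical_values(self):
        self.assertAlmostEqual(conversion_rate(25, 100), 0.25)

    def test_rejects_impossible_counts(self):
        # More signups than visitors means the data is wrong upstream.
        with self.assertRaises(AssertionError):
            conversion_rate(150, 100)


# Run the tests in-process so this also works inside a Notebook cell.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ConversionRateTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Note that Python skips `assert` statements when run with the `-O` flag, so checks that must always run in production are better written as explicit exceptions.
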
  11. Improving project quality
     • Code reviews (with a check-list, PEP8)
     • nbdime for diffs
     • “Data Defences” - regular critiques by colleagues on your project
  12. Continuous delivery to client
     • Early deliveries – reports
     • Get to a minimal working delivery as soon as possible (UI? App? Reports?)
     • Consider papermill for deployable Notebooks
  13. Summary
     • Honesty throughout your work
     • Strive to keep improving your technique
     • Keep communicating your results