A Starter Data Science Process for Software Engineers

ianozsvald
June 15, 2019

From my talk at PyLondinium 2019 (https://ianozsvald.com/2019/06/15/a-starter-data-science-process-for-software-engineers-talk-at-pylondinium-2019/), we look at what's required for a valuable data science project, how to approach it (make a spec!), then step into a live demo using Jupyter, Altair & matplotlib for visualisations, a Widget driving predictions for interactivity and Voila to serve it up.

Transcript

  1. A starter data science process for
    software engineers
    @IanOzsvald – ianozsvald.com
    Ian Ozsvald
    PyLondinium 2019


  2. Interim Chief Data Scientist

    19+ years experience

    Quickly build strategic data science plans

    Team coaching & public courses
    Introductions
    By [ian]@ianozsvald[.com] Ian Ozsvald


  3. Numerate management ask good data-driven questions

    You have suitable data

    Achievable outcomes are well defined

    Change is enabled by these projects
    Data Science shows value when...


  4. What’s the driver? Is there a fire under it?

    Joonatan’s example from PyDataLT – OCR

    Cost/benefit estimate accepting uncertainty

    Automatable
    Checking business need
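A cost/benefit estimate that accepts uncertainty can be as simple as carrying low and high bounds through the arithmetic. The sketch below is only an illustration: every figure and variable name in it is a hypothetical placeholder, not a number from the talk.

```python
# All figures are hypothetical placeholders, written as (low, high) ranges.
hours_saved_per_month = (40, 120)      # manual effort the automation might remove
cost_per_hour = (30.0, 50.0)           # loaded cost of that effort
build_cost = (20_000.0, 60_000.0)      # one-off cost to build the solution
months = 12                            # horizon for the estimate


def multiply_ranges(a, b):
    """Multiply two non-negative (low, high) ranges bound by bound."""
    return (a[0] * b[0], a[1] * b[1])


monthly_benefit = multiply_ranges(hours_saved_per_month, cost_per_hour)
annual_benefit = (monthly_benefit[0] * months, monthly_benefit[1] * months)

# Worst case pairs the lowest benefit with the highest cost; best case the reverse.
net_worst = annual_benefit[0] - build_cost[1]
net_best = annual_benefit[1] - build_cost[0]

print(f"Annual benefit: {annual_benefit[0]:,.0f} to {annual_benefit[1]:,.0f}")
print(f"Net after one year: {net_worst:,.0f} to {net_best:,.0f}")
```

A range that spans zero, as this made-up one does, is a prompt to narrow the biggest unknown before committing to the project.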


  5. States a clearly defined problem

    Guesses at unknowns (and project torpedoes!)

    Proposed milestones and Gold Standard/metrics

    Clear “definition of done”

    Story from 10 years back
    You need a Project Specification


  6. Want to automate “MPG estimates” to help engineers

    It only needs to be good enough for ranking, to assist the
    team in prioritising their investigations

    We need to gain the team’s trust in stages

    Pandas, sklearn, Yellowbrick, custom estimator
    A pretend example & live demo
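If the MPG estimate only needs to be good enough for ranking, one way to check it is to compare the order of the predictions with the order of the true values rather than the raw error. The sketch below is not the code from the live demo: it uses a hand-rolled one-feature least-squares fit and a Spearman rank correlation in plain Python instead of Pandas and sklearn, and the vehicle data is invented for illustration.

```python
# Hypothetical data: (weight_kg, actual_mpg) for a handful of made-up vehicles.
vehicles = [
    (900, 42.0),
    (1100, 36.5),
    (1300, 29.0),
    (1500, 30.5),
    (1800, 21.0),
    (2100, 18.5),
]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation using the no-ties formula."""
    n = len(a)
    d2 = sum((ra - rb) ** 2 for ra, rb in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


weights = [w for w, _ in vehicles]
actual = [m for _, m in vehicles]

slope, intercept = fit_line(weights, actual)
predicted = [slope * w + intercept for w in weights]

# Scored on the training data for brevity; a real check would use held-out vehicles.
rank_agreement = spearman(predicted, actual)
print(f"slope={slope:.4f} mpg/kg, rank agreement={rank_agreement:.3f}")
```

A rank agreement close to 1 means the model orders vehicles almost as the measurements do, which is what matters when the output is used to prioritise investigations rather than to quote exact figures.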

  7. “Software Engineering for Data
    Scientists” - early July
    Resources


  8. Your organisers are volunteers

    Thank all volunteers & speakers please

    Get a free signed book around 3.30pm
    Thank your organisers


  9. Automate parts of a high value problem

    Deliver value incrementally

    Communicate early & often

    Join my thoughts+jobs list for tips and my training list

    Lots of past talks on ianozsvald.com
    Summary