2019-05-21 Data Scientists and Software Engineering

A talk for RStudio Workweek 2019 on why data scientists have different incentives than software engineers.

Alex Gold

May 21, 2019

Transcript

  1. Hello RStudio!
    • Background in Math and Econ
      ◦ Proud econ PhD dropout
    • Think tank work
      ◦ Economic policy
      ◦ Microsimulation modeling
    • “Data Science”
      ◦ Ran voter outreach experiments for progressive causes and candidates
      ◦ Ran small data science consulting team
  2. Data Scientists Stink at:
    • Version control.
    • Testing (especially test-driven development).
    They should do these things... But they often don’t...
  3. All of Data Science in 1 Slide*
    Data: call, firm_size, it_sophistication
    • Prediction: $\Pr(\text{call}) \sim f(\text{firm\_size}, \text{it\_sophistication})$
    • Inference: $\text{call} = \beta_0 + \beta_1 \cdot \text{firm\_size} + \beta_2 \cdot \text{it\_sophistication}$
    • Clustering: Group A, Group B, ...
    *Well, mostly.
  4. Let’s Do Test-Driven Development...
    Don’t know answer...
    • If I knew, I wouldn’t need data.
    Unit tests?
    • Just algorithmic validity...
    Model metrics?
    • R², AUC, F1, Gini, RMSE
    • $\Pr(\text{call})$ or $\beta_0, \beta_1, \beta_2$ or Group A, B
    QA and testing should be done…
    • Can’t really specify ahead
  5. The difference...
                      Software Engineers                        Data Scientists
    Deliverable       Working code                              Code? Paper, slide deck, dashboard, model
    Best practices    Version control, automated testing/TDD    ????? ?????*
    Thank you!
    ➔ Empathy?
    ➔ It’s not hopeless
    ➔ Talk to me about “agile data science”
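Slides 3 and 4 separate what can be unit-tested ("just algorithmic validity") from model metrics, which score a fit but cannot be specified ahead of time. The following is a minimal Python sketch of that split, assuming NumPy. Every name in it (`fit_linear`, `rmse`, `r_squared`, `auc`, `gini`, `f1`, and the two `test_` functions) is hypothetical and not from the talk. The tests check the code against hand-computed cases and against simulated data whose β values are known in advance, which sidesteps the "if I knew, I wouldn't need data" problem. The sketch also assumes that "Gini" means the classifier Gini coefficient (2·AUC − 1), and it uses a continuous simulated outcome instead of a binary `call` for simplicity.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares estimates of beta0, beta1, beta2 for
    call = beta0 + beta1 * firm_size + beta2 * it_sophistication."""
    design = np.column_stack([np.ones(len(X)), X])  # column of ones -> beta0
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def rmse(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def r_squared(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1 - ss_res / ss_tot)


def auc(y, score):
    """Share of (positive, negative) pairs where the positive scores higher; ties count 1/2."""
    y, score = np.asarray(y), np.asarray(score, float)
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def gini(y, score):
    """Assumed meaning: the classifier Gini coefficient, 2 * AUC - 1."""
    return 2 * auc(y, score) - 1


def f1(y, y_pred):
    y, y_pred = np.asarray(y), np.asarray(y_pred)
    tp = np.sum((y == 1) & (y_pred == 1))
    fp = np.sum((y == 0) & (y_pred == 1))
    fn = np.sum((y == 1) & (y_pred == 0))
    return float(2 * tp / (2 * tp + fp + fn))


def test_metrics_on_hand_checked_cases():
    # Algorithmic validity: tiny inputs whose answers can be worked out by hand.
    assert np.isclose(rmse([1, 2, 3], [1, 2, 5]), np.sqrt(4 / 3))
    assert np.isclose(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]), 0.75)
    assert np.isclose(gini([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]), 0.5)
    assert np.isclose(f1([1, 1, 0, 0], [1, 0, 1, 0]), 0.5)


def test_fit_recovers_simulated_coefficients():
    # On real data the true betas are unknown; on simulated data they are known,
    # so the fitting code (not the modeling choice) can be tested.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 2))  # hypothetical firm_size, it_sophistication
    true_beta = np.array([0.5, -1.0, 2.0])
    y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=1000)
    beta_hat = fit_linear(X, y)
    assert np.allclose(beta_hat, true_beta, atol=0.05)
    y_hat = beta_hat[0] + X @ beta_hat[1:]
    assert r_squared(y, y_hat) > 0.99


if __name__ == "__main__":
    test_metrics_on_hand_checked_cases()
    test_fit_recovers_simulated_coefficients()
    print("algorithmic checks passed")
```

These tests pass regardless of whether a model fit to real data reaches an AUC of 0.55 or 0.95. Deciding which value should count as passing is exactly the part that "can’t really be specified ahead."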