2019-05-21 Data Scientists and Software Engineering

A talk for RStudio Workweek 2019 on why data scientists have different incentives than software engineers.

Alex Gold

May 21, 2019

Transcript

  1. Alex K Gold, RStudio Workweek 2019: Why Data Scientists Stink at Software Engineering

  2. Hello RStudio!
     • Background in Math and Econ
       ◦ Proud econ PhD dropout
     • Think tank work
       ◦ Economic policy
       ◦ Microsimulation modeling
     • “Data Science”
       ◦ Ran voter outreach experiments for progressive causes and candidates
       ◦ Ran a small data science consulting team

  3. Am I talking to you?

  4. Data Scientists Stink at:
     • Version control
     • Testing (especially test-driven development)
     They should do these things... But they often don’t...

  5. Version Control

  6. The difference...
     • Data Scientist delivery: Client Report
     • Software Engineer delivery: Merge to master

  7. The Result...
     Alex: “Please use version control.”
     Team: “Nah.”

  8. Testing

  9. All of Data Science in 1 Slide*
     • Data: call, firm_size, it_sophistication
     • Prediction: Pr(call) ~ f(firm_size, it_sophistication)
     • Inference: call = β0 + β1 * firm_size + β2 * it_sophistication
     • Clustering: Group A, Group B, ...
     *Well, mostly.
  10. Let’s Do Test-Driven Development...
      • Don’t know the answer...
        ◦ If I knew, I wouldn’t need data.
      • Unit tests?
        ◦ Just algorithmic validity...
      • Model metrics?
        ◦ R², AUC, F1, Gini, RMSE
        ◦ Pr(call) or β0, β1, β2 or Group A, B
      • QA and testing should be done…
        ◦ Can’t really specify ahead
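One kind of unit test that works even when you don't know the real answer is a check on algorithmic validity: simulate data from coefficients you chose yourself, fit the model, and confirm the code recovers them. The following is a minimal Python sketch of that idea for the inference model from slide 9, assuming NumPy is available. The names `fit_linear_model` and `test_recovers_known_coefficients` and the "true" coefficient values are hypothetical, and `call` is treated as a continuous outcome with Gaussian noise for simplicity.

```python
import numpy as np


def fit_linear_model(firm_size, it_sophistication, call):
    """Estimate (b0, b1, b2) in call = b0 + b1*firm_size + b2*it_sophistication by OLS."""
    X = np.column_stack([
        np.ones_like(firm_size, dtype=float),  # intercept column for b0
        firm_size,
        it_sophistication,
    ])
    # lstsq returns (solution, residuals, rank, singular values); keep the solution.
    coefs, *_ = np.linalg.lstsq(X, call, rcond=None)
    return coefs


def test_recovers_known_coefficients():
    # Real data has no known "true" betas, but simulated data does:
    # we pick them, generate data, and check the fitting code finds them.
    rng = np.random.default_rng(seed=2019)
    n = 5_000
    firm_size = rng.normal(50, 10, size=n)
    it_sophistication = rng.uniform(0, 1, size=n)
    true_betas = np.array([0.5, 0.02, -0.3])  # hypothetical values
    noise = rng.normal(0, 0.1, size=n)
    call = (true_betas[0]
            + true_betas[1] * firm_size
            + true_betas[2] * it_sophistication
            + noise)

    estimated = fit_linear_model(firm_size, it_sophistication, call)
    # Tolerance allows for sampling noise in the simulated data.
    assert np.allclose(estimated, true_betas, atol=0.05)


if __name__ == "__main__":
    test_recovers_known_coefficients()
    print("ok")
```

A test like this passes or fails on the code, not on the conclusions: it says nothing about whether estimates of β1 or β2 from real data are right, which is the gap the slide points to.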
  11. Bottom Line

  12. The difference...
                        Software Engineers        Data Scientist
      Deliverable       Working Code              Code? / Paper / Slide deck / Dashboard / Model
      Best Practices    Version Control           ?????
                        Automated Testing/TDD     ?????*
      Thank you!
      ➔ Empathy?
      ➔ It’s not hopeless
      ➔ Talk to me about “agile data science”