2019-05-21 Data Scientists and Software Engineering

A talk for RStudio Workweek 2019 on why data scientists have different incentives than software engineers.


Alex Gold

May 21, 2019


  1. 2.

    Hello RStudio!

    • Background in Math and Econ
      ◦ Proud econ PhD dropout
    • Think tank work
      ◦ Economic policy
      ◦ Microsimulation modeling
    • “Data Science”
      ◦ Ran voter outreach experiments for progressive causes and candidates
      ◦ Ran small data science consulting team
  2. 4.

    Data Scientists Stink at:

    • Version control.
    • Testing (esp. test-driven development).

    They should do these things... But they often don’t...
  3. 8.
  4. 9.

    All of Data Science in 1 Slide*

    Data: call, firm_size, it_sophistication

    • Prediction: Pr(call) ~ f(firm_size, it_sophistication)
    • Inference: call = β0 + β1 * firm_size + β2 * it_sophistication
    • Clustering: Group A, Group B, ....

    *Well, mostly.
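The inference and prediction rows of this slide can be sketched in a few lines of Python. This is a minimal illustration, not anything shown in the talk: the data are synthetic, the coefficients in `TRUE_BETA` are made up, the helper names (`solve_linear_system`, `fit_ols`, `predict`) are hypothetical, and `call` is treated as a plain numeric outcome, as the slide's linear equation does. Clustering is left out.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta, row):
    """Fitted value for one (firm_size, it_sophistication) row."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# Synthetic data: these coefficients are invented purely for illustration.
TRUE_BETA = [0.1, 0.02, 0.3]
rng = random.Random(0)
rows = [(rng.uniform(1, 20), rng.uniform(0, 1)) for _ in range(200)]
call = [
    TRUE_BETA[0] + TRUE_BETA[1] * firm_size + TRUE_BETA[2] * it_soph
    + rng.gauss(0, 0.02)
    for firm_size, it_soph in rows
]

# Inference: which coefficients best describe the data?
beta = fit_ols(rows, call)

# Prediction: what do we expect for a new firm?
new_firm = (10.0, 0.5)
expected_call = predict(beta, new_firm)
```

The same fitted `beta` serves both purposes: inference reads the coefficients themselves, while prediction only cares about the value `predict` returns.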
  5. 10.

    Let’s Do Test-Driven Development...

    Don’t know answer...
    • If I knew, I wouldn’t need data.

    Unit tests?
    • Just algorithmic validity...

    Model metrics? R², AUC, F1, Gini, RMSE

    Pr(call) or β0, β1, β2 or Group A, B

    QA and testing should be done…
    • Can’t really specify ahead
    •
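To make the "just algorithmic validity" point concrete, here is a small sketch using Python's standard `unittest` module. The `rmse` function and the test names are hypothetical, written for this illustration rather than taken from the talk. The tests can confirm that the metric is computed correctly, but none of them can say what RMSE a model should reach on real data, because that answer is not known ahead of time.

```python
import math
import unittest


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


class RmseTests(unittest.TestCase):
    """Checks that the metric is implemented correctly, nothing more."""

    def test_perfect_predictions_score_zero(self):
        self.assertEqual(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]), 0.0)

    def test_hand_computed_value(self):
        # Errors are 3 and 4, so RMSE = sqrt((9 + 16) / 2) = sqrt(12.5).
        self.assertAlmostEqual(rmse([0.0, 0.0], [3.0, 4.0]), math.sqrt(12.5))

    def test_mismatched_lengths_are_rejected(self):
        with self.assertRaises(ValueError):
            rmse([1.0], [1.0, 2.0])


def run_tests():
    """Run the suite programmatically and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RmseTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests().wasSuccessful()` reports whether the implementation checks pass. A test such as "RMSE on the real data must be below some threshold" is a different kind of claim, since the threshold itself would have to be guessed before seeing the data.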
  6. 12.

    The difference...

                      Software Engineers                        Data Scientist
    Deliverable       Working Code                              Code? Paper, Slide deck, Dashboard, Model
    Best Practices    Version Control, Automated Testing/TDD    ?????, ?????*

    Thank you!

    ➔ Empathy?
    ➔ It’s not hopeless
    ➔ Talk to me about “agile data science”