
The Data Errors we Make by Sean Taylor at Big Data Spain 2017

Where statistical errors come from, how they cause us to make bad decisions, and what to do about it.

https://www.bigdataspain.org/2017/talk/the-data-errors-we-make

Big Data Spain 2017
16th - 17th November, Kinépolis Madrid


Big Data Spain

November 22, 2017

Transcript

  1. None
  2. The Data Errors We Make. Sean J Taylor, Core Data Science Team, Facebook
  3. About Me • 5 years at Facebook as a Research Scientist • PhD in Information Systems from New York University • Research Interests: Field Experiments, Forecasting, Sports and sports fans • https://facebook.github.io/prophet/
  4. Strategic Decisions vs. Micro-decisions at Scale

  5. None
  6. [Diagram] Data + Algorithm + Human Choices → Estimate → Decision → Outcome. Statistical error: gap between Estimate and Truth. Practical error: gap between the Decision/Outcome and the Optimal Decision/Optimal Outcome.
  7. Simplest Error Model

  8. H0: You are not pregnant. H1: You are pregnant.

  9. Decision table. Columns: H0 is True (Product is Bad) | H1 is True (Product is Good).
     Accept Null Hypothesis (Don’t ship product): Right decision | Type II Error (wrong decision).
     Reject Null Hypothesis (Ship Product): Type I Error (wrong decision) | Right decision.
  10. The Receiver Operating Characteristic (ROC) curve tells us the Type I and Type II error rates. [Plot axes: x = Type I error rate, y = 1 - Type II error rate.]
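A minimal sketch (not from the deck) of how those two error rates fall out of an ROC curve, using scikit-learn's roc_curve; the synthetic data and logistic model here are stand-ins:

```python
# Sketch: reading Type I / Type II error rates off an ROC curve with scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; any classifier that outputs a continuous score works.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scores = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]

# fpr is the Type I error rate; tpr is (1 - Type II error rate).
fpr, tpr, thresholds = roc_curve(y_test, scores)
for f, t, thr in list(zip(fpr, tpr, thresholds))[::10]:
    print(f"threshold={thr:.2f}  Type I rate={f:.2f}  Type II rate={1 - t:.2f}")
```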
  11. Outline: 1. Refinements to the Type I/II error model. 2. A simple causal model of how we make errors. 3. What we can effectively do about errors.
  12. Refinements

  13. Refinement 1: Assign Costs to Errors.
      Columns: H0 is True (Product is Bad) | H1 is True (Product is Good).
      Accept Null Hypothesis (Don’t ship product): Right decision | Type II Error (wrong decision).
      Reject Null Hypothesis (Ship Product): Type I Error (wrong decision) | Right decision.
  14. Refinement 1: Assign Costs to Errors.
      Columns: H0 is True (Product is Bad) | H1 is True (Product is Good).
      Accept Null Hypothesis (Don’t ship product): 0 | -100.
      Reject Null Hypothesis (Ship Product): -200 | +100.
  15. Example: Expected value of a product launch. P(Type I) is 1% and P(Type II) is 20%.
      P(good) * (100 * .80 + -100 * .20) + (1 - P(good)) * (-200 * .01 + 0 * .99)
      = (.5 * 60) + (.5 * -2) = 30 - 1 = 29
  16. Allowing more Type I errors lowers the Type II error rate. The optimal choice depends on the payoffs and on P(H1).
  17. Example 2: Expected value of a product launch. P(Type I) is 5% and P(Type II) is 7%.
      P(good) * (100 * .93 + -100 * .07) + (1 - P(good)) * (-200 * .05 + 0 * .95)
      = (.5 * 86) + (.5 * -10) = 43 - 5 = 38 > 29
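A worked version of the two examples above, under the payoffs from slide 14 and P(good) = 0.5; the function name and keyword defaults are illustrative, not from the talk:

```python
# Expected value of the ship/don't-ship decision, given error rates and payoffs (slide 14).
def expected_value(p_good, p_type1, p_type2,
                   ship_good=100, ship_bad=-200, hold_good=-100, hold_bad=0):
    # If the product is good (H1 true): ship with prob 1 - p_type2, hold with prob p_type2.
    ev_good = ship_good * (1 - p_type2) + hold_good * p_type2
    # If the product is bad (H0 true): ship (Type I error) with prob p_type1, otherwise hold.
    ev_bad = ship_bad * p_type1 + hold_bad * (1 - p_type1)
    return p_good * ev_good + (1 - p_good) * ev_bad

print(expected_value(p_good=0.5, p_type1=0.01, p_type2=0.20))  # 29.0, as on slide 15
print(expected_value(p_good=0.5, p_type1=0.05, p_type2=0.07))  # 38.0, as on slide 17
```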
  18. Refinement 2: Opportunity Cost. Key idea: if we devote resources to minimizing Type I and II errors for one problem, we will have fewer resources for other problems. • Few organizations make only a single decision; we usually make many of them. • Acquiring more data and investing more time in a problem has diminishing marginal returns.
  19. Examples of Constraints • Sample size for online experiments • Gathering more data • Analyst time
  20. Refinement 3: Mosteller’s Type III Errors. Type III error: “correctly rejecting the null hypothesis for the wrong reason” -- Frederick Mosteller. More clearly: the process you used worked this time, but it is unlikely to keep working in the future.
  21. Good Process vs. Good Outcome.
      Good Process + Good Outcome: Deserved Success. Good Process + Bad Outcome: Bad Break.
      Bad Process + Good Outcome: Dumb Luck. Bad Process + Bad Outcome: Poetic Justice.
  22. Refinement 4: Kimball’s Type III Errors. Type III error: “the error committed by giving the right answer to the wrong problem” -- Allyn W. Kimball
  23. None
  24. None
  25. Why we make errors

  26. [Diagram] Data + Algorithm + Human Choices → Estimate

  27. Cause 1: Data • Inadequate data • Non-representative data • Measuring the wrong thing
  28. Made data: designed to be adequate. Found data: adequate only if we are fortunate.
  29. Non-representative data

  30. 2014 World Cup: first Facebook check-ins in Brazil from non-Brazilian users
  31. Bias? 2014 World Cup Check-ins by Country

  32. Measuring the wrong thing

  33. None
  34. None
  35. Common Pattern • High volume of a cheap, easy-to-measure “surrogate” (e.g. steps, clicks) • The surrogate is correlated with the true measurement of interest (e.g. overall health, purchase intention) • Key question: the sign and magnitude of the “interpretation bias”
  36. Cause 2: Algorithms • The model/procedure we choose primarily determines which side of the bias-variance tradeoff we end up on. • Common mistakes: using a model that’s too complex for the data; focusing too much on algorithms instead of on gathering the right data or on correctness.
  37. Optimizing models. Reducing bias: choose a more flexible model. Reducing variance: choose a less flexible model, or get more data.
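One way to see that flexibility knob in practice, as a hedged sketch: a validation curve over a decision tree's max_depth (my choice of knob, not the talk's) shows bias at low depth and variance at high depth:

```python
# Sketch: bias-variance tradeoff as a function of model flexibility (tree depth).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.1, random_state=0)
depths = np.arange(1, 16)
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5)

# A growing gap between train and validation accuracy signals variance (overfitting);
# low scores on both signal bias (underfitting).
for d, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"max_depth={d:2d}  train={tr:.2f}  validation={va:.2f}")
```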
  38. Tree Induction vs. Logistic Regression: A Learning-Curve Analysis (Perlich et al. 2003) • Logistic regression is better for smaller training sets, and tree induction for larger data sets • Logistic regression is usually better when the signal-to-noise ratio is lower
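The Perlich et al. comparison can be approximated with scikit-learn's learning_curve on noisy synthetic data (not the paper's datasets), so the numbers below are only suggestive of the pattern the slide describes:

```python
# Sketch: learning curves for logistic regression vs. a decision tree on synthetic, noisy data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=20000, n_features=30, n_informative=10,
                           flip_y=0.2, random_state=0)  # deliberately noisy problem

sizes = [200, 1000, 5000, 15000]
for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("decision tree", DecisionTreeClassifier(random_state=0))]:
    # Mean cross-validated accuracy at each training-set size.
    _, _, val_scores = learning_curve(model, X, y, train_sizes=sizes, cv=5)
    print(name, np.round(val_scores.mean(axis=1), 3))
```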
  39. Cause 3: Human choices. “Many analysts, one dataset: Making transparent how variations in analytical choices affect results” (Silberzahn et al. 2017) • 29 teams involving 61 analysts used the same dataset to address the same research question • Are soccer ⚽ referees more likely to give red cards to dark-skin-toned players than to light-skin-toned players?
  40. • Effect sizes ranged from 0.89 to 2.93 in odds-ratio units • 20 teams (69%) found a statistically significant positive effect • 9 teams (31%) observed a nonsignificant relationship
  41. Overconfidence

  42. Incentives

  43. Ways Forward • Prevent errors: opinionated analysis development, test-driven data analysis • Be honest about uncertainty: estimate uncertainty using the bootstrap
  44. Opinionated Analysis Development (by Hilary Parker)

  45. Test-Driven Data Analysis

  46. None
  47. None
  48. Estimating Uncertainty

  49. No algorithm in Scikit Learn will estimate uncertainty.

  50. The Bootstrap. From all your data, generate random sub-samples R1 … R500; compute statistics or estimate model parameters on each (s1 … s500); get a distribution over the statistic of interest (usually the prediction). Take the mean; CIs == 95% quantiles; SEs == standard deviation. [Histogram of the bootstrap statistic: x-axis = Statistic, y-axis = Count.]
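The procedure on the slide is a few lines of NumPy; this is a minimal sketch in which the statistic is simply a sample mean rather than a model prediction:

```python
# Minimal bootstrap: resample with replacement, recompute the statistic, summarize.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=2.0, size=300)   # stand-in for "all your data"

def statistic(sample):
    return sample.mean()   # illustration only; could be any estimate or prediction

# R1 ... R500: random sub-samples (with replacement), one statistic each.
boot = np.array([statistic(rng.choice(data, size=len(data), replace=True))
                 for _ in range(500)])

print("estimate:", boot.mean())                        # take the mean
print("95% CI: ", np.quantile(boot, [0.025, 0.975]))   # CIs == 95% quantiles
print("SE:     ", boot.std(ddof=1))                    # SEs == standard deviation
```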
  51. Summary. Think about errors! • What kind of errors are we making? • Where did they come from? Prevent errors! • Use a reasonable and reproducible process. • Test your analysis as you test your code. Estimate uncertainty! • Models that estimate uncertainty are more useful than those that don’t. • They facilitate better learning and experimentation.