Slide 1

No content

Slide 2

The Data Errors We Make
Sean J. Taylor, Core Data Science Team, Facebook

Slide 3

About Me
• 5 years at Facebook as a Research Scientist
• PhD in Information Systems from New York University
• Research Interests:
  • Field Experiments
  • Forecasting
  • Sports and sports fans
https://facebook.github.io/prophet/

Slide 4

Strategic Decisions
Micro-decisions at Scale

Slide 5

No content

Slide 6

[Diagram: Data, Algorithm, and Human Choices feed into an Estimate, which drives a Decision and an Outcome. The gap between the Estimate and the Truth is statistical error; the gap between the Decision/Outcome and the Optimal Decision/Optimal Outcome is practical error.]

Slide 7

Simplest Error Model

Slide 8

H0: You are not pregnant.
H1: You are pregnant.

Slide 9

                          H0 is True               H1 is True
                          (Product is Bad)         (Product is Good)
Accept Null Hypothesis    Right decision           Type II error
(Don't ship product)                               (wrong decision)
Reject Null Hypothesis    Type I error             Right decision
(Ship product)            (wrong decision)

Slide 10

The Receiver Operating Characteristic (ROC) curve tells us Type I and Type II error rates.
[Plot: x-axis = Type I error rate, y-axis = 1 - Type II error rate]
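A minimal sketch of reading these two error rates off an ROC curve with scikit-learn; the simulated scores and the 5% Type I cut-off are illustrative assumptions, not part of the talk.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)

# Simulated classifier scores: higher scores for the positive class (illustrative only).
y_true = np.concatenate([np.zeros(500), np.ones(500)])
scores = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(1.0, 1.0, 500)])

# fpr is the Type I error rate; tpr is 1 - Type II error rate (power).
fpr, tpr, thresholds = roc_curve(y_true, scores)

# Each threshold trades one error for the other: pick the last point where
# the Type I rate is at most 5% and read off the corresponding Type II rate.
i = np.searchsorted(fpr, 0.05, side="right") - 1
print(f"threshold {thresholds[i]:.2f}: Type I rate = {fpr[i]:.2%}, "
      f"Type II rate = {1 - tpr[i]:.2%}")
```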

Slide 11

Outline
1. Refinements to the Type I/II error model
2. A simple causal model of how we make errors
3. What we can effectively do about errors

Slide 12

Refinements

Slide 13

Refinement 1: Assign Costs to Errors

                          H0 is True               H1 is True
                          (Product is Bad)         (Product is Good)
Accept Null Hypothesis    Right decision           Type II error
(Don't ship product)                               (wrong decision)
Reject Null Hypothesis    Type I error             Right decision
(Ship product)            (wrong decision)

Slide 14

Refinement 1: Assign Costs to Errors

                          H0 is True               H1 is True
                          (Product is Bad)         (Product is Good)
Accept Null Hypothesis    0                        -100
(Don't ship product)
Reject Null Hypothesis    -200                     +100
(Ship product)

Slide 15

Example: Expected value of a product launch

Suppose P(Type I) = 1%, P(Type II) = 20%, and P(good) = 0.5.

P(good) * (100 * 0.80 + -100 * 0.20) + (1 - P(good)) * (-200 * 0.01 + 0 * 0.99)
  = (0.5 * 60) + (0.5 * -2)
  = 30 - 1
  = 29
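A minimal Python sketch of this arithmetic; the function name `expected_value` is mine, and the default payoffs are taken from the payoff table on the previous slide.

```python
def expected_value(p_good, p_type1, p_type2,
                   payoff_ship_good=100, payoff_hold_good=-100,
                   payoff_ship_bad=-200, payoff_hold_bad=0):
    """Expected payoff of a launch decision given error rates and payoffs.

    If the product is good (H1), we ship with probability 1 - P(Type II);
    if it is bad (H0), we mistakenly ship with probability P(Type I).
    """
    value_if_good = payoff_ship_good * (1 - p_type2) + payoff_hold_good * p_type2
    value_if_bad = payoff_ship_bad * p_type1 + payoff_hold_bad * (1 - p_type1)
    return p_good * value_if_good + (1 - p_good) * value_if_bad

# The slide's example: P(good) = 0.5, P(Type I) = 1%, P(Type II) = 20%  ->  29
print(expected_value(p_good=0.5, p_type1=0.01, p_type2=0.20))
```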

Slide 16

Allowing more Type I errors lowers the Type II error rate. The optimal choice depends on the payoffs and P(H1).

Slide 17

Example 2: Expected value of a product launch

Suppose P(Type I) = 5%, P(Type II) = 7%, and P(good) = 0.5.

P(good) * (100 * 0.93 + -100 * 0.07) + (1 - P(good)) * (-200 * 0.05 + 0 * 0.95)
  = (0.5 * 86) + (0.5 * -10)
  = 43 - 5
  = 38 > 29
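Reusing the hedged `expected_value` sketch from the earlier example: here, accepting a higher Type I rate in exchange for a much lower Type II rate pays off.

```python
# Same function as in the first example; only the error rates change.
print(expected_value(p_good=0.5, p_type1=0.05, p_type2=0.07))  # 38.0 > 29.0
```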

Slide 18

Refinement 2: Opportunity Cost

Key idea: if we devote resources to minimizing Type I and II errors for one problem, we will have fewer resources for other problems.
• Few organizations make just a single decision; we usually make many of them.
• Acquiring more data and investing more time in a problem have diminishing marginal returns (illustrated in the sketch below).
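A rough sketch of the diminishing-returns point: approximate power of a two-sided, two-sample test as the sample size keeps doubling. The effect size, alpha, and the normal-approximation formula are assumptions chosen for illustration, not figures from the talk.

```python
import numpy as np
from scipy.stats import norm

def power_two_sample(n_per_group, effect_size=0.1, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test (unit-variance groups)."""
    z_crit = norm.ppf(1 - alpha / 2)
    se = np.sqrt(2 / n_per_group)      # standard error of the difference in means
    return 1 - norm.cdf(z_crit - effect_size / se)

# Successive doublings of n buy less and less additional power.
for n in [1_000, 2_000, 4_000, 8_000, 16_000]:
    print(f"n = {n:>6}: power = {power_two_sample(n):.2f}")
```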

Slide 19

Examples of Constraints
• Sample size for online experiments
• Gathering more data
• Analyst time

Slide 20

Refinement 3: Mosteller's Type III Errors

Type III error: “correctly rejecting the null hypothesis for the wrong reason” -- Frederick Mosteller

More clearly: the process you used worked this time, but is unlikely to continue working in the future.

Slide 21

Good Process vs. Good Outcome

               Good Outcome        Bad Outcome
Good Process   Deserved Success    Bad Break
Bad Process    Dumb Luck           Poetic Justice

Slide 22

Refinement 4: Kimball's Type III Errors

Type III error: “the error committed by giving the right answer to the wrong problem” -- Allyn W. Kimball

Slide 23

No content

Slide 24

No content

Slide 25

Why we make errors

Slide 26

[Diagram: Data, Algorithm, and Human Choices feed into the Estimate.]

Slide 27

Cause 1: Data
• Inadequate data
• Non-representative data
• Measuring the wrong thing

Slide 28

Made data: designed to be adequate.
Found data: adequate if we are fortunate.

Slide 29

Non-representative data

Slide 30

2014 World Cup: first Facebook check-ins in Brazil from non-Brazilian users

Slide 31

Bias? 2014 World Cup Check-ins by Country

Slide 32

Measuring the wrong thing

Slide 33

No content

Slide 34

No content

Slide 35

Common Pattern
• High volume of a cheap, easy-to-measure “surrogate” (e.g. steps, clicks)
• The surrogate is correlated with the true measurement of interest (e.g. overall health, purchase intention)
• Key question: the sign and magnitude of the “interpretation bias”

Slide 36

Cause 2: Algorithms
• The model/procedure we choose primarily concerns which side of the bias-variance tradeoff we'd like to be on.
• Common mistakes:
  • Using a model that's too complex for the data.
  • Focusing too much on algorithms instead of on gathering the right data or on correctness.

Slide 37

Optimizing models
Reducing bias:
• Choose a more flexible model.
Reducing variance:
• Choose a less flexible model.
• Get more data.
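A minimal scikit-learn sketch of this tradeoff: the same noisy data, three decision trees of increasing flexibility, scored by cross-validation. The simulated data and the use of tree depth as the flexibility knob are my assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)

# Small, noisy dataset (illustrative): a very flexible model is prone to overfit it.
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.5, size=200)

for depth in [2, 5, None]:  # None = fully grown tree (most flexible)
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"max_depth={depth}: mean CV R^2 = {score:.2f}")
```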

Slide 38

Tree Induction vs. Logistic Regression: A Learning-Curve Analysis (Perlich et al., 2003)
• Logistic regression is better for smaller training sets, and tree induction for larger data sets.
• Logistic regression is usually better when the signal-to-noise ratio is lower.
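A hedged sketch in the spirit of that learning-curve comparison, not a reproduction of the paper: logistic regression vs. a decision tree on growing training sets of synthetic, noisy data. Exact scores depend entirely on the simulated data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, somewhat noisy classification problem (illustrative only).
X, y = make_classification(n_samples=20_000, n_features=20, n_informative=10,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Learning curve: fit both models on progressively larger training sets.
for n in [100, 1_000, 10_000]:
    lr = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    tree = DecisionTreeClassifier(random_state=0).fit(X_train[:n], y_train[:n])
    print(f"n={n:>6}  logistic={lr.score(X_test, y_test):.3f}  "
          f"tree={tree.score(X_test, y_test):.3f}")
```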

Slide 39

Cause 3: Human choices
Many analysts, one dataset: Making transparent how variations in analytical choices affect results (Silberzahn et al. 2017)
• 29 teams involving 61 analysts used the same dataset to address the same research question:
• Are soccer ⚽ referees more likely to give red cards to dark-skin-toned players than to light-skin-toned players?

Slide 40

• Effect sizes ranged from 0.89 to 2.93 in odds-ratio units.
• 20 teams (69%) found a statistically significant positive effect.
• 9 teams (31%) observed a nonsignificant relationship.

Slide 41

Overconfidence

Slide 42

Incentives

Slide 43

Ways Forward
• Prevent errors
  • Opinionated analysis development
  • Test-driven data analysis
• Be honest about uncertainty
  • Estimate uncertainty using the bootstrap

Slide 44

Opinionated Analysis Development (by Hilary Parker)

Slide 45

Test-Driven Data Analysis
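A minimal sketch of what assertion-style checks in the spirit of test-driven data analysis might look like; the DataFrame columns (user_id, treatment, outcome) and the balance threshold are hypothetical.

```python
import pandas as pd

def check_experiment_data(df: pd.DataFrame) -> pd.DataFrame:
    """Fail fast if the data violates assumptions the analysis relies on.

    Column names and thresholds here are illustrative, not from the talk.
    """
    assert df["user_id"].is_unique, "duplicate users would double-count exposures"
    assert df["treatment"].isin([0, 1]).all(), "unexpected treatment codes"
    assert df["outcome"].notna().all(), "missing outcomes bias the comparison"
    share_treated = df["treatment"].mean()
    assert 0.4 < share_treated < 0.6, f"unbalanced assignment: {share_treated:.2f}"
    return df
```

Run checks like these before the analysis, so a violated assumption fails loudly instead of silently shifting the result.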

Slide 46

No content

Slide 47

No content

Slide 48

Estimating Uncertainty

Slide 49

No algorithm in scikit-learn will estimate uncertainty.

Slide 50

The Bootstrap

[Diagram: from all your data, generate 500 random sub-samples R1 ... R500; compute statistics or estimate model parameters on each; the result is a distribution over the statistic of interest (usually the prediction), shown as a histogram of Statistic vs. Count.]
• Take the mean as the point estimate
• CIs == 95% quantiles
• SEs == standard deviation
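A minimal numpy sketch of the procedure in the diagram: resample with replacement, recompute the statistic, and read the CI and SE off the bootstrap distribution. The simulated data and the choice of the mean as the statistic are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=2.0, size=1_000)   # stand-in for "all your data"

# Draw 500 resamples with replacement and recompute the statistic on each.
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(500)
])

estimate = boot_means.mean()                                # point estimate
ci_low, ci_high = np.quantile(boot_means, [0.025, 0.975])   # 95% CI from quantiles
se = boot_means.std(ddof=1)                                 # SE = standard deviation
print(f"mean = {estimate:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}], SE = {se:.3f}")
```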

Slide 51

Summary
Think about errors!
• What kind of errors are we making?
• Where did they come from?
Prevent errors!
• Use a reasonable and reproducible process.
• Test your analysis as you test your code.
Estimate uncertainty!
• Models that estimate uncertainty are more useful than those that don't.
• They facilitate better learning and experimentation.