Winner’s Curse: Bias Estimation for Total Effects of Features in Online Controlled Experiments Minyong Lee (Airbnb); Milan Shen (Airbnb) (KDD 2018) @_stakaya

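• Total True Effect of A: $\sum_{i \in A} a_i$, where $a_i$ is the true effect of experiment $i$ and $A$ is the set of selected experiments

• Expected Total True Effect of A: $E\big[\sum_{i \in A} a_i\big]$
• Total Estimated Effect of A: $\sum_{i \in A} X_i$, where $X_i$ is the estimated effect of experiment $i$

• Expected Total True Effect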

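upward bias
• Assume $X_i \sim N(a_i, \sigma_i^2)$ and let $A = \{i : X_i > b_i \sigma_i\}$ be the selected set, with $I(\cdot)$ the indicator function. The total estimated effect then exceeds the total true effect in expectation:

$$E\Big[\sum_{i \in A} X_i - \sum_{i \in A} a_i\Big] = \sum_{i=1}^{n} E\big[(X_i - a_i)\, I(X_i > b_i \sigma_i)\big]$$

• Using $I(A) = 1 - I(\mathrm{Not}(A))$ for the event $X_i > b_i \sigma_i$, together with $E[X_i - a_i] = 0$:

$$\sum_{i=1}^{n} E\big[(X_i - a_i)\, I(X_i > b_i \sigma_i)\big] = -\sum_{i=1}^{n} E\big[(X_i - a_i)\, I(X_i \le b_i \sigma_i)\big] = \sum_{i=1}^{n} \sigma_i\, \phi\Big(\frac{\sigma_i b_i - a_i}{\sigma_i}\Big) > 0$$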
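The size of the upward bias can be checked numerically for a single experiment. Assuming $X_i \sim N(a_i, \sigma_i^2)$ and selection when $X_i > b_i \sigma_i$, the expected overshoot $E[(X_i - a_i)\, I(X_i > b_i \sigma_i)]$ has the closed form $\sigma_i \phi(b_i - a_i/\sigma_i)$, which is always positive. The sketch below uses base R only; the values of `a`, `sigma`, and `b` are hypothetical.

```r
# Hypothetical single experiment: true effect a, standard error sigma,
# one-sided selection threshold b on the z-score scale.
a <- 0.2
sigma <- 0.7
b <- qnorm(0.95)

# E[(X - a) * I(X > b * sigma)] by numerical integration over the selected region.
overshoot_numeric <- integrate(
  function(x) (x - a) * dnorm(x, mean = a, sd = sigma),
  lower = b * sigma, upper = Inf
)$value

# Closed form: sigma * phi((sigma * b - a) / sigma).
overshoot_closed <- sigma * dnorm((sigma * b - a) / sigma)

print(c(numeric = overshoot_numeric, closed_form = overshoot_closed))
```

The two values should agree up to the integration tolerance, and both are positive whatever the sign of `a`.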
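• Bias estimate: $\sum_{i=1}^{n} \sigma_i\, \phi\big((\sigma_i b_i - X_i)/\sigma_i\big)$, summed over all experiments $i = 1, \dots, n$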

Selection bias with fixed p-values
• p Bias
• Bias

• Bias

• Biastotal true effect
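How the total bias depends on the p-value cutoff can be illustrated with a short sketch. It assumes a one-sided z-test, so that a cutoff `p` corresponds to a threshold `b = qnorm(1 - p)` on the z-score (the simulation code in this deck uses `qnorm(0.95)`), and it uses the closed-form bias $\sum_i \sigma_i \phi(b - a_i/\sigma_i)$ under a normal model. The function name `selection_bias` and the effect sizes are hypothetical.

```r
# Expected upward bias of the total estimated effect for a p-value cutoff p,
# assuming X_i ~ N(a_i, sigma_i^2) and one-sided selection X_i / sigma_i > qnorm(1 - p).
selection_bias <- function(a, sigma, p) {
  b <- qnorm(1 - p)
  sum(sigma * dnorm(b - a / sigma))
}

# Hypothetical true effects and standard errors for five experiments.
a <- c(-0.3, 0.0, 0.1, 0.4, 0.8)
sigma <- c(0.5, 0.6, 0.4, 0.7, 0.5)

p_values <- c(0.01, 0.05, 0.10)
bias <- vapply(p_values, function(p) selection_bias(a, sigma, p), numeric(1))
print(data.frame(p = p_values, bias = bias))
```

The bias stays positive for any cutoff; how it changes with `p` depends on where the true effects sit relative to the threshold.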

• Zhong and Prentice [25], Efron [7], and Xu, Craiu and Sun [23]A Bias
Gaussian

Bootstrap
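One way a bootstrap interval could be formed is sketched below. This is a generic parametric bootstrap with a percentile interval, not necessarily the procedure used in the paper: each replicate redraws $X_i^* \sim N(X_i, \sigma_i^2)$, reselects the winners, and recomputes the bias-adjusted total. The function names and input values are hypothetical.

```r
# Bias-adjusted total effect: sum of selected estimates minus the plug-in bias,
# where the bias term is summed over all experiments.
adjusted_total <- function(x, sigma, b) {
  sum(x[x / sigma > b]) - sum(sigma * dnorm((sigma * b - x) / sigma))
}

# Generic parametric bootstrap percentile interval (an assumption, see the lead-in).
bootstrap_interval <- function(x, sigma, b, n_boot = 2000, level = 0.95) {
  stats <- replicate(n_boot, {
    x_star <- rnorm(length(x), mean = x, sd = sigma)
    adjusted_total(x_star, sigma, b)
  })
  alpha <- 1 - level
  quantile(stats, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(71)
x <- c(1.4, 0.2, -0.5, 0.9, 2.1)      # hypothetical observed effects
sigma <- c(0.6, 0.5, 0.7, 0.4, 0.8)   # hypothetical standard errors
ci <- bootstrap_interval(x, sigma, b = qnorm(0.95))
print(ci)
```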
• Total true effect

• n=30 experiments
• σ^2: inverse gamma distribution (shape=3, scale=1)
• A/B tests simulated 1,000 times

Figure 2

σ^2
Figure 2

Code
library("ggplot2")
theme_set(theme_grey(base_size=28))

# Z_i | (-1.5 < Z_i < 2) where Z_i ~ N(0.2, 0.7^2), drawn by inverse-CDF sampling
z <- qnorm(runif(10^5, pnorm(-1.5, mean=0.2, sd=0.7), pnorm(2, mean=0.2, sd=0.7)), mean=0.2, sd=0.7)
ggplot(data.frame(value=z), aes(x = value, y = ..density..)) +
  geom_density(aes(alpha = 0.2), color="#4CAF50", fill="#4CAF50", show.legend=FALSE) +
  xlim(c(-2.5, 2.5)) +
  theme_grey(base_size=28)

Code
# σ^2 follows the inverse gamma distribution with shape parameter 3 and scale parameter 1
sigma <- sqrt(1/rgamma(10^5, shape=3, scale=1))
ggplot(data.frame(value=sigma), aes(x = value, y = ..density..)) +
  geom_density(aes(alpha = 0.2), color="#4CAF50", fill="#4CAF50", show.legend=FALSE) +
  xlim(c(0, 2)) +
  theme_grey(base_size=28)
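Both samplers above can be sanity-checked without plotting. The first uses inverse-CDF sampling: uniform draws between the CDF values at the truncation points are mapped back through `qnorm`, so every draw must fall inside $(-1.5, 2)$. The second takes the reciprocal of gamma draws, and an inverse gamma distribution with shape 3 and scale 1 has mean $1/(3-1) = 0.5$. A minimal check, with the seed chosen arbitrarily:

```r
set.seed(71)
n <- 10^5

# Truncated normal: Z | (-1.5 < Z < 2), Z ~ N(0.2, 0.7^2), via inverse-CDF sampling.
lower <- pnorm(-1.5, mean = 0.2, sd = 0.7)
upper <- pnorm(2, mean = 0.2, sd = 0.7)
z <- qnorm(runif(n, lower, upper), mean = 0.2, sd = 0.7)

# sigma^2 ~ inverse gamma(shape = 3, scale = 1), as the reciprocal of gamma draws.
sigma_sq <- 1 / rgamma(n, shape = 3, scale = 1)

print(range(z))        # should lie within (-1.5, 2)
print(mean(sigma_sq))  # should be close to the theoretical mean 0.5
```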

v.s.
Figure 3

v.s.
Figure 4

v.s.
Figure 5

Code
library("ggplot2")
set.seed(71)
size <- 30
# True effects a_i: Z_i | (-1.5 < Z_i < 2) where Z_i ~ N(0.2, 0.7^2)
a <- qnorm(runif(size, pnorm(-1.5, mean=0.2, sd=0.7), pnorm(2, mean=0.2, sd=0.7)), mean=0.2, sd=0.7)
# Standard errors: sigma^2 follows the inverse gamma distribution (shape 3, scale 1)
sigma <- sqrt(1/rgamma(size, shape=3, scale=1))
# Selection threshold on the z-score scale
b <- qnorm(0.95, mean=0, sd=1)

effect <- list()
for(i in seq_len(10^3)){
  # Observed effects X_i ~ N(a_i, sigma_i^2)
  x <- purrr::map_dbl(seq_len(size), ~ rnorm(1, mean=a[.x], sd=sigma[.x]))
  # 1 if the experiment is selected (X_i / sigma_i > b), 0 otherwise
  binary_win <- as.numeric(x/sigma > b)
  effect[[length(effect) + 1]] <- data.frame(
    # S_{A}: total estimated effect
    sa=sum(x*binary_win),
    # T_{A}: S_{A} minus the bias estimate, summed over all experiments
    ta=sum(x*binary_win) - sum(sigma * dnorm((sigma * b - x)/sigma)),
    # T_{A, cond}: S_{A} minus the conditional bias estimate, summed over selected experiments
    tc=sum(x*binary_win) - sum(sigma * dnorm((sigma * b - x)/sigma)/(1 - pnorm((sigma * b - x)/sigma))*binary_win),
    # True effect: total true effect of the selected experiments
    te=sum(a*binary_win)
  )
}
df <- dplyr::bind_rows(effect)

# The total estimated effect v.s. The total true effect
ggplot(df, aes(x=te, y=sa)) + geom_point() + geom_abline(slope=1, intercept=0)
# The expected total true effect (conditional) v.s. The total true effect
ggplot(df, aes(x=te, y=tc)) + geom_point() + geom_abline(slope=1, intercept=0)
# The expected total true effect v.s. The total true effect
ggplot(df, aes(x=te, y=ta)) + geom_point() + geom_abline(slope=1, intercept=0)
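The scatter plots can be complemented by a numeric summary. The sketch below reruns the same kind of simulation in base R (drawing all replicates at once, so the random stream differs from the loop above) and compares the average gap between the total estimated effect and the total true effect with the closed-form bias $\sum_i \sigma_i \phi(b - a_i/\sigma_i)$. The variable names and the number of replicates `n_rep` are assumptions.

```r
set.seed(71)
size <- 30      # number of experiments
n_rep <- 5000   # number of simulated rounds (assumed value)

# True effects from the truncated normal, standard errors from the inverse gamma.
a <- qnorm(runif(size, pnorm(-1.5, mean = 0.2, sd = 0.7),
                 pnorm(2, mean = 0.2, sd = 0.7)), mean = 0.2, sd = 0.7)
sigma <- sqrt(1 / rgamma(size, shape = 3, scale = 1))
b <- qnorm(0.95)

# One column per replicate: x[i, r] ~ N(a[i], sigma[i]^2).
x <- matrix(rnorm(size * n_rep, mean = a, sd = sigma), nrow = size)
win <- x / sigma > b

sa <- colSums(x * win)                             # total estimated effect
te <- colSums(a * win)                             # total true effect
ta <- sa - colSums(sigma * dnorm(b - x / sigma))   # plug-in bias adjustment

# Closed-form expected gap between sa and te under the normal model.
closed_form <- sum(sigma * dnorm(b - a / sigma))

print(c(simulated_gap = mean(sa - te),
        closed_form = closed_form,
        adjusted_gap = mean(ta - te)))
```

The plug-in adjustment replaces the unknown $a_i$ with $X_i$, so `adjusted_gap` need not be exactly zero.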

• Market Dynamics team
Figure 6 Holdout

Experimentation Reporting Framework (ERF) At Airbnb
• 100 Product Team
• 3,000 Metrics Monitoring
• Winner's Curse Bias
Figure 8

Airbnb
• MetricsNeutral (TotalH
• Holdout

• [7] Bradley Efron. 2011. Tweedie's formula and selection bias. J. Amer. Statist. Assoc. 106, 496 (Dec. 2011), 1602–1614.
• [17] Will Moss. 2014. Experiment reporting framework. (May 2014). Retrieved February 16, 2017 from framework
• [18] Jan Overgoor. 2014. Experiments at Airbnb. (May 2014). Retrieved February 16, 2017 from
• [23] Lizhen Xu, Radu V Craiu, and Lei Sun. 2011. Bayesian methods to overcome the winner’s curse in genetic studies. The Annals of Applied Statistics (2011).
• [25] Hua Zhong and Ross L Prentice. 2008. Bias-reduced estimators and confidence intervals for odds ratios in genome-wide association studies. Biostatistics 9, 4 (Oct. 2008), 621–634.