Slide 1

Slide 1 text

Winner’s Curse: Bias Estimation for Total Effects of Features in Online Controlled Experiments Minyong Lee (Airbnb); Milan Shen (Airbnb) (KDD 2018)   @_stakaya 

Slide 2

Slide 2 text

 • !2AB)Music.1 -#43+( • "%'0AB)Music .15#43+( • &*  !2”$,43/”    2

Slide 3

Slide 3 text


Slide 4

Slide 4 text

 • >,AB*)%5;4Bias <6%5?7 $  – Winner‘s Curse Bias=@ • 8+&(2 –#9'/0*:- .4  3Business Impact1  • AirbnbAB" ! 4

Slide 5

Slide 5 text

  •  3<AB$"&     )%* –@A/B$"&; #' → Gaussian 28 79B0.1 ;6!; / 2,NG –CA4  >5?: +- Wiki() =… 5

Slide 6

Slide 6 text

• n7"!#0  • i8("!#;*Metrics/& –5 /&),90  –… 62OK • <>631- –5'% 2 • +4&= : –31$ .2 +4&=    6

Slide 7

Slide 7 text

  • #& – b,# %( .+ – -*" !(  • ATotal True Effect – *" – A$)' $)' 7

Slide 8

Slide 8 text

  • AExpected Total True Effect • ATotal Estimated Effect – Total   –     8

Slide 9

Slide 9 text

  •   Expected Total True Effect 9

Slide 10

Slide 10 text

upward bias   10   i∈A   A      

Slide 11

Slide 11 text

upward bias   X_i > b_i \sigma_i   I()

Slide 12

Slide 12 text

upward bias   I(A) = 1 – I(Not(A))   X_i ≤ b_i \sigma_i   I()

Slide 13

Slide 13 text

upward bias

Slide 14

Slide 14 text

upward bias

Slide 15

Slide 15 text

upward bias

Slide 16

Slide 16 text

upward bias   • # Bias • " •   –*$/(0%- &)  –!(  +'  •  ., i=1, …,n  16
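
The upward bias on the preceding slides can be written out as a short derivation. This is a sketch reconstructed from the simulation code later in this deck, not a transcription of the original slides. It assumes independent X_i ∼ N(a_i, σ_i^2) and the selection rule X_i > b_i σ_i. S_A and T_A follow the code comments, φ is the standard normal density, and β is a name introduced here for the bias.

```latex
\begin{align*}
S_A &= \sum_{i=1}^{n} X_i \, I(X_i > b_i \sigma_i), \qquad
T_A  = \sum_{i=1}^{n} a_i \, I(X_i > b_i \sigma_i) \\
\beta &= E[S_A - T_A]
       = \sum_{i=1}^{n} E\big[(X_i - a_i)\, I(X_i > b_i \sigma_i)\big] \\
      &= -\sum_{i=1}^{n} E\big[(X_i - a_i)\, I(X_i \le b_i \sigma_i)\big]
         \quad \text{(using } I(A) = 1 - I(\mathrm{Not}(A)) \text{ and } E[X_i - a_i] = 0\text{)} \\
      &= \sum_{i=1}^{n} \sigma_i \, \phi\!\left(\frac{\sigma_i b_i - a_i}{\sigma_i}\right) > 0 \\
\hat{\beta} &= \sum_{i=1}^{n} \sigma_i \, \phi\!\left(\frac{\sigma_i b_i - X_i}{\sigma_i}\right)
\end{align*}
```

The last line is the plug-in estimate that replaces the unknown a_i with the observed X_i. It sums over all i = 1, …, n, not only the selected experiments, which matches the `ta` column in the simulation code.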

Slide 17

Slide 17 text

Selection bias with fixed p-values • p  Bias  •    Bias 17

Slide 18

Slide 18 text

 • Bias  –     –     18

Slide 19

Slide 19 text

 • Biastotal true effect 19

Slide 20

Slide 20 text

"%  • Zhong and Prentice [25], Efron [7], and Xu, Craiu and Sun [23]A Bias  •   20    Gaussian '#&  !$  

Slide 21

Slide 21 text

 • Zhong and Prentice [25], Efron [7], and Xu, Craiu and Sun [23]A Bias 21

Slide 22

Slide 22 text

Bootstrap • Total true effect"   !  •   # 22
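
A bootstrap interval for the bias-adjusted total can be sketched as follows. This is a minimal parametric-bootstrap sketch and not necessarily the procedure in the paper: it assumes X_i ∼ N(a_i, σ_i^2) with known σ_i, redraws X*_i ∼ N(X_i, σ_i^2), and takes percentile limits. The simulated data, the helper `adjusted_total`, and the choice of 2,000 resamples are all illustrative assumptions.

```r
set.seed(1)

# Hypothetical inputs: observed effects x, known standard errors sigma,
# and a one-sided 5% selection threshold b.
n <- 30
sigma <- sqrt(1 / rgamma(n, shape = 3, scale = 1))
x <- rnorm(n, mean = 0.2, sd = sigma)
b <- qnorm(0.95)

# Bias-adjusted total: S_A minus the plug-in bias estimate (summed over all i).
adjusted_total <- function(x, sigma, b) {
  win <- x / sigma > b
  sum(x[win]) - sum(sigma * dnorm((sigma * b - x) / sigma))
}

# Parametric bootstrap: redraw X* ~ N(x, sigma^2) and recompute the estimate.
boot <- replicate(2000, adjusted_total(rnorm(n, mean = x, sd = sigma), sigma, b))

# Percentile interval (assumed 95% level).
ci <- quantile(boot, c(0.025, 0.975))
print(ci)
```

A percentile interval is the simplest choice here. Other bootstrap intervals would reuse the same resampling loop.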

Slide 23

Slide 23 text

 • n=30
 • a_i: Z_i | (−1.5 < Z_i < 2), where Z_i ∼ N(0.2, 0.7^2)
 • σ^2: inverse gamma distribution (shape=3, scale=1)
 • 1,000 repetitions

Slide 24

Slide 24 text

a 24 Figure 2 

Slide 25

Slide 25 text

σ2 25 Figure 2 

Slide 26

Slide 26 text

Code

library("ggplot2")
theme_set(theme_grey(base_size = 28))

# a: Zi | (−1.5 < Zi < 2), where Zi ∼ N(0.2, 0.7^2)
# Sampled by inverse-CDF: uniform draws between the two truncation quantiles.
a <- qnorm(
  runif(10^5, pnorm(-1.5, mean = 0.2, sd = 0.7), pnorm(2, mean = 0.2, sd = 0.7)),
  mean = 0.2, sd = 0.7
)
# Plot the sampled values (the original referenced an undefined variable z).
ggplot(data.frame(value = a), aes(x = value, y = after_stat(density))) +
  geom_density(alpha = 0.2, color = "#4CAF50", fill = "#4CAF50", show.legend = FALSE) +
  xlim(c(-2.5, 2.5))

# σ^2 from the inverse gamma distribution with shape parameter 3 and scale parameter 1
sigma <- sqrt(1 / rgamma(10^5, shape = 3, scale = 1))
ggplot(data.frame(value = sigma), aes(x = value, y = after_stat(density))) +
  geom_density(alpha = 0.2, color = "#4CAF50", fill = "#4CAF50", show.legend = FALSE) +
  xlim(c(0, 2))

Slide 27

Slide 27 text

v.s. 27 Figure 3 

Slide 28

Slide 28 text

v.s. 28 Figure 4 

Slide 29

Slide 29 text

v.s. 29 Figure 5 

Slide 30

Slide 30 text

Code

library("ggplot2")

set.seed(71)
size <- 30
# True effects a_i: N(0.2, 0.7^2) truncated to (-1.5, 2)
a <- qnorm(
  runif(size, pnorm(-1.5, mean = 0.2, sd = 0.7), pnorm(2, mean = 0.2, sd = 0.7)),
  mean = 0.2, sd = 0.7
)
# Standard errors sigma_i, with sigma_i^2 from inverse gamma (shape=3, scale=1)
sigma <- sqrt(1 / rgamma(size, shape = 3, scale = 1))
# Selection threshold on X_i / sigma_i
b <- qnorm(0.95, mean = 0, sd = 1)

effect <- list()
for (i in seq_len(10^3)) {
  # Observed effects X_i ~ N(a_i, sigma_i^2)
  x <- purrr::map_dbl(seq_len(size), ~ rnorm(1, mean = a[.x], sd = sigma[.x]))
  # 1 if experiment i is selected (X_i / sigma_i > b), otherwise 0
  binary_win <- as.numeric(x / sigma > b)
  effect[[length(effect) + 1]] <- data.frame(
    # S_{A}
    sa = sum(x * binary_win),
    # T_{A}
    ta = sum(x * binary_win) - sum(sigma * dnorm((sigma * b - x) / sigma)),
    # T_{A, cond}
    tc = sum(x * binary_win) -
      sum(sigma * dnorm((sigma * b - x) / sigma) /
            (1 - pnorm((sigma * b - x) / sigma)) * binary_win),
    # True effect
    te = sum(a * binary_win)
  )
}
df <- dplyr::bind_rows(effect)

# The total estimated effect v.s. The total true effect
ggplot(df, aes(x = te, y = sa)) + geom_point() + geom_abline(slope = 1, intercept = 0)
# The expected total true effect (conditional) v.s. The total true effect
ggplot(df, aes(x = te, y = tc)) + geom_point() + geom_abline(slope = 1, intercept = 0)
# The expected total true effect v.s. The total true effect
ggplot(df, aes(x = te, y = ta)) + geom_point() + geom_abline(slope = 1, intercept = 0)

Slide 31

Slide 31 text

  • Market Dynamics team  31 Figure 6 Holdout  

Slide 32

Slide 32 text

Experimentation Reporting Framework (ERF) At Airbnb • 100  Product Team   • 3,000 Metrics Monitoring • Winner‘s Curse Bias  $!0 32 Figure 8 # [17], [18]"

Slide 33

Slide 33 text

Airbnb • 53=LDαθG0 >  –'%( 7?5,$!*Dθ –8S7?=LDα • 9JRF2  '%(B /;  –MetricsNeutral *&" .)COA/B'% (TotalH@P'%(B/1 • Holdout#-+E< –MQ:53N4H@6AKI< 33

Slide 34

Slide 34 text

 • [7] Bradley Efron. 2011. Tweedie's formula and selection bias. J. Amer. Statist. Assoc. 106, 496 (Dec. 2011), 1602–1614.
 • [17] Will Moss. 2014. Experiment reporting framework. (May 2014). Retrieved February 16, 2017 from http://nerds.airbnb.com/experiment-reporting-framework
 • [18] Jan Overgoor. 2014. Experiments at Airbnb. (May 2014). Retrieved February 16, 2017 from http://nerds.airbnb.com/experiments-at-airbnb
 • [23] Lizhen Xu, Radu V Craiu, and Lei Sun. 2011. Bayesian methods to overcome the winner’s curse in genetic studies. The Annals of Applied Statistics (2011).
 • [25] Hua Zhong and Ross L Prentice. 2008. Bias-reduced estimators and confidence intervals for odds ratios in genome-wide association studies. Biostatistics 9, 4 (Oct. 2008), 621–634.