2016 - Or Weizman - A/B Testing: Harder than just a color change

PyBay
August 21, 2016

Description
Is your Product Manager asking you to test out different text or button colors? Not sure where to start? This talk presents a methodology and two case studies from Yelp's Transaction Platform on how to properly run an experiment and get the best result. Learn how to run a simple button color experiment, avoid pitfalls, test, and analyze the results with confidence. Statistical confidence!

Abstract
A/B testing is a common practice for websites... but where do you begin? This data-driven approach lets you launch experiments and features with confidence. So how do you prepare, launch, and analyze an A/B experiment? How do you know how long to keep it running? And which metrics should you track?

This talk presents a procedure developed for running an A/B experiment, from planning the task and understanding the key metrics to analyzing the results. We will cover both a simple and a more complex case study, which help illustrate the challenges involved in running experiments.

This talk covers a topic that enables developers to make more data-driven decisions but has not been covered at PyCon. By providing case studies as motivation and a concrete procedure for implementing A/B testing, this talk aims to excite the audience. Yelp runs experiments on many different parts of the product, and the Transaction Platform team has gained unique experience running experiments with limited traffic, which will be discussed in the talk.

Bio
Or Weizman is an engineer on Yelp's Transaction Platform team, which enables users to transact with Yelp's extensive set of businesses through many third-party providers.

https://youtu.be/7SA3a_AXA1g

Transcript

  1. 15%

  2. How?
     • Generate Hypothesis
     • Gather Necessary Data
     • Implementation and Testing
     • Roll out Experiment
     • Analyze Results

  3. • Key Metric
     • Secondary Metrics
     User lands on Biz page → User clicks on Start Order → User adds items → User completes order
     User lands on Biz page → User clicks on Start Order

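The key and secondary metrics on this slide are steps in the order funnel, so the analysis comes down to comparing step-to-step conversion rates between cohorts. A minimal sketch of that computation is below; the step names mirror the slide, while the function name and counts are made up for illustration.

    # Sketch (not the talk's code): funnel conversion rates from per-step
    # event counts for one cohort. Counts below are hypothetical.
    FUNNEL_STEPS = [
        'lands_on_biz_page',
        'clicks_start_order',
        'adds_items',
        'completes_order',
    ]

    def conversion_rates(step_counts):
        """Return conversion from the first funnel step to each later step."""
        base = step_counts[FUNNEL_STEPS[0]]
        return {step: step_counts[step] / base for step in FUNNEL_STEPS[1:]}

    # Example: hypothetical counts for the status quo cohort.
    status_quo_counts = {
        'lands_on_biz_page': 10000,
        'clicks_start_order': 2100,
        'adds_items': 1500,
        'completes_order': 900,
    }
    print(conversion_rates(status_quo_counts))
    # {'clicks_start_order': 0.21, 'adds_items': 0.15, 'completes_order': 0.09}
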
  4. Math Time!
     • Minimum Detectable Effect of 5%
     • Baseline conversion
     • Statistical power (1 - β) of 95%
     • Significance level α of 5%
     307,967 users per variation

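The 307,967 figure presumably comes from a standard two-proportion sample-size calculation driven by the four inputs above. The sketch below shows the mechanics; the slide does not show the baseline conversion rate, so the baseline used here is an assumed placeholder and the output will not reproduce 307,967 exactly.

    # Standard two-proportion sample-size calculation (sketch, not the talk's
    # exact code). The baseline is an assumed placeholder value.
    import math

    from scipy.stats import norm

    def users_per_variation(baseline, relative_mde, alpha=0.05, power=0.95):
        """Approximate users needed per variation for a two-sided test."""
        p1 = baseline
        p2 = baseline * (1 + relative_mde)   # smallest effect worth detecting
        z_alpha = norm.ppf(1 - alpha / 2)    # significance level alpha
        z_beta = norm.ppf(power)             # statistical power 1 - beta
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

    # Example with an assumed 3% baseline conversion and a 5% relative MDE.
    print(users_per_variation(baseline=0.03, relative_mde=0.05))
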
  5. Kwargs are your friends, not food

     self.button_color_cohort = kwargs.get('button_color_cohort', 'status_quo')

     if self.button_color_cohort == 'status_quo':
         button_color = 'orange'
     else:
         button_color = 'green'

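A useful property of the kwargs.get default above is that call sites which predate the experiment keep working unchanged: anyone who doesn't pass the cohort falls back to the status quo. Below is a self-contained sketch of the pattern; the StartOrderButton class name and the call sites are assumptions for illustration, not the talk's actual code.

    # Hypothetical wrapper around the slide's snippet.
    class StartOrderButton(object):
        def __init__(self, **kwargs):
            # Callers outside the experiment pass nothing and get the status
            # quo, which is what makes the kwargs default safe to ship.
            self.button_color_cohort = kwargs.get('button_color_cohort', 'status_quo')

        @property
        def button_color(self):
            if self.button_color_cohort == 'status_quo':
                return 'orange'
            else:
                return 'green'

    assert StartOrderButton().button_color == 'orange'   # old call site, no kwarg
    assert StartOrderButton(button_color_cohort='status_quo').button_color == 'orange'
    assert StartOrderButton(button_color_cohort='enabled').button_color == 'green'
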
  6. Steady
     • If possible, roll out in this order:
       ◦ Internally for testing
       ◦ To a small percentage of traffic
       ◦ To the predetermined traffic percentage
         ▪ 50/50 in our case

  7. Sharing the Pool
     • Swimlanes are a way to pre-partition traffic
     • Experiments are assigned to lanes
     • Allows for control

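The talk doesn't show how swimlanes are implemented, but the idea on this slide can be sketched as deterministic bucketing of users into a fixed number of lanes, with each experiment restricted to its own lane. Everything below (lane count, experiment names, hashing scheme) is an assumption for illustration, not Yelp's implementation.

    # Assumed illustration of swimlanes: users hash into fixed lanes, and each
    # experiment only sees its own lane, so concurrent experiments cannot
    # contaminate each other's traffic.
    import hashlib

    NUM_LANES = 4
    LANE_TO_EXPERIMENT = {0: 'button_color', 1: 'checkout_copy'}  # lanes 2-3 held in reserve

    def lane_for_user(user_id):
        digest = hashlib.sha256(str(user_id).encode('utf-8')).hexdigest()
        return int(digest, 16) % NUM_LANES

    def experiment_for_user(user_id):
        """Return the experiment this user may be enrolled in, or None."""
        return LANE_TO_EXPERIMENT.get(lane_for_user(user_id))

The control comes at a cost hinted at in the abstract: each experiment now runs on only a fraction of total traffic, which lengthens the time needed to reach the sample size computed above.
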
  8. Testing gets harder
     • Experiments can now have another condition
     • Previous if statement breaks
     • Complexity -> more cases

  9. Testing gets harder

     Before:
     if self.button_color_cohort == 'status_quo':
         button_color = 'orange'
     else:
         button_color = 'green'

     After:
     if self.button_color_cohort == 'enabled':
         button_color = 'green'
     else:
         button_color = 'orange'

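The fix branches on the value that explicitly enables the experiment: once swimlanes exist, the cohort can take a value other than the two the original if statement expected (for example, a user who isn't in this experiment's lane at all), and checking == 'status_quo' would wrongly show those users the green button. Below is a sketch of a parametrized test covering that extra case; the 'unassigned' value and the helper function are assumptions, not the talk's actual code.

    # Sketch of a parametrized test for the extra cohort case.
    import pytest

    def button_color_for_cohort(cohort):
        if cohort == 'enabled':
            return 'green'
        else:
            return 'orange'

    @pytest.mark.parametrize('cohort, expected_color', [
        ('status_quo', 'orange'),
        ('enabled', 'green'),
        ('unassigned', 'orange'),  # e.g. a user outside the experiment's swimlane
    ])
    def test_button_color(cohort, expected_color):
        assert button_color_for_cohort(cohort) == expected_color
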
  10. Summary
      • Understand how to do A/B experiments
      • Harder, but doable!
        ◦ Complexity can be managed
        ◦ Swimlanes are our solution
      • After just 3 experiments we saw a 15% increase in sales