Slide 1

Slide 1 text

A/B Testing From The Ground Up
Jonathan Maltz ([email protected] / @maltzj)

Slide 2

Slide 2 text

Yelp’s Mission: Connecting people with great local businesses.

Slide 3

Slide 3 text

Hi! I’m Maltz!
● Android at Yelp!
● Now: Building experimentation systems
● Previously: Full-stack and data stuff at Eat24

Slide 4

Slide 4 text

What Will We Talk About Today?

Slide 5

Slide 5 text

A/B Testing

Slide 6

Slide 6 text

Specifically
● What are A/B tests?
● Why run A/B tests?
● What infrastructure is needed to A/B test?
● What can you buy vs. build?
● What are pitfalls to watch out for?

Slide 7

Slide 7 text

But First...

Slide 8

Slide 8 text

Why A/B Test?

Slide 9

Slide 9 text

[Figure: a “Click me!” button next to a chart of clicks over time]

Slide 10

Slide 10 text

[Figure: a “Click me!” button next to a chart of clicks over time]

Slide 11

Slide 11 text

We should keep the button Red!

Slide 12

Slide 12 text

But wait...

Slide 13

Slide 13 text

[Figure: a “Click me!” button next to a chart of clicks over time]

Slide 15

Slide 15 text

[Figure: two “Click me!” button variants, each shown to 50% of users]

Slide 17

Slide 17 text

Inform The Hippo!

Slide 18

Slide 18 text

Inform The Hippo! (Highest Paid Person’s Opinion)

Slide 20

Slide 20 text

Bucketing System → Your App: sends allocation information
Your App → Metrics Collection Pipeline (on device): log on-device events
Metrics Collection Pipeline (on device) → Metrics Services: forward to metrics service

Slide 22

Slide 22 text

Bucketing System

Slide 23

Slide 23 text

At a high level: Identifier → Randomization function (probably a hash) → Bucket

Slide 24

Slide 24 text

Bucket line from 0 to 100: Green: 50%, Red: 50%
get_bucket(id + salt) % 100
get_bucket(“my_exp163”) % 100
get_bucket(...419) % 100
get_bucket(...802) % 100
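
A minimal Python sketch of the idea on this slide, assuming SHA-256 as the randomization function (the talk only says "probably a hash"). Here `get_bucket` folds the `% 100` into the function itself, and `get_cohort` with its `green_percent` parameter is an illustrative name, not something from the talk.

```python
import hashlib


def get_bucket(key: str, num_buckets: int = 100) -> int:
    """Map a string key to a stable bucket in [0, num_buckets)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Read the first 8 bytes as an unsigned integer, then reduce to a bucket.
    return int.from_bytes(digest[:8], "big") % num_buckets


def get_cohort(user_id: str, salt: str, green_percent: int = 50) -> str:
    """Buckets below green_percent see Green, the rest see Red."""
    bucket = get_bucket(user_id + salt)
    return "green" if bucket < green_percent else "red"


# The same id + salt always yields the same cohort.
print(get_cohort("user-163", "my_exp"))
```

Because the hash is deterministic, no allocation table has to be stored: any client or server that knows the id and the salt can recompute the cohort.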

Slide 26

Slide 26 text

Common Pitfalls
get_bucket(id + salt): get_bucket(“my_exp163”), get_bucket(...419), get_bucket(...802)
Bucket line from 0 to 1, over time:
● Green: 100%
● Green: 80%, Red: 20%
● Green: 50%, Red: 50%
● Red: 100%

Slide 27

Slide 27 text

Common Pitfalls cont...
get_bucket(id + salt): get_bucket(...261), get_bucket(...591), get_bucket(...812)
Bucket line from 0 to 1, over time:
● Green: 100%
● Green: 90%, Yellow: 5%, Red: 5%
● Green: 80%, Yellow: 10%, Red: 10%

Slide 28

Slide 28 text

Common Pitfalls cont...
get_bucket(id + salt): get_bucket(...311), get_bucket(...451), get_bucket(...723)
Bucket line from 0 to 1, over time:
● Green: 100%
● Green: 50%, Red: 5%, Red-Buffer (no-op): 20%, Yellow: 5%, Yellow-Buffer (no-op): 20%
● Green: 50%, Red: 25%, Yellow: 25%
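
One way to read the buffer layout on this slide, sketched in Python: each treatment reserves a fixed slice of the bucket line up front, and ramping only converts that treatment's no-op buffer into active traffic, so no user ever moves between Red and Yellow. The slice boundaries and the names `RESERVED` and `cohort_for_bucket` are assumptions for illustration.

```python
from typing import Dict, Tuple

# Hypothetical layout on a 0-100 bucket line: Green keeps 0-50, and each
# treatment reserves a fixed 25-bucket slice (active part + no-op buffer).
RESERVED: Dict[str, Tuple[int, int]] = {
    "red": (50, 75),
    "yellow": (75, 100),
}


def cohort_for_bucket(bucket: int, active_percent: Dict[str, int]) -> str:
    """Return the cohort for a bucket given how far each treatment has ramped."""
    for name, (start, end) in RESERVED.items():
        if start <= bucket < end:
            # The active part grows from the start of the reserved slice;
            # the rest of the slice is a buffer that gets the no-op experience.
            if bucket < start + active_percent[name]:
                return name
            return f"{name}-buffer"
    return "green"


early = {"red": 5, "yellow": 5}    # Red 5% + buffer 20%, Yellow 5% + buffer 20%
late = {"red": 25, "yellow": 25}   # buffers fully converted to treatment

print(cohort_for_bucket(60, early))  # red-buffer
print(cohort_for_bucket(60, late))   # red
```

Under this layout a buffered user can only ever graduate into the treatment that reserved them, which keeps each treatment's population free of users who previously saw a different variant.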

Slide 29

Slide 29 text

Don’t Forget The Salt!

Slide 30

Slide 30 text

Common Pitfalls cont... Status Quo: 50% get_bucket(id) Cohort A: 50% Experiment 1 Status Quo: 50% Cohort A: 50% Experiment 2 Cohort B 50% Status Quo: 50% Experiment 3 Cohort A 50%

Slide 31

Slide 31 text

getBucket(id + exp_name)
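
To see why the experiment name belongs in the hash input, here is a small Python sketch (SHA-256 and the generated user ids are assumptions). Without a per-experiment salt, the same users land in Cohort A of every experiment; with it, the overlap between two experiments falls to roughly half.

```python
import hashlib
from typing import List


def get_bucket(key: str) -> int:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def cohort(user_id: str, exp_name: str = "") -> str:
    """An empty exp_name models the unsalted get_bucket(id) pitfall."""
    return "cohort_a" if get_bucket(user_id + exp_name) < 50 else "status_quo"


def overlap(exp1: str, exp2: str, users: List[str]) -> float:
    """Fraction of exp1's Cohort A users who are also in exp2's Cohort A."""
    in_first = [u for u in users if cohort(u, exp1) == "cohort_a"]
    in_both = [u for u in in_first if cohort(u, exp2) == "cohort_a"]
    return len(in_both) / len(in_first)


users = [f"user-{i}" for i in range(10_000)]
unsalted_overlap = overlap("", "", users)          # identical cohorts: 1.0
salted_overlap = overlap("exp_1", "exp_2", users)  # roughly independent
print(unsalted_overlap, salted_overlap)
```

With full overlap, any effect from one experiment is confounded with every other experiment those users are in, which is the problem the salt avoids.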

Slide 32

Slide 32 text

How To Implement It?

Slide 33

Slide 33 text

Option 1: Config File + Buckets Endpoint

Slide 34

Slide 34 text

Option 1: Config File + Buckets Endpoint
Components: YAML File, Backend, Mobile App

Slide 35

Slide 35 text

Option 1: Config File + Buckets Endpoint
Components: YAML File, Backend, Mobile App
Mobile App → Backend: request cohorts w/ id

Slide 36

Slide 36 text

Option 1: Config File + Buckets Endpoint
Components: YAML File, Backend, Mobile App
Mobile App → Backend: request cohorts w/ id
Backend → YAML File: parse + retrieve cohorts

Slide 38

Slide 38 text

Option 1: Config File + Cohorts Endpoint
Pros
● Easy to implement
● Easy to understand
Cons
● Cohorts may not load in time
  ○ Some ways to work around this
● Hard to run experiments at startup (i.e. onboarding)
● Hard to handle complex conditions
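
A rough sketch of what the backend side of Option 1 might look like, in Python. The `EXPERIMENTS` dict stands in for the parsed YAML file; its schema, the experiment names, and `cohorts_for` are all hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

# Stand-in for the parsed YAML config: experiment name -> (cohort, percent).
EXPERIMENTS: Dict[str, List[Tuple[str, int]]] = {
    "button_color": [("green", 50), ("red", 50)],
    "new_onboarding": [("status_quo", 90), ("enabled", 10)],
}


def get_bucket(key: str) -> int:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def cohorts_for(user_id: str,
                experiments: Dict[str, List[Tuple[str, int]]] = EXPERIMENTS
                ) -> Dict[str, str]:
    """What a cohorts endpoint could return for one id."""
    result: Dict[str, str] = {}
    for exp_name, cohorts in experiments.items():
        bucket = get_bucket(user_id + exp_name)  # experiment name acts as salt
        upper = 0
        for cohort_name, percent in cohorts:
            upper += percent
            if bucket < upper:
                result[exp_name] = cohort_name
                break
    return result


print(cohorts_for("user-42"))
```

The app would fetch this mapping once and branch on it locally, which is also where the "cohorts may not load in time" concern comes from: until the response arrives, the app has nothing to branch on.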

Slide 39

Slide 39 text

Option 2: Parameter Experiments

Slide 40

Slide 40 text

Option 2: Parameter Experiments
Components: Configuration, Experimentation System, Mobile App

Slide 41

Slide 41 text

Option 2: Parameter Experiments
Components: Configuration, Experimentation System, Mobile App
Mobile App → Experimentation System: request parameter

Slide 42

Slide 42 text

Option 2: Parameter Experiments
Components: Configuration, Experimentation System, Mobile App
Mobile App → Experimentation System: request parameter
Experimentation System → Configuration: parse + retrieve param value

Slide 44

Slide 44 text

Option 2: Parameter Experiments
Pros
● Easy to implement on clients
  ○ Just one call!
● Handles complex conditions well
Cons
● Be mindful of parameter resolution speed
● Many parameters if sent from the server
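
A sketch of parameter resolution in Python: the app asks for a parameter value and never deals with cohorts directly. `PARAMS`, its fields, and `get_param` are made up for illustration; this is not the API of Firebase Remote Config or PlanOut.

```python
import hashlib
from typing import Any, Dict


def get_bucket(key: str) -> int:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


# Hypothetical configuration for one parameter.
PARAMS: Dict[str, Dict[str, Any]] = {
    "button_color": {
        "default": "green",
        "experiment": "button_color_exp",
        # A "complex condition": only Android users are in the experiment.
        "condition": lambda user: user.get("platform") == "android",
        "values": [("green", 50), ("red", 50)],
    },
}


def get_param(name: str, user: Dict[str, str],
              params: Dict[str, Dict[str, Any]] = PARAMS) -> Any:
    """The single call a client makes: resolve one parameter for one user."""
    spec = params[name]
    if not spec["condition"](user):
        return spec["default"]
    bucket = get_bucket(user["id"] + spec["experiment"])
    upper = 0
    for value, percent in spec["values"]:
        upper += percent
        if bucket < upper:
            return value
    return spec["default"]


print(get_param("button_color", {"id": "u1", "platform": "ios"}))  # green
```

Client code only ever reads `button_color`, so conditions and allocations can change in configuration without touching the app.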

Slide 45

Slide 45 text

Other stuff you’ll want
● Defaults
● Override capability
● Whitelisting
  ○ Employees only
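
These three features can be layered around whatever bucketing function is in use. The Python sketch below shows one possible resolution order (override, then whitelist, then bucketing, with the default as fallback); the function name, its parameters, and the ordering itself are assumptions rather than anything the talk prescribes.

```python
from typing import Callable, Dict, Optional, Set


def resolve_cohort(user_id: str, *, default: str,
                   overrides: Dict[str, str],
                   whitelist: Optional[Set[str]],
                   assign: Callable[[str], str]) -> str:
    """Pick a cohort, honoring overrides and an optional whitelist."""
    # 1. An explicit override (e.g. set by QA) always wins.
    if user_id in overrides:
        return overrides[user_id]
    # 2. If a whitelist exists, only listed users (e.g. employees) are bucketed.
    if whitelist is not None and user_id not in whitelist:
        return default
    # 3. Normal bucketing; fall back to the default if anything goes wrong.
    try:
        return assign(user_id)
    except Exception:
        return default


cohort = resolve_cohort(
    "employee-1",
    default="green",
    overrides={},
    whitelist={"employee-1"},
    assign=lambda uid: "red",
)
print(cohort)  # red
```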

Slide 46

Slide 46 text

Options to reuse
Traditional Bucketing
● Optimizely
● MixPanel
● Apptimize
Parameter Based
● Firebase Remote Config
● Planout SDKs

Slide 47

Slide 47 text

On-Device Logging

Slide 50

Slide 50 text

Main Principles

Slide 51

Slide 51 text

1. Define Domain Events + Delegate Formatting

Slide 52

Slide 52 text

search.bar.text.update → EventManager → Service 1, Service 2, Service 3, Service 4, Service 5

Slide 53

Slide 53 text

search.bar.text.update → EventManager → Service 1, Service 2, Service 3, Service 4, Service 5
Common metadata (e.g. timestamps) gets attached here

Slide 54

Slide 54 text

search.bar.text.update → EventManager → Service 1, Service 2, Service 3, Service 4, Service 5
Event-specific metadata (e.g. search term) gets attached here.
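
A minimal Python sketch of this principle, assuming the manager attaches the common metadata and each downstream service supplies its own formatter. The class shape, the `sinks` callables, and the `json_sink` example are illustrative assumptions, not Yelp's implementation.

```python
import json
import time
from typing import Any, Callable, Dict, Iterable, List

Event = Dict[str, Any]


class EventManager:
    """Accepts domain events and fans them out to per-service sinks."""

    def __init__(self, sinks: Iterable[Callable[[Event], None]],
                 clock: Callable[[], float] = time.time) -> None:
        self._sinks = list(sinks)
        self._clock = clock

    def log(self, name: str, **event_metadata: Any) -> None:
        event = {
            "name": name,
            "timestamp": self._clock(),   # common metadata, added once here
            "metadata": event_metadata,   # event-specific, from the call site
        }
        for sink in self._sinks:
            sink(event)  # each sink formats the event for its own service


formatted: List[str] = []


def json_sink(event: Event) -> None:
    # One service might want JSON; another sink could rename or drop fields.
    formatted.append(json.dumps(event, sort_keys=True))


manager = EventManager(sinks=[json_sink])
manager.log("search.bar.text.update", search_term="pizza")
print(formatted[0])
```

Call sites only name the domain event and its specifics, so adding or replacing an analytics service means writing one new sink rather than touching every logging call.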

Slide 55

Slide 55 text

2. Developers Own Analytic Definitions

Slide 56

Slide 56 text

Product Manager → Engineer: Let’s track this as search_bar_text_update

Slide 57

Slide 57 text

Engineer → Product Manager: Okay!

Slide 58

Slide 58 text

Product Manager → Engineer: Let’s track this as search_bar_text_update
Engineer → Product Manager: Okay!

Slide 59

Slide 59 text

Product Manager → Engineer: We need to know when the user updated the search bar text

Slide 60

Slide 60 text

Product Manager → Engineer: We need to know when the user updated the search bar text
Engineer → Product Manager: Search for search.bar.text.update

Slide 61

Slide 61 text

3. Keep Documentation Close To Code

Slide 62

Slide 62 text

A possible workflow
Add/update docs → Machine Readable Documentation (YAML, JSON Schema, etc) → Codegen Analytic Definitions (JavaPoet) → Publish to an internal Maven repo → Your App
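
The workflow on this slide targets Java via JavaPoet. As a language-neutral illustration of the same idea, this Python sketch generates a small class from a machine-readable event definition; the schema layout and all names are assumptions.

```python
from typing import Any, Dict

# Hypothetical machine-readable definition, as it might be loaded from YAML.
SCHEMA: Dict[str, Any] = {
    "name": "search.bar.text.update",
    "description": "User changed the text in the search bar.",
    "fields": {"search_term": "str"},
}


def class_name_for(event_name: str) -> str:
    """search.bar.text.update -> SearchBarTextUpdate"""
    return "".join(part.capitalize() for part in event_name.split("."))


def generate_event_class(schema: Dict[str, Any]) -> str:
    """Return Python source for a class describing one analytic event."""
    fields = schema["fields"]
    params = "".join(f", {name}: {type_name}" for name, type_name in fields.items())
    lines = [
        f"class {class_name_for(schema['name'])}:",
        f"    {schema['description']!r}",  # the docs travel with the code
        "",
        f"    EVENT_NAME = {schema['name']!r}",
        "",
        f"    def __init__(self{params}):",
    ]
    lines += [f"        self.{name} = {name}" for name in fields] or ["        pass"]
    return "\n".join(lines) + "\n"


print(generate_event_class(SCHEMA))
```

Because the class is derived from the documentation file, an event cannot be logged under a name or with fields that the documentation does not describe.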

Slide 63

Slide 63 text

Options to reuse
● Segment
● mParticle

Slide 64

Slide 64 text

Event Deliverability

Slide 66

Slide 66 text

Mainly A Problem If You Want Internal Collection

Slide 67

Slide 67 text

Mainly A Problem If You Want Internal Collection (You’ll want that eventually though)

Slide 68

Slide 68 text

At a High Level
Your App → Analytics Queue → Your Backend
Analytics Queue: Search.list.view (1), Search.item.click.1 (2)

Slide 69

Slide 69 text

At a High Level
Your App → Analytics Queue → Your Backend
Analytics Queue: Search.list.view (1), Search.item.click.1 (2)
Analytics Queue → Your Backend: send (1) and (2)

Slide 70

Slide 70 text

At a High Level
Your App → Analytics Queue → Your Backend
Analytics Queue: Search.list.view (1), Search.item.click.1 (2)
Your Backend → Analytics Queue: successful response

Slide 71

Slide 71 text

At a High Level
Your App → Analytics Queue → Your Backend
Analytics Queue: (empty)

Slide 72

Slide 72 text

At a High Level
Your App → Analytics Queue → Your Backend
Analytics Queue: Search.list.view (1), Search.item.click.1 (2)

Slide 73

Slide 73 text

At a High Level
Your App → Analytics Queue → Your Backend
Analytics Queue: Search.list.view (1), Search.item.click.1 (2)
Analytics Queue → Your Backend: send (1) and (2)

Slide 74

Slide 74 text

At a High Level
Your App → Analytics Queue → Your Backend
Analytics Queue: Search.list.view (1), Search.item.click.1 (2), Business.view (3)
Analytics Queue → Your Backend: send (1) and (2)

Slide 75

Slide 75 text

At a High Level
Your App → Analytics Queue → Your Backend
Analytics Queue: Search.list.view (1), Search.item.click.1 (2), Business.view (3)
Your Backend → Analytics Queue: successful response

Slide 76

Slide 76 text

At a High Level
Your App → Analytics Queue → Your Backend
Analytics Queue: Business.view (3)

Slide 77

Slide 77 text

At a High Level
Your App → Analytics Queue → Your Backend
Analytics Queue: Search.list.view (1), Search.item.click.1 (2), Business.view (3)
Analytics Queue → Your Backend: send (1) and (2)

Slide 78

Slide 78 text

At a High Level
Your App → Analytics Queue → Your Backend
Analytics Queue: Search.list.view (1), Search.item.click.1 (2), Business.view (3)
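
The sequence in these slides can be sketched as a queue that removes only the events covered by a successful response. This Python version is single-threaded and in-memory, and `AnalyticsQueue` and its `send` callback are assumed names, not the talk's actual code.

```python
from collections import deque
from typing import Callable, Deque, List


class AnalyticsQueue:
    """Holds events until the backend acknowledges them."""

    def __init__(self, send: Callable[[List[str]], bool]) -> None:
        self._events: Deque[str] = deque()
        self._send = send  # returns True on a successful response

    def add(self, event: str) -> None:
        self._events.append(event)

    def flush(self) -> bool:
        batch = list(self._events)  # snapshot of what is being sent
        if not batch:
            return True
        succeeded = self._send(batch)
        if succeeded:
            # Drop only the acknowledged events. Anything added while the
            # request was in flight stays queued for the next flush.
            for _ in batch:
                self._events.popleft()
        return succeeded

    def pending(self) -> List[str]:
        return list(self._events)


queue = AnalyticsQueue(send=lambda batch: True)
queue.add("Search.list.view")
queue.add("Search.item.click.1")
queue.flush()
print(queue.pending())  # []
```

If the send fails, nothing is removed, so the same events are retried on the next flush. The cost is that the backend may see duplicates when a response is lost after the events were actually stored.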

Slide 80

Slide 80 text

It’s All About Tradeoffs

Slide 81

Slide 81 text

Things to Consider
● How many analytics are you losing?
● How important are the analytics you’re losing?
● What’s the cost of gaining back those analytics?
  ○ Mostly engineering time/complexity

Slide 82

Slide 82 text

Your building blocks
● Analytic channels
● Structured flat files
● JobManagers
● SQLite Databases
● Tape Queues

Slide 83

Slide 83 text

Example!
Components: Search.list.view, Event Manager

Slide 84

Slide 84 text

Example!
Components: Search.list.view, Event Manager, Tape Queue
Step: Write to Queue

Slide 85

Slide 85 text

Example!
Components: Search.list.view, Event Manager, Tape Queue
Step: Is over flush threshold?

Slide 86

Slide 86 text

Example!
Components: Search.list.view, Event Manager, Tape Queue, JobManager
Step: Copy + Clear Contents

Slide 87

Slide 87 text

Example!
Components: Search.list.view, Event Manager, Tape Queue, JobManager
Step: Create Persistent Job with contents

Slide 88

Slide 88 text

What does Yelp do?
● Flush analytics every 30s + 20 analytics
● Always send analytics via JobManager
● Flush analytics when app goes into the background
  ○ Can be detected easily with architecture components!
● It’s not perfect, but it works well enough
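
A sketch of that flush policy in Python, reading "30s + 20 analytics" as "flush when either limit is reached" (that reading is an assumption). The in-memory list stands in for the Tape queue and `submit_job` for the JobManager; both names are hypothetical.

```python
import time
from typing import Callable, List


class EventBuffer:
    """Buffers events and hands batches to a job once a threshold is hit."""

    def __init__(self, submit_job: Callable[[List[str]], None],
                 max_events: int = 20, max_age_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._submit_job = submit_job
        self._max_events = max_events
        self._max_age = max_age_seconds
        self._clock = clock
        self._events: List[str] = []
        self._oldest = clock()

    def log(self, event: str) -> None:
        if not self._events:
            self._oldest = self._clock()  # age is measured from the first event
        self._events.append(event)
        too_many = len(self._events) >= self._max_events
        too_old = self._clock() - self._oldest >= self._max_age
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        if not self._events:
            return
        # Copy + clear, then create the job with the copied contents.
        batch, self._events = self._events, []
        self._submit_job(batch)

    def on_app_backgrounded(self) -> None:
        self.flush()


jobs: List[List[str]] = []
buffer = EventBuffer(submit_job=jobs.append)
for i in range(20):
    buffer.log(f"event-{i}")
print(len(jobs))  # 1
```

This version only checks the age when a new event arrives; a real app would more likely pair it with a timer so that a quiet session still flushes after 30 seconds.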

Slide 89

Slide 89 text

Analysis

Slide 91

Slide 91 text

Two Main Questions

Slide 92

Slide 92 text

1. Does Funnel Conversion Increase?

Slide 93

Slide 93 text

What’s a funnel?
Home → Search Page → Order Start → Order Complete

Slide 94

Slide 94 text

What’s a funnel?
Home (100k users) → Search Page → Order Start → Order Complete

Slide 95

Slide 95 text

What’s a funnel?
Home (100k users) → Search Page (80k users, 80%) → Order Start → Order Complete

Slide 96

Slide 96 text

What’s a funnel?
Home (100k users) → Search Page (80k users, 80%) → Order Start (10k users, 12.5%) → Order Complete

Slide 97

Slide 97 text

What’s a funnel?
Home (100k users) → Search Page (80k users, 80%) → Order Start (10k users, 12.5%) → Order Complete (5k users, 50%)
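
The percentages on this slide are step-to-step conversion rates: each step's user count divided by the previous step's. A short Python sketch reproduces them from the slide's numbers; `funnel_conversion` is an illustrative helper name.

```python
from typing import List, Optional, Tuple


def funnel_conversion(
    steps: List[Tuple[str, int]],
) -> List[Tuple[str, int, Optional[float]]]:
    """Return (step, users, conversion from the previous step) per step."""
    rows: List[Tuple[str, int, Optional[float]]] = []
    previous: Optional[int] = None
    for name, users in steps:
        # The first step has no previous step, so it has no rate.
        rate = users / previous if previous else None
        rows.append((name, users, rate))
        previous = users
    return rows


steps = [
    ("Home", 100_000),
    ("Search Page", 80_000),
    ("Order Start", 10_000),
    ("Order Complete", 5_000),
]
for name, users, rate in funnel_conversion(steps):
    label = "" if rate is None else f" ({rate:.1%})"
    print(f"{name}: {users:,} users{label}")
# Home: 100,000 users
# Search Page: 80,000 users (80.0%)
# Order Start: 10,000 users (12.5%)
# Order Complete: 5,000 users (50.0%)
```

In an A/B test the same table is computed once per cohort, and the question becomes whether a given step's rate differs between cohorts.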

Slide 99

Slide 99 text

2. Do Users (click/order/watch) More?

Slide 100

Slide 100 text

Lots of options here

Slide 101

Slide 101 text

Looking for a decent catch-all?

Slide 102

Slide 102 text

Eventually You’ll Need To Join This Data

Slide 103

Slide 103 text

Option A: BigQuery Export

Slide 104

Slide 104 text

Option A: BigQuery Export
Pros
● Comes out of the box with Firebase
● Allows you to join information to get answers
Cons
● Can’t be backfilled
● Still limited by Firebase
● Probably need to write scripts in order to join your data

Slide 105

Slide 105 text

Option B: Internal Metrics

Slide 106

Slide 106 text

Option B: Internal Metrics
Pros
● Maximum power + flexibility
● Can backfill
● Can execute arbitrary queries with just SQL
Cons
● You need to maintain the whole pipeline yourself
● High cost to reach feature parity with 3rd party services
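
To make "join this data" concrete, here is a toy Python example using the standard-library `sqlite3` module. The table names, columns, and rows are invented for illustration; a real pipeline would run a similar query in its own warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE allocations (user_id TEXT, experiment TEXT, cohort TEXT);
CREATE TABLE events (user_id TEXT, name TEXT);
INSERT INTO allocations VALUES
  ('u1', 'button_color', 'green'), ('u2', 'button_color', 'green'),
  ('u3', 'button_color', 'red'),   ('u4', 'button_color', 'red');
INSERT INTO events VALUES
  ('u1', 'order.complete'), ('u3', 'order.complete'),
  ('u4', 'order.complete'), ('u4', 'order.complete');
""")

# Join who was allocated to what with what they did afterwards.
QUERY = """
SELECT a.cohort,
       COUNT(DISTINCT a.user_id) AS users,
       COUNT(DISTINCT e.user_id) AS converted
FROM allocations AS a
LEFT JOIN events AS e
  ON e.user_id = a.user_id AND e.name = 'order.complete'
WHERE a.experiment = 'button_color'
GROUP BY a.cohort
ORDER BY a.cohort
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [('green', 2, 1), ('red', 2, 2)]
```

The `LEFT JOIN` keeps allocated users who never converted, and `COUNT(DISTINCT ...)` keeps a user with two orders from being counted twice.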

Slide 107

Slide 107 text

3 Things to Take Home
● A/B testing helps build a data-driven culture, so even if you don’t do it perfectly, you still get many benefits
● Know the basics of statistical experiments before you start A/B testing
● Keep engineers involved in analytic definition + analysis

Slide 108

Slide 108 text

Resources
Papers + General Reference on Experiments
● Exp-Platform.com
  ○ A/B Testing At Scale
● Twitter Unified Logging
● SOLID Analytics With RxJava
Tools on Statistics
● A Concise Guide to Statistics
● Causal Inference In Statistics
● Optimizely Sample Size Calculator

Slide 109

Slide 109 text

Thanks!
● My email - [email protected]
● My website - maltzj.com
● My twitter - @maltzj

Slide 110

Slide 110 text

www.yelp.com/careers/ We're Hiring!

Slide 111

Slide 111 text

Questions?

Slide 112

Slide 112 text

@YelpEngineering fb.com/YelpEngineers engineeringblog.yelp.com github.com/yelp