Slide 1

Slide 1 text

Romain Piel

Slide 2

Slide 2 text

Songkick

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Everyone has an opinion
“My mum tried the app and she didn’t like the colour of that button.”
“Google guidelines say that we should show a splash screen.”
“I really dislike hamburger menus, we should reconsider our navigation.”

Slide 5

Slide 5 text

The customer is always right!

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

[Chart: Day 0 to Day 4; labels: Feature launch, Revenue increase]

Slide 8

Slide 8 text

[Chart: Day 0 to Day 4; labels: World with green button, World with pink button]

Slide 9

Slide 9 text

But what if we can have both variants running at the same time in the same world?

Slide 10

Slide 10 text

How to run an A/B test?

Slide 11

Slide 11 text

1. Write your hypothesis
2. Design
3. Run
4. Analyse

Slide 12

Slide 12 text

1. Write your hypothesis
Goal: Quantify the effect the button colour has on metrics
Hypothesis: compared to a green button, a pink button will be more visible and push more users to click on it. A fraction of these additional users will complete the transaction, increasing revenue.

Slide 15

Slide 15 text

2. Design Control Treatment

Slide 16

Slide 16 text

2. Design
Key metrics
- Purchase rate: purchases per user
- Click-through rate: clicks per user

Slide 17

Slide 17 text

2. Design
Data to collect
- Experiment assignment
- User id/device id?
- Clicks on the button
- Screen size

Slide 18

Slide 18 text

2. Design
Watch out for conflicting experiments!

Slide 19

Slide 19 text

2. Design
50% A, 50% B
25% B, 75% A
Uneven control groups can cause bias

Slide 20

Slide 20 text

2. Design
Make sure to have even control and treatment groups
50% not part of experiment, 25% A, 25% B
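One way to get the even split described on this slide is to hash the experiment id together with the user id into a stable bucket. The sketch below is only an illustration: `ExperimentAssigner`, its `Group` enum and the use of CRC32 as the hash are assumptions, not something shown in the talk.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical helper: deterministic, even assignment to experiment groups. */
final class ExperimentAssigner {

    enum Group { CONTROL, TREATMENT, NOT_IN_EXPERIMENT }

    /**
     * Maps a user to a stable bucket in [0, 100) by hashing the experiment id
     * together with the user id, so the same user always lands in the same group.
     */
    static int bucket(String experimentId, String userId) {
        CRC32 crc = new CRC32();
        crc.update((experimentId + ":" + userId).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }

    /** 25% control (A), 25% treatment (B), 50% not part of the experiment. */
    static Group assign(String experimentId, String userId) {
        int b = bucket(experimentId, userId);
        if (b < 25) return Group.CONTROL;
        if (b < 50) return Group.TREATMENT;
        return Group.NOT_IN_EXPERIMENT;
    }

    public static void main(String[] args) {
        System.out.println(assign("pink_buy_button", "user-42"));
    }
}
```

Including the experiment id in the hash means a user's bucket in one experiment does not decide their bucket in another, which may help with conflicting experiments.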

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

How will it work?
Control / Treatment
- experiment id
- device id / user id
- experiment value

Slide 23

Slide 23 text

How will it work?
- App launch
- Fetch experiments from cache
- Fetch experiments from network
- Cache experiments
Firebase: Only fetch if cache expired
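The flow above can be sketched in plain Java as a cache that only goes to the network once its contents have expired. Everything here is hypothetical (`ExperimentCache`, the injected `network` supplier and `clock`); it is not the Firebase API, and it fetches synchronously only to keep the example short.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical experiment cache: the network is only hit when the cache has expired. */
final class ExperimentCache {
    private final Supplier<Map<String, Boolean>> network; // stand-in for the remote fetch
    private final LongSupplier clock;                     // current time in milliseconds
    private final long ttlMillis;

    private Map<String, Boolean> cached = Collections.emptyMap();
    private long fetchedAt;
    private boolean hasFetched;

    ExperimentCache(Supplier<Map<String, Boolean>> network, LongSupplier clock, long ttlMillis) {
        this.network = network;
        this.clock = clock;
        this.ttlMillis = ttlMillis;
    }

    /** Returns the cached experiments, refreshing from the network first if they are stale. */
    synchronized Map<String, Boolean> fetchExperiments() {
        long now = clock.getAsLong();
        if (!hasFetched || now - fetchedAt >= ttlMillis) {
            cached = new HashMap<>(network.get());
            fetchedAt = now;
            hasFetched = true;
        }
        return Collections.unmodifiableMap(cached);
    }

    /** Looks up one experiment, falling back to the default when it is unknown. */
    boolean getExperimentValue(String key, boolean defaultValue) {
        return fetchExperiments().getOrDefault(key, defaultValue);
    }
}
```

Injecting the clock keeps the expiry logic testable without waiting for real time to pass.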

Slide 24

Slide 24 text

How will it work?
@Override
public void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);
    ButterKnife.bind(this);

    // Pick the button colour from the experiment value; default to control (green).
    final int color;
    if (getExperimentValue("pink_buy_button", false)) {
        color = R.color.songkick_pink;
    } else {
        color = R.color.emerald_green;
    }
    buyButton.setBackgroundColor(ContextCompat.getColor(this, color));
}

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

Control Control

Slide 27

Slide 27 text

3. Run

Slide 28

Slide 28 text

3. Run
[Chart: Day 0 to Day 5]
But don’t stop the test too early!

Slide 29

Slide 29 text

When is “too early”?
Current conversion rate: 3%
Expected improvement: 10% (new conversion rate 3.3%)
False positive rate: 5%
False negative rate: 20%
power.prop.test(p1=0.03, p2=0.033, sig.level=0.05, power=0.8)

Slide 33

Slide 33 text

When is “too early”?
Current conversion rate: 3%
Expected improvement: 10% (new conversion rate 3.3%)
False positive rate: 5%
False negative rate: 20%
power.prop.test(p1=0.03, p2=0.033, sig.level=0.05, power=0.8)
http://www.r-fiddle.org/

Slide 34

Slide 34 text

When is “too early”?
Current conversion rate: 3%
Expected improvement: 10% (new conversion rate 3.3%)
False positive rate: 5%
False negative rate: 20%
power.prop.test(p1=0.03, p2=0.033, sig.level=0.05, power=0.8)
n = 53210.3
So we need 53,210.3 * 2 = ~106,421 users to complete the test!
http://www.r-fiddle.org/
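For readers without R at hand, the same figure can be approximated in plain Java. This is a sketch that assumes the usual normal-approximation formula for comparing two proportions. The class name `SampleSize` is made up, and the two quantiles are hard-coded for the 5% false positive rate and 20% false negative rate used on the slide.

```java
/** Sketch of a two-proportion sample size calculation (normal approximation). */
final class SampleSize {
    // Standard normal quantiles, hard-coded for the rates on the slide:
    // a two-sided 5% false positive rate and a 20% false negative rate (80% power).
    static final double Z_ALPHA = 1.959964; // quantile at 1 - 0.05 / 2
    static final double Z_BETA = 0.841621;  // quantile at 0.80

    /** Users needed in each group to detect a change in conversion rate from p1 to p2. */
    static double perGroup(double p1, double p2) {
        double pBar = (p1 + p2) / 2;
        double pooled = Math.sqrt(2 * pBar * (1 - pBar));
        double unpooled = Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2));
        double numerator = Z_ALPHA * pooled + Z_BETA * unpooled;
        double delta = p2 - p1;
        return numerator * numerator / (delta * delta);
    }

    public static void main(String[] args) {
        double n = perGroup(0.03, 0.033);
        System.out.printf("per group: %.1f, total: %.0f%n", n, Math.ceil(2 * n));
    }
}
```

With p1 = 0.03 and p2 = 0.033 this comes out at roughly 53,210 users per group, in line with the value on the slide.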

Slide 35

Slide 35 text

4. Analyse
[Chart: Day 0 to Day 4]

Slide 36

Slide 36 text

4. Analyse
Did we prove our hypothesis? Nope.
[Chart: Day 0 to Day 4]

Slide 37

Slide 37 text

1. Write your hypothesis
2. Design
3. Run
4. Analyse

Slide 38

Slide 38 text

Mobile challenges

Slide 39

Slide 39 text

fetchExperiments() fetchExperiments()

Slide 40

Slide 40 text

fetchExperiments() fetchExperiments()
Don’t block the UI for experiments
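A minimal sketch of what not blocking could look like: reads return the last known value (or a default) straight away, while the fetch runs on a background thread. `ExperimentStore` and the `network` supplier are hypothetical names for illustration, written in plain Java rather than against any Android or Firebase API.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical store: reads never block, network results arrive in the background. */
final class ExperimentStore {
    private final Map<String, Boolean> values = new ConcurrentHashMap<>();

    /** Starts the fetch off the calling thread and returns immediately. */
    CompletableFuture<Void> fetchExperiments(Supplier<Map<String, Boolean>> network) {
        return CompletableFuture.supplyAsync(network).thenAccept(values::putAll);
    }

    /** Returns the last known value, or the default if nothing has been fetched yet. */
    boolean getExperimentValue(String key, boolean defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }
}
```

The trade-off is that a screen drawn before the fetch completes shows the default variant, so the value in use should be recorded alongside the collected data.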

Slide 41

Slide 41 text

fetchExperiments()
Keep the experiment value in savedInstanceState

Slide 42

Slide 42 text

Web world dream
[Chart: Sessions %; labels: v1.2, release, v1.1]

Slide 43

Slide 43 text

Mobile world reality
[Chart: Sessions %; labels: v1.0, v1.1, v1.2, release, 1 week +]

Slide 44

Slide 44 text

v1.1, v1.2
- Verify that the conditions are the same across versions
- Verify that the code is the same across versions
Control, Variant 1

Slide 45

Slide 45 text

Acknowledge differences
- Screen size
- Accessibility
- Language
- OS version (4.4, 7.1)

Slide 46

Slide 46 text

Know your users

Slide 47

Slide 47 text

(Don’t) listen to your heart

Slide 48

Slide 48 text

Trust your data

Slide 49

Slide 49 text

Connect with your reviews

Slide 50

Slide 50 text

Get early feedback

Slide 51

Slide 51 text

The problem is out there, not at your desk

Slide 52

Slide 52 text

Tweak all the things!

Slide 53

Slide 53 text

Sources
- Minimum Viable Research, Jo Packer (Songkick)
- On AB Testing, Hector Zarate (Spotify)
- A/B Testing, A Data Science Perspective, Lisa Qian (Airbnb)

Slide 54

Slide 54 text

Romain Piel @_rpiel Thanks!

Slide 55

Slide 55 text

Any questions? Romain Piel @_rpiel