
Hypothesis driven development

Romain Piel
October 27, 2016


Transcript

  1. Romain Piel

  2. Songkick

  3. None
  4. Everyone has an opinion: “My mum tried the app and she didn’t like the colour of that button.” “Google guidelines say that we should show a splash screen.” “I really dislike hamburger menus, we should reconsider our navigation.”
  5. The customer is always right!

  6. None
  7. (Chart: timeline from Day 0 to Day 4 showing a feature launch followed by a revenue increase)
  8. (Chart: two parallel Day 0 to Day 4 timelines: a world with the green button vs. a world with the pink button)
  9. But what if we could have both variants running at the same time, in the same world?
  10. How to run an A/B test?

  11. 1. Write your hypothesis 2. Design 3. Run 4. Analyse

  12. 1. Write your hypothesis. Goal: quantify the effect the button colour has on metrics. Hypothesis: compared to a green button, a pink button will be more visible and push more users to click on it; a fraction of these additional users will complete the transaction, increasing revenue.
  15. 2. Design Control Treatment

  16. 2. Design Key metrics:
     - Purchase rate: purchases per user
     - Click-through rate: clicks per user
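The two key metrics are simple per-user ratios; a minimal sketch (the class and method names are mine, not from the talk):

```java
// Key experiment metrics as per-user ratios, guarding against empty groups.
public final class ExperimentMetrics {

    /** Purchase rate: purchases per user in the group. */
    public static double purchaseRate(long purchases, long users) {
        return users == 0 ? 0.0 : (double) purchases / users;
    }

    /** Click-through rate: clicks on the button per user in the group. */
    public static double clickThroughRate(long clicks, long users) {
        return users == 0 ? 0.0 : (double) clicks / users;
    }
}
```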
  17. 2. Design Data to collect:
     - Experiment assignment
     - User id / device id?
     - Clicks on the button
     - Screen size
  18. 2. Design Watch out for conflicting experiments!

  19. 2. Design (Charts: a 50% A / 50% B split vs. a 75% A / 25% B split) Uneven control groups can cause bias.
  20. 2. Design Make sure to have even control and treatment groups: 25% A, 25% B, 50% not part of the experiment.
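One common way to get a stable, even split like the 25% / 25% / 50% above is to hash the device id together with the experiment id into a bucket. This is a hypothetical sketch, not Songkick's actual code; real systems usually use a stronger hash than `String.hashCode`:

```java
// Deterministic experiment assignment: the same device always lands in the
// same group for a given experiment, and buckets spread users evenly.
public final class ExperimentAssigner {
    public enum Group { CONTROL, TREATMENT, NOT_IN_EXPERIMENT }

    public static Group assign(String deviceId, String experimentId) {
        // Math.floorMod keeps the bucket in [0, 100) even for negative hashes.
        int bucket = Math.floorMod((deviceId + ":" + experimentId).hashCode(), 100);
        if (bucket < 25) return Group.CONTROL;      // 25% -> A
        if (bucket < 50) return Group.TREATMENT;    // 25% -> B
        return Group.NOT_IN_EXPERIMENT;             // 50% holdout
    }
}
```

Hashing on `deviceId + experimentId` (rather than the device id alone) also keeps assignments for different experiments independent, which helps with the conflicting-experiments problem from slide 18.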
  21. None
  22. How will it work? Control Treatment - experiment id - device id / user id - experiment value
  23. How will it work? App launch → fetch experiments from cache → fetch experiments from network → cache experiments (Firebase: only fetch if the cache has expired)
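The cache-then-network flow above can be sketched in plain Java. This is an illustrative sketch of the "only fetch if the cache expired" behaviour the talk attributes to Firebase, not Firebase's actual API:

```java
// Serves experiment data from a cache, hitting the "network" only when the
// cache is missing or older than the time-to-live.
public final class ExperimentStore {

    /** Source of experiment data, e.g. a network call. */
    public interface Fetcher {
        String fetch();
    }

    private final Fetcher network;
    private final long ttlMillis;
    private String cached;
    private long fetchedAt;

    public ExperimentStore(Fetcher network, long ttlMillis) {
        this.network = network;
        this.ttlMillis = ttlMillis;
    }

    /** Returns the cached experiments, refetching only on a miss or expiry. */
    public String experiments(long nowMillis) {
        if (cached == null || nowMillis - fetchedAt > ttlMillis) {
            cached = network.fetch();   // only when cache is missing or stale
            fetchedAt = nowMillis;
        }
        return cached;
    }
}
```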
  24. How will it work?

    @Override
    public void onCreate(Bundle savedInstanceState) {
      super.onCreate(savedInstanceState);
      setContentView(R.layout.activity_main);
      ButterKnife.bind(this);

      final int color;
      if (getExperimentValue("pink_buy_button", false)) {
        color = R.color.songkick_pink;
      } else {
        color = R.color.emerald_green;
      }
      buyButton.setBackgroundColor(ContextCompat.getColor(this, color));
    }
  25. None
  26. Control Control

  27. 3. Run

  28. 3. Run Day 0 Day 1 Day 2 Day 3 Day 4 Day 5 But don’t stop the test too early!
  29. When is “too early”? Current conversion rate: 3%. Expected improvement: 10% (new conversion rate: 3.3%). False positive rate: 5%. False negative rate: 20%. power.prop.test(p1=0.03, p2=0.033, sig.level=0.05, power=0.8)
  34. When is “too early”? Current conversion rate: 3%. Expected improvement: 10% (new conversion rate: 3.3%). False positive rate: 5%. False negative rate: 20%. power.prop.test(p1=0.03, p2=0.033, sig.level=0.05, power=0.8) n = 53210.3 So we need 53,210.3 × 2 = ~106,421 users to complete the test! http://www.r-fiddle.org/
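The arithmetic behind power.prop.test can be sketched with the standard normal-approximation formula for comparing two proportions. R's exact solver differs slightly, but this reproduces the ~53,210-per-group figure to within a few users:

```java
// Approximate sample size per group for detecting a change between two
// conversion rates: n = (z_alpha/2 + z_beta)^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
public final class SampleSize {
    private static final double Z_ALPHA_2 = 1.959964; // two-sided, alpha = 0.05
    private static final double Z_BETA = 0.841621;    // power = 0.8 (beta = 0.2)

    /** Users needed per group to detect a change from p1 to p2. */
    public static double perGroup(double p1, double p2) {
        double z = Z_ALPHA_2 + Z_BETA;
        double variance = p1 * (1 - p1) + p2 * (1 - p2);
        double delta = p2 - p1;
        return z * z * variance / (delta * delta);
    }
}
```

For the slide's numbers, `perGroup(0.03, 0.033)` gives roughly 53,200 users per group, so about 106,000 users total across control and treatment.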
  35. 4. Analyse Day 0 Day 1 Day 2 Day 3 Day 4
  36. 4. Analyse Did we prove our hypothesis? Nope. Day 0 Day 1 Day 2 Day 3 Day 4
  37. 1. Write your hypothesis 2. Design 3. Run 4. Analyse

  38. Mobile challenges

  39. fetchExperiments() fetchExperiments()

  40. fetchExperiments() fetchExperiments() Don’t block the UI for experiments

  41. fetchExperiments() Keep the experiment value in savedInstanceState

  42. Web world dream (chart: Sessions % across the v1.1 to v1.2 release)
  43. Mobile world reality (chart: Sessions % for v1.0, v1.1, v1.2; a release takes 1 week or more to roll out)
  44. v1.1 v1.2 Control Variant 1 - Verify that the conditions are the same across versions - Verify that the code is the same across versions
  45. Acknowledge differences: OS version (4.4, 7.1), screen size, accessibility, language

  46. Know your users

  47. (Don’t) listen to your heart

  48. Trust your data

  49. Connect with your reviews

  50. Get early feedback

  51. The problem is out there, not at your desk

  52. Tweak all the things!

  53. Sources:
     - Minimum Viable Research, Jo Packer (Songkick)
     - On AB Testing, Hector Zarate (Spotify)
     - A/B Testing, A Data Science Perspective, Lisa Qian (Airbnb)
  54. Romain Piel @_rpiel Thanks!

  55. Any questions? Romain Piel @_rpiel