Romain Piel
October 27, 2016

Hypothesis driven development

Transcript

3. Everyone has an opinion
“My mum tried the app and she didn’t like the colour of that button.”
“Google guidelines say that we should show a splash screen.”
“I really dislike hamburger menus, we should reconsider our navigation.”

5. [Chart: Day 0 to Day 4, labelled "Feature launch" and "Revenue increase"]
6. [Chart: Day 0 to Day 4, comparing a world with a green button and a world with a pink button]
7. But what if we can have both variants running at the same time in the same world?

10. 1. Write your hypothesis
Goal: Quantify the effect the button colour has on metrics.
Hypothesis: Compared to a green button, a pink button will be more visible and push more users to click on it. A fraction of these additional users will complete the transaction, increasing revenue.

14. 2. Design
Key metrics
- Purchase rate: purchases per user
- Click through rate: clicks per user
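
Both key metrics are plain ratios computed per experiment group. A minimal sketch in Java, assuming the per-group counts have already been aggregated; the class and method names are hypothetical and do not come from the talk:

```java
/** Hypothetical helper: the two key metrics for one experiment group. */
final class GroupMetrics {
    final long users;      // distinct users assigned to the group
    final long clicks;     // clicks on the buy button
    final long purchases;  // completed purchases

    GroupMetrics(long users, long clicks, long purchases) {
        if (users <= 0) {
            throw new IllegalArgumentException("users must be positive");
        }
        this.users = users;
        this.clicks = clicks;
        this.purchases = purchases;
    }

    /** Click through rate: clicks per user. */
    double clickThroughRate() {
        return (double) clicks / users;
    }

    /** Purchase rate: purchases per user. */
    double purchaseRate() {
        return (double) purchases / users;
    }
}
```

Dividing by users rather than by sessions keeps both metrics on the same denominator as the experiment assignment, which is also made per user or device.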
15. 2. Design
Data to collect
- Experiment assignment
- User id/device id?
- Clicks on the button
- Screen size

17. 2. Design
- 50% A, 50% B
- 25% B, 75% A
Uneven control groups can cause bias
18. 2. Design
Make sure to have even control and treatment groups
- 50% Not part of experiment
- 25% A
- 25% B
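
One common way to get a stable 50% / 25% / 25% split is to hash the experiment id together with the user or device id and map the result to a bucket. The sketch below is an assumption about how this could be done, not the talk's implementation; `ExperimentBucketing`, `Group` and the bucket boundaries are hypothetical names and choices.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical deterministic assignment: 50% excluded, 25% control (A), 25% treatment (B). */
final class ExperimentBucketing {
    enum Group { NOT_IN_EXPERIMENT, CONTROL, TREATMENT }

    /** Maps (experiment, user) to a stable bucket in [0, 100). */
    static int bucket(String experimentId, String userId) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] hash = sha.digest((experimentId + ":" + userId).getBytes(StandardCharsets.UTF_8));
            // Use the first four bytes of the digest as a signed int, then fold into 0..99.
            int firstFourBytes = ByteBuffer.wrap(hash).getInt();
            return Math.floorMod(firstFourBytes, 100);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    /** Even control and treatment groups: buckets 50..74 are A, 75..99 are B. */
    static Group assign(String experimentId, String userId) {
        int b = bucket(experimentId, userId);
        if (b < 50) {
            return Group.NOT_IN_EXPERIMENT;
        }
        return b < 75 ? Group.CONTROL : Group.TREATMENT;
    }
}
```

Because the experiment id is part of the hashed input, the same user lands in unrelated buckets for different experiments, and the same user always gets the same group for a given experiment.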
19. How will it work?
Control, Treatment
- experiment id
- device id / user id
- experiment value
20. How will it work?
- App launch
- Fetch experiments from cache
- Fetch experiments from network
- Cache experiments
Firebase: only fetch if the cache has expired
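
The launch flow above can be sketched in plain Java without any Firebase or Android APIs. Everything here is an assumption for illustration: `ExperimentStore`, the `Supplier` standing in for the network call, and the explicit `nowMillis` parameter are hypothetical; only the name `getExperimentValue` is taken from the talk.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical store: serve experiments from cache, hit the network only when the cache expired. */
final class ExperimentStore {
    private final Supplier<Map<String, Boolean>> network; // stand-in for the remote fetch
    private final long ttlMillis;                          // how long a cached copy stays valid
    private Map<String, Boolean> cached = Collections.emptyMap();
    private long fetchedAtMillis = -1;                     // -1 means nothing has been cached yet

    ExperimentStore(Supplier<Map<String, Boolean>> network, long ttlMillis) {
        this.network = network;
        this.ttlMillis = ttlMillis;
    }

    /** Call at app launch: refetch and recache only if there is no cache or it has expired. */
    synchronized void refresh(long nowMillis) {
        boolean expired = fetchedAtMillis < 0 || nowMillis - fetchedAtMillis >= ttlMillis;
        if (expired) {
            cached = new HashMap<>(network.get());
            fetchedAtMillis = nowMillis;
        }
    }

    /** Returns the cached experiment value, or the default when the experiment is unknown. */
    synchronized boolean getExperimentValue(String name, boolean defaultValue) {
        Boolean value = cached.get(name);
        return value != null ? value : defaultValue;
    }
}
```

Passing the current time in as a parameter is a design choice made here so the expiry logic can be exercised without waiting; a real app would more likely read the clock internally and fetch off the main thread.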
21. How will it work?

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        ButterKnife.bind(this);

        // Pick the button colour from the experiment value; false (control) is the default.
        final int color;
        if (getExperimentValue("pink_buy_button", false)) {
            color = R.color.songkick_pink;
        } else {
            color = R.color.emerald_green;
        }
        buyButton.setBackgroundColor(ContextCompat.getColor(this, color));
    }

24. 3. Run
[Chart: Day 0 to Day 5]
But don’t stop the test too early!
30. When is “too early”?
Current conversion rate: 3%
Expected improvement: 10% (new conversion rate 3.3%)
False positive rate: 5%
False negative rate: 20%
power.prop.test(p1=0.03, p2=0.033, sig.level=0.05, power=0.8)
n = 53210.3 (per group)
So we need 53,210.3 * 2 = ~106,421 users to complete the test!
http://www.r-fiddle.org/
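
For readers without R at hand, the same per-group sample size can be approximated with the standard normal-approximation formula for comparing two proportions. This is a sketch under stated assumptions: the class name is hypothetical, and the two normal quantiles are hardcoded for a two-sided 5% significance level and 80% power because the JDK has no inverse normal CDF.

```java
/** Hypothetical helper: per-group sample size for a two-sided test of two proportions. */
final class SampleSize {
    // Standard normal quantiles, hardcoded as an assumption of this sketch.
    static final double Z_TWO_SIDED_5_PERCENT = 1.959964; // sig.level = 0.05
    static final double Z_POWER_80_PERCENT = 0.841621;    // power = 0.8 (false negative rate 20%)

    /** p1 is the current conversion rate, p2 the rate we hope to detect. */
    static double perGroup(double p1, double p2) {
        double pBar = (p1 + p2) / 2.0;
        // Spread under "no difference" and under the expected difference.
        double nullTerm = Z_TWO_SIDED_5_PERCENT * Math.sqrt(2.0 * pBar * (1.0 - pBar));
        double altTerm = Z_POWER_80_PERCENT * Math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2));
        double delta = p1 - p2;
        return Math.pow(nullTerm + altTerm, 2) / (delta * delta);
    }
}
```

With the slide's inputs, `SampleSize.perGroup(0.03, 0.033)` comes out at roughly 53,210 users per group, in line with the R result, so about twice that across control and treatment.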

32. 4. Analyse
Did we prove our hypothesis? Nope.
[Chart: Day 0 to Day 4]
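
One way to answer "did we prove our hypothesis?" once the planned sample size is reached is a two-proportion z-test at the same 5% significance level used for the power calculation. The talk does not say which test was used, so this is an assumed technique; the class name is hypothetical and any counts passed in are up to the reader.

```java
/** Hypothetical analysis helper: pooled two-proportion z-test, two-sided at 5%. */
final class TwoProportionZTest {
    static final double Z_TWO_SIDED_5_PERCENT = 1.959964;

    /** z statistic for treatment (B) versus control (A); positive means B converts better. */
    static double zStatistic(long conversionsA, long usersA, long conversionsB, long usersB) {
        double pA = (double) conversionsA / usersA;
        double pB = (double) conversionsB / usersB;
        // Pooled rate under the assumption that both groups convert equally.
        double pooled = (double) (conversionsA + conversionsB) / (usersA + usersB);
        double standardError = Math.sqrt(pooled * (1.0 - pooled) * (1.0 / usersA + 1.0 / usersB));
        return (pB - pA) / standardError;
    }

    static boolean isSignificantAt5Percent(double z) {
        return Math.abs(z) > Z_TWO_SIDED_5_PERCENT;
    }
}
```

A result that is not significant does not show the button colours are equivalent; it only means the test could not distinguish them at the chosen error rates.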

40. v1.1, v1.2
- Verify that the conditions are the same across versions
- Verify that the code is the same across versions
Control, Variant 1