Hypothesis driven development

Romain Piel
October 27, 2016

Transcript

  1. Everyone has an opinion
    “My mum tried the app and she didn’t like the colour of that button.”
    “Google guidelines say that we should show a splash screen.”
    “I really dislike hamburger menus, we should reconsider our navigation.”
  2. Timeline (Day 0 to Day 4): feature launch, then a revenue increase
  3. Timeline (Day 0 to Day 4): a world with a green button vs. a world with a pink button
  4. But what if we could have both variants running at the same time in the same world?
  5. 1. Write your hypothesis
    Goal: Quantify the effect the button colour has on metrics.
    Hypothesis: Compared to a green button, a pink button will be more visible and push more users to click on it. A fraction of these additional users will complete the transaction, increasing revenue.
  8. 2. Design
    Key metrics:
    - Purchase rate: purchases per user
    - Click-through rate: clicks per user
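
A minimal Java sketch of the two key metrics, assuming you already have per-variant counts of users, button clicks and purchases from your analytics. The class and field names are hypothetical, not from the talk. Both rates are normalised by users rather than by sessions or impressions, matching the definitions above.

```java
/**
 * Hypothetical container for the key metrics of one variant.
 * Counts are assumed to come from analytics events; nothing here is from the talk itself.
 */
public final class VariantMetrics {
    private final long users;      // distinct users assigned to the variant
    private final long clicks;     // taps on the buy button
    private final long purchases;  // completed transactions

    public VariantMetrics(long users, long clicks, long purchases) {
        if (users <= 0) {
            throw new IllegalArgumentException("users must be positive");
        }
        this.users = users;
        this.clicks = clicks;
        this.purchases = purchases;
    }

    /** Click-through rate: clicks per user. */
    public double clickThroughRate() {
        return (double) clicks / users;
    }

    /** Purchase rate: purchases per user. */
    public double purchaseRate() {
        return (double) purchases / users;
    }

    public static void main(String[] args) {
        // Toy counts, only to exercise the methods; they are not real results.
        VariantMetrics green = new VariantMetrics(1000, 120, 30);
        VariantMetrics pink = new VariantMetrics(1000, 150, 33);
        System.out.printf("green: CTR=%.3f, purchase rate=%.3f%n", green.clickThroughRate(), green.purchaseRate());
        System.out.printf("pink:  CTR=%.3f, purchase rate=%.3f%n", pink.clickThroughRate(), pink.purchaseRate());
    }
}
```

Comparing the two rates between variants, rather than raw click or purchase totals, keeps the comparison fair even if the groups end up slightly different in size.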
  9. 2. Design
    Data to collect:
    - Experiment assignment
    - User id or device id?
    - Clicks on the button
    - Screen size
  10. 2. Design
    Diagram: a 50% A / 50% B split compared with a 75% A / 25% B split.
    Uneven control groups can cause bias.
  11. 2. Design
    Make sure to have even control and treatment groups.
    Diagram: 50% not part of the experiment, 25% A, 25% B.
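
One common way to get even, stable groups is to hash a user id into 100 buckets. The sketch below assumes that approach; the talk leaves assignment to Firebase, so the MD5-based scheme and every name here are our own assumptions. Buckets 0-49 stay out of the experiment, 50-74 see A and 75-99 see B, which gives the 50% / 25% / 25% split from the slide. Salting the hash with the experiment id keeps a user's bucket in one experiment roughly independent of their bucket in another.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical deterministic assignment: 50% out of the experiment, 25% A, 25% B. */
public final class Bucketing {
    enum Group { NOT_IN_EXPERIMENT, A, B }

    /** Hashes experiment id + user id into a stable bucket in [0, 100). */
    static int bucket(String experimentId, String userId) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest((experimentId + ":" + userId).getBytes(StandardCharsets.UTF_8));
            // Read the first four bytes as an unsigned 32-bit value.
            long value = ((digest[0] & 0xFFL) << 24)
                    | ((digest[1] & 0xFFL) << 16)
                    | ((digest[2] & 0xFFL) << 8)
                    | (digest[3] & 0xFFL);
            return (int) (value % 100);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    /** Buckets 0-49: not in experiment; 50-74: A (control); 75-99: B (treatment). */
    static Group assign(String experimentId, String userId) {
        int b = bucket(experimentId, userId);
        if (b < 50) {
            return Group.NOT_IN_EXPERIMENT;
        }
        return b < 75 ? Group.A : Group.B;
    }

    public static void main(String[] args) {
        int[] counts = new int[Group.values().length];
        for (int i = 0; i < 100_000; i++) {
            counts[assign("pink_buy_button", "user-" + i).ordinal()]++;
        }
        for (Group g : Group.values()) {
            // Expect roughly 50,000 / 25,000 / 25,000.
            System.out.println(g + ": " + counts[g.ordinal()]);
        }
    }
}
```

Because the bucket is derived from the ids alone, a user sees the same variant on every launch without the app having to store the assignment.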
  12. How will it work?
    Diagram: control and treatment, each tagged with:
    - experiment id
    - device id / user id
    - experiment value
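
The three fields on this slide are what every analytics event needs to carry so that clicks and purchases can later be split by variant. A hypothetical Java sketch (the class and property names are ours), assuming the user id is preferred when the user is logged in and the device id is used otherwise:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical experiment assignment attached to analytics events (names are illustrative). */
public final class ExperimentAssignment {
    private final String experimentId;    // e.g. "pink_buy_button"
    private final String unitId;          // user id when known, otherwise device id
    private final String experimentValue; // e.g. "control" or "treatment"

    public ExperimentAssignment(String experimentId, String userId, String deviceId, String experimentValue) {
        if (userId == null && deviceId == null) {
            throw new IllegalArgumentException("need a user id or a device id");
        }
        this.experimentId = experimentId;
        // Assumption: prefer the user id so a logged-in user keeps one variant across devices.
        this.unitId = userId != null ? userId : deviceId;
        this.experimentValue = experimentValue;
    }

    /** Flattens the assignment into properties to attach to click and purchase events. */
    public Map<String, String> toEventProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("experiment_id", experimentId);
        props.put("unit_id", unitId);
        props.put("experiment_value", experimentValue);
        return Collections.unmodifiableMap(props);
    }

    public static void main(String[] args) {
        ExperimentAssignment assignment =
                new ExperimentAssignment("pink_buy_button", null, "device-123", "treatment");
        // prints {experiment_id=pink_buy_button, unit_id=device-123, experiment_value=treatment}
        System.out.println(assignment.toEventProperties());
    }
}
```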
  13. How will it work?
    App launch → fetch experiments from cache → fetch experiments from network → cache experiments
    Firebase: only fetch from the network if the cache has expired.
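
The flow above can be sketched in plain Java as a small store that always answers from its cache and only goes to the network on launch when the cached copy is older than a time-to-live. This is a hypothetical stand-in, not Firebase's API: the network call is a `Supplier`, the clock is injectable so expiry can be tested, the 12-hour TTL in `main` is an arbitrary choice, and `getExperimentValue` merely mirrors the shape of the helper used in the slide 14 code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical experiment store: serve from cache, refresh from network only when expired. */
public final class ExperimentStore {
    private final Supplier<Map<String, Boolean>> network; // stand-in for the real fetch
    private final LongSupplier clockMillis;               // injectable clock for testing
    private final long ttlMillis;                         // how long a fetch stays fresh

    private Map<String, Boolean> cache = Collections.emptyMap();
    private long fetchedAtMillis;
    private boolean hasFetched;

    public ExperimentStore(Supplier<Map<String, Boolean>> network, LongSupplier clockMillis, long ttlMillis) {
        this.network = network;
        this.clockMillis = clockMillis;
        this.ttlMillis = ttlMillis;
    }

    /** Call on app launch: hits the network only if there is no cache or it has expired. */
    public synchronized void onAppLaunch() {
        long now = clockMillis.getAsLong();
        boolean expired = !hasFetched || now - fetchedAtMillis >= ttlMillis;
        if (expired) {
            cache = new HashMap<>(network.get());
            fetchedAtMillis = now;
            hasFetched = true;
        }
    }

    /** Reads a boolean experiment from the cache, falling back to the control default. */
    public synchronized boolean getExperimentValue(String key, boolean defaultValue) {
        Boolean value = cache.get(key);
        return value != null ? value : defaultValue;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        int[] networkCalls = {0};
        ExperimentStore store = new ExperimentStore(() -> {
            networkCalls[0]++;
            return Collections.singletonMap("pink_buy_button", true);
        }, () -> now[0], 12 * 60 * 60 * 1000L);

        store.onAppLaunch();          // no cache yet: fetches from the network
        now[0] += 60 * 60 * 1000L;    // one hour later
        store.onAppLaunch();          // cache still fresh: no network call
        System.out.println(networkCalls[0]);                                     // prints 1
        System.out.println(store.getExperimentValue("pink_buy_button", false)); // prints true
    }
}
```

Reading from the cache first means the screen never waits on the network; the trade-off is that a new experiment value only takes effect after the next expired-cache fetch.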
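  14. How will it work?

    ```java
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        ButterKnife.bind(this); // binds view fields such as buyButton

        // getExperimentValue() is the app's own helper (not shown on the slide): it reads the
        // cached value of the "pink_buy_button" experiment, falling back to false (control).
        final int color;
        if (getExperimentValue("pink_buy_button", false)) {
            color = R.color.songkick_pink;  // treatment
        } else {
            color = R.color.emerald_green;  // control
        }
        // ContextCompat.getColor resolves the colour resource id to an ARGB int.
        buyButton.setBackgroundColor(ContextCompat.getColor(this, color));
    }
    ```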
  15. 3. Run
    Timeline: Day 0 to Day 5.
    But don’t stop the test too early!
  21. When is “too early”?
    - Current conversion rate: 3%
    - Expected improvement: 10% relative (new conversion rate: 3.3%)
    - False positive rate: 5% (sig.level = 0.05)
    - False negative rate: 20% (power = 1 - 0.2 = 0.8)

    ```r
    power.prop.test(p1=0.03, p2=0.033, sig.level=0.05, power=0.8)
    # n = 53210.3 (users per group)
    ```

    n is the number of users per group, so we need 53,210.3 × 2 ≈ 106,421 users to complete the test!
    http://www.r-fiddle.org/
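
The same number can be reproduced without R. power.prop.test uses a normal approximation, and with its default `strict = FALSE` (ignoring the tiny chance of rejecting in the wrong direction) it can be solved for n in closed form. In this hedged Java sketch the standard normal quantiles are hard-coded for a two-sided 5% significance level and 80% power, because Java has no built-in inverse normal CDF; other settings would need different constants.

```java
import java.util.Locale;

/** Sample size per group for comparing two proportions (normal approximation, two-sided test). */
public final class SampleSize {
    // Hard-coded standard normal quantiles (assumption: sig.level = 0.05, power = 0.8).
    static final double Z_ALPHA = 1.959963984540054;  // quantile at 1 - 0.05 / 2
    static final double Z_BETA = 0.8416212335729143;  // quantile at 0.8

    /**
     * n = ((Z_ALPHA * sqrt(2 * pBar * (1 - pBar)) + Z_BETA * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) / |p2 - p1|)^2,
     * where pBar = (p1 + p2) / 2.
     */
    static double perGroup(double p1, double p2) {
        double pBar = (p1 + p2) / 2.0;
        double pooledSd = Math.sqrt(2.0 * pBar * (1.0 - pBar));
        double unpooledSd = Math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2));
        double ratio = (Z_ALPHA * pooledSd + Z_BETA * unpooledSd) / Math.abs(p2 - p1);
        return ratio * ratio;
    }

    public static void main(String[] args) {
        double n = perGroup(0.03, 0.033); // 3% baseline, 10% relative lift
        System.out.printf(Locale.ROOT, "n per group = %.1f%n", n);              // prints n per group = 53210.3
        System.out.printf(Locale.ROOT, "total users = %d%n", Math.round(2 * n)); // prints total users = 106421
    }
}
```

Halving the detectable lift roughly quadruples n, since n grows with 1 / (p2 - p1)^2, which is why small expected improvements need long-running tests.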
  22. Diagram: control and variant 1 across app versions v1.1 and v1.2
    - Verify that the conditions are the same across versions
    - Verify that the code is the same across versions
  23. Sources
    - Minimum Viable Research, Jo Packer (Songkick)
    - On AB Testing, Hector Zarate (Spotify)
    - A/B Testing, A Data Science Perspective, Lisa Qian (Airbnb)