
Hypothesis driven development

Romain Piel
October 27, 2016


Transcript

  1. Romain Piel

  2. Songkick

  3. None
  4. Everyone has an opinion: “My mum tried the app and she didn’t like the colour of that button.” “Google guidelines say that we should show a splash screen.” “I really dislike hamburger menus, we should reconsider our navigation.”
  5. The customer is always right!

  6. None
  7. (Chart: timeline from Day 0 to Day 4 showing a feature launch followed by a revenue increase)
  8. (Chart: two parallel Day 0 to Day 4 timelines: a world with the green button vs. a world with the pink button)
  9. But what if we could have both variants running at the same time, in the same world?
  10. How to run an A/B test?

  11. 1. Write your hypothesis 2. Design 3. Run 4. Analyse

  12. 1. Write your hypothesis. Goal: quantify the effect the button colour has on metrics. Hypothesis: compared to a green button, a pink button will be more visible and push more users to click on it; a fraction of these additional users will complete the transaction, increasing revenue.
  15. 2. Design Control Treatment

  16. 2. Design Key metrics:
     - Purchase rate: purchases per user
     - Click-through rate: clicks per user
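The two key metrics are simple per-user ratios; a minimal sketch (the class and method names are mine, not from the talk):

```java
// Key experiment metrics as per-user ratios, guarding against empty groups.
public final class ExperimentMetrics {

    /** Purchase rate: purchases per user in the group. */
    public static double purchaseRate(long purchases, long users) {
        return users == 0 ? 0.0 : (double) purchases / users;
    }

    /** Click-through rate: clicks on the button per user in the group. */
    public static double clickThroughRate(long clicks, long users) {
        return users == 0 ? 0.0 : (double) clicks / users;
    }
}
```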
  17. 2. Design Data to collect:
     - Experiment assignment
     - User id / device id?
     - Clicks on the button
     - Screen size
  18. 2. Design Watch out for conflicting experiments!

  19. 2. Design (Charts: a 50% A / 50% B split vs. a 75% A / 25% B split) Uneven control groups can cause bias.
  20. 2. Design Make sure to have even control and treatment groups: 25% A, 25% B, 50% not part of the experiment.
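One common way to get a stable, even split like the 25% / 25% / 50% above is to hash the device id together with the experiment id into a bucket. This is a hypothetical sketch, not Songkick's actual code; real systems usually use a stronger hash than `String.hashCode`:

```java
// Deterministic experiment assignment: the same device always lands in the
// same group for a given experiment, and buckets spread users evenly.
public final class ExperimentAssigner {
    public enum Group { CONTROL, TREATMENT, NOT_IN_EXPERIMENT }

    public static Group assign(String deviceId, String experimentId) {
        // Math.floorMod keeps the bucket in [0, 100) even for negative hashes.
        int bucket = Math.floorMod((deviceId + ":" + experimentId).hashCode(), 100);
        if (bucket < 25) return Group.CONTROL;      // 25% -> A
        if (bucket < 50) return Group.TREATMENT;    // 25% -> B
        return Group.NOT_IN_EXPERIMENT;             // 50% holdout
    }
}
```

Hashing on `deviceId + experimentId` (rather than the device id alone) also keeps assignments for different experiments independent, which helps with the conflicting-experiments problem from slide 18.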
  21. None
  22. How will it work? Control Treatment - experiment id - device id / user id - experiment value
  23. How will it work? App launch → fetch experiments from cache → fetch experiments from network → cache experiments (Firebase: only fetch if the cache has expired)
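The cache-then-network flow above can be sketched in plain Java. This is an illustrative sketch of the "only fetch if the cache expired" behaviour the talk attributes to Firebase, not Firebase's actual API:

```java
// Serves experiment data from a cache, hitting the "network" only when the
// cache is missing or older than the time-to-live.
public final class ExperimentStore {

    /** Source of experiment data, e.g. a network call. */
    public interface Fetcher {
        String fetch();
    }

    private final Fetcher network;
    private final long ttlMillis;
    private String cached;
    private long fetchedAt;

    public ExperimentStore(Fetcher network, long ttlMillis) {
        this.network = network;
        this.ttlMillis = ttlMillis;
    }

    /** Returns the cached experiments, refetching only on a miss or expiry. */
    public String experiments(long nowMillis) {
        if (cached == null || nowMillis - fetchedAt > ttlMillis) {
            cached = network.fetch();   // only when cache is missing or stale
            fetchedAt = nowMillis;
        }
        return cached;
    }
}
```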
  24. How will it work?

    @Override
    public void onCreate(Bundle savedInstanceState) {
      super.onCreate(savedInstanceState);
      setContentView(R.layout.activity_main);
      ButterKnife.bind(this);

      final int color;
      if (getExperimentValue("pink_buy_button", false)) {
        color = R.color.songkick_pink;
      } else {
        color = R.color.emerald_green;
      }
      buyButton.setBackgroundColor(ContextCompat.getColor(this, color));
    }
  25. None
  26. Control Control

  27. 3. Run

  28. 3. Run Day 0 Day 1 Day 2 Day 3 Day 4 Day 5 But don’t stop the test too early!
  29. When is “too early”? Current conversion rate: 3%. Expected improvement: 10% (new conversion rate: 3.3%). False positive rate: 5%. False negative rate: 20%. power.prop.test(p1=0.03, p2=0.033, sig.level=0.05, power=0.8)
  34. When is “too early”? Current conversion rate: 3%. Expected improvement: 10% (new conversion rate: 3.3%). False positive rate: 5%. False negative rate: 20%. power.prop.test(p1=0.03, p2=0.033, sig.level=0.05, power=0.8) n = 53210.3 So we need 53,210.3 × 2 = ~106,421 users to complete the test! http://www.r-fiddle.org/
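The arithmetic behind power.prop.test can be sketched with the standard normal-approximation formula for comparing two proportions. R's exact solver differs slightly, but this reproduces the ~53,210-per-group figure to within a few users:

```java
// Approximate sample size per group for detecting a change between two
// conversion rates: n = (z_alpha/2 + z_beta)^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
public final class SampleSize {
    private static final double Z_ALPHA_2 = 1.959964; // two-sided, alpha = 0.05
    private static final double Z_BETA = 0.841621;    // power = 0.8 (beta = 0.2)

    /** Users needed per group to detect a change from p1 to p2. */
    public static double perGroup(double p1, double p2) {
        double z = Z_ALPHA_2 + Z_BETA;
        double variance = p1 * (1 - p1) + p2 * (1 - p2);
        double delta = p2 - p1;
        return z * z * variance / (delta * delta);
    }
}
```

For the slide's numbers, `perGroup(0.03, 0.033)` gives roughly 53,200 users per group, so about 106,000 users total across control and treatment.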
  35. 4. Analyse Day 0 Day 1 Day 2 Day 3 Day 4
  36. 4. Analyse Did we prove our hypothesis? Nope. Day 0 Day 1 Day 2 Day 3 Day 4
  37. 1. Write your hypothesis 2. Design 3. Run 4. Analyse

  38. Mobile challenges

  39. fetchExperiments() fetchExperiments()

  40. fetchExperiments() fetchExperiments() Don’t block the UI for experiments

  41. fetchExperiments() Keep the experiment value in savedInstanceState

  42. Web world dream (chart: Sessions % across the v1.1 to v1.2 release)
  43. Mobile world reality (chart: Sessions % for v1.0, v1.1, v1.2; a release takes 1 week or more to roll out)
  44. v1.1 v1.2 Control Variant 1 - Verify that the conditions are the same across versions - Verify that the code is the same across versions
  45. Acknowledge differences: OS version (4.4, 7.1), screen size, accessibility, language

  46. Know your users

  47. (Don’t) listen to your heart

  48. Trust your data

  49. Connect with your reviews

  50. Get early feedback

  51. The problem is out there, not at your desk

  52. Tweak all the things!

  53. Sources:
     - Minimum Viable Research, Jo Packer (Songkick)
     - On AB Testing, Hector Zarate (Spotify)
     - A/B Testing, A Data Science Perspective, Lisa Qian (Airbnb)
  54. Romain Piel @_rpiel Thanks!

  55. Any questions? Romain Piel @_rpiel