

How we A/B test at the FT: and what happened when we started testing headlines too

Talk given at Continuous Lifecycle London 2018

Amy Nicholson

May 17, 2018


Transcript

  1. How we A/B test at the Financial Times, and what happened when we started testing headlines too. Amy Nicholson, Technical Product Manager, FT.com
  2. The primary metric is click-through rate. Click-through rate: of all the people who landed on a page with that headline, how many of those users clicked on that story?
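Slide 2's definition is a ratio of two user counts. The sketch below assumes we already have the set of user IDs who landed on a page showing the headline and the set who clicked the story; the function name and inputs are illustrative, not the FT's code.

```python
from __future__ import annotations


def click_through_rate(viewers: set[str], clickers: set[str]) -> float:
    """Of the users who landed on a page with the headline, the share who clicked the story."""
    if not viewers:
        return 0.0
    # Ignore clicks from users who never saw this headline.
    clicked_after_seeing = clickers & viewers
    return len(clicked_after_seeing) / len(viewers)


viewers = {"user-1", "user-2", "user-3", "user-4"}
clickers = {"user-2", "user-9"}
print(click_through_rate(viewers, clickers))  # 0.25
```

Counting distinct users rather than raw clicks keeps one enthusiastic reader from inflating the rate.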
  3. FAQs: Why is it called Ammit? Can I test the headlines? How long will it take?
  4. Two of the factors involved: uplift and traffic. (Chart: traffic landing on the feature plotted against the uplift expected with this change, with a region marked "Quicker tests". Plotted examples: headline test, redesign the footer, swap order in nav bar.)
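Slide 4's two factors, uplift and traffic, drive how long a test has to run. A minimal sketch using the standard sample-size formula for comparing two proportions (two-sided, 5% significance, 80% power); the baseline CTR, uplift and daily-traffic figures are hypothetical inputs, and the deck does not say which formula the FT uses.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_ctr: float, relative_uplift: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed in each of A and B to detect the uplift (two-sided two-proportion test)."""
    p1 = baseline_ctr
    p2 = baseline_ctr * (1 + relative_uplift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def days_needed(baseline_ctr: float, relative_uplift: float,
                daily_users: float, share_of_traffic: float = 1.0) -> int:
    """Days to collect enough users for both variants, given the share of traffic the test gets."""
    users_needed = 2 * sample_size_per_variant(baseline_ctr, relative_uplift)
    return math.ceil(users_needed / (daily_users * share_of_traffic))


# Hypothetical: 10% baseline CTR, hoping for a 10% relative uplift.
print(sample_size_per_variant(0.10, 0.10))
print(days_needed(0.10, 0.10, daily_users=20_000))
print(days_needed(0.10, 0.10, daily_users=20_000, share_of_traffic=0.5))
```

Doubling the expected uplift cuts the required sample roughly fourfold, while halving the traffic share (as when two tests split the buckets 50/50) roughly doubles the duration.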
  5. FAQs: Why is it called Ammit? Can I test the headlines? How long will it take? Is there traffic available for the headline test?
  6. Hypothesis: if we change the layout from A to B, the overall clicks on the component will increase.
  7. If a user is in B for both tests and clicks the story... (Diagram: two tests, each split into A and B.)
  8. Show tests to different buckets of users if they interfere. (Diagram: user buckets 0 to 100, with onwardJourney (subs, article) and audio (subs, article) in separate buckets.)
  9. Buckets contain both anonymous users and subscribers. (Diagram: buckets 0 to 100 with onwardJourney (subs, article) and audio (subs, article), plus a Barrier (paywall) test (anon, barrier).)
  10. Swim lanes for tests that can run in parallel. (Diagram: buckets 0 to 100 arranged in swim lanes holding onwardJourney (subs, article), audio (subs, article), Layout test (subs, frontpage) and Barrier (paywall) test (anon, barrier).)
  11. If the test is for subscribers, then there is traffic available. (Diagram: buckets 0 to 100 with two ranges marked Available, alongside onwardJourney (subs, article), audio (subs, article), Layout test (subs, frontpage) and Barrier (paywall) test (anon, barrier).)
  12. If the test is for subscribers, then there is traffic available. (Diagram: the same layout with Headline test (subs, frontpage) added and one range still marked Available.)
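Slides 9 to 12 suggest a rule for where a new test can go. The sketch below assumes two tests only clash when they target the same user status and the same page, so a new test can reuse buckets held by tests for a different audience or page (a separate swim lane). The `AbTest` type and all bucket ranges are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AbTest:
    name: str
    status: str     # "subs" or "anon"
    page: str       # "article", "frontpage", "barrier", ...
    buckets: range  # which of the buckets 0..99 the test uses


def available_buckets(status: str, page: str, running: list[AbTest],
                      num_buckets: int = 100) -> list[int]:
    """Buckets a new (status, page) test could use without sharing users with a clashing test.

    Assumption: tests only clash when they target the same user status AND the same page.
    """
    taken: set[int] = set()
    for test in running:
        if test.status == status and test.page == page:
            taken.update(test.buckets)
    return [b for b in range(num_buckets) if b not in taken]


running = [
    AbTest("onwardJourney", "subs", "article", range(0, 30)),
    AbTest("audio", "subs", "article", range(30, 60)),
    AbTest("Layout test", "subs", "frontpage", range(0, 50)),
    AbTest("Barrier (paywall) test", "anon", "barrier", range(0, 100)),
]
free = available_buckets("subs", "frontpage", running)
print(free[0], free[-1])  # 50 99
```

With these ranges, a subscriber headline test on the front page can take buckets 50 to 99 even though the anonymous barrier test already covers every bucket.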
  13. FAQs: Why is it called Ammit? Can I test the headlines? How long will it take? Is there traffic available for my headline test? Can my test overlap?
  14. Split the traffic so each test gets 50%... this will take a while. (Diagram: buckets 0 to 100 split between Blurb/No blurb and Green buttons/Red buttons.)
  15. Run the tests one after another... this will also take a while. (Diagram: buckets 0 to 100 running Blurb/No blurb, then Green buttons/Red buttons.)
  16. MVT (multivariate testing)

  17. Run a multivariate test... this will take a while too. (Diagram: buckets 0 to 100 split four ways: Blurb + Green, No blurb + Green, Blurb + Red, No blurb + Red.)
  18. Overlap tests if the interference is likely to be small. (Diagram: Blurb/No blurb and Green button/Red button both running across buckets 0 to 100.)
  19. Run tests together but look at them separately. Blurb/No blurb: A 10%, B 9%, winner Blurb. Green/Red buttons: A 10%, B 8%, winner Green. Result: Blurb + Green buttons win.
  20. Looked at as an MVT, the results have a different winner: Blurb + Green 10%, Blurb + Red 12%, No blurb + Green 12.5%, No blurb + Red 9%.
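The numbers on slides 19 and 20 show how judging two overlapping tests separately can pick a different combination from looking at all four cells. This sketch assumes the percentages on slide 20 are listed in the same order as the cell labels, and that A is the first option in each test on slide 19 (Blurb, Green).

```python
from itertools import product

# Slide 19: each test analysed on its own (A = first option, B = second option).
separate = {
    "blurb": {"Blurb": 0.10, "No blurb": 0.09},
    "button": {"Green": 0.10, "Red": 0.08},
}

# Slide 20: the same traffic viewed as a 2 x 2 multivariate test.
mvt_cells = {
    ("Blurb", "Green"): 0.10,
    ("Blurb", "Red"): 0.12,
    ("No blurb", "Green"): 0.125,
    ("No blurb", "Red"): 0.09,
}


def separate_winner(results: dict) -> tuple:
    """Pick the best option in each test independently and combine them."""
    return tuple(max(options, key=options.get) for options in results.values())


def mvt_winner(cells: dict) -> tuple:
    """Pick the single best combination of options."""
    return max(cells, key=cells.get)


# The MVT covers every combination of the two tests' options.
assert set(mvt_cells) == set(product(separate["blurb"], separate["button"]))

print(separate_winner(separate))  # ('Blurb', 'Green')
print(mvt_winner(mvt_cells))      # ('No blurb', 'Green')
```

The disagreement is an interaction effect: in these cells the blurb helps with red buttons (12% vs 9%) but hurts with green ones (10% vs 12.5%), which analysing each test on its own cannot reveal.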
  21. FAQs: Why is it called Ammit? Can I test the headlines? How long will it take? Is there traffic available for my headline test? Can my test overlap? Could you just quickly check my headline test is OK?
  22. Check that the test is there. (Diagram: buckets 0 to 100 with an Available range, onwardJourney (subs, article), audio (subs, article), Layout test (subs, frontpage), Barrier (paywall) test (anon, barrier) and Headline test (subs, frontpage).)
  23. Which users should see the test? (Diagram: the same bucket layout as slide 22.)
  24. Filter out tests that don’t apply to that user’s bucket. (Diagram: the same bucket layout as slide 22.)
  25. Filter out tests that don’t apply to that user’s status. (Remaining tests: audio (subs, article), Barrier (paywall) test (anon, barrier), Headline test (subs, frontpage).)
  26. Filter out tests that don’t apply to that user’s status. (Remaining tests: audio (subs, article), Headline test (subs, frontpage).)
  27. Ammit sends tests back to Preflight. (Diagram: Preflight, Ammit, tests and user data; for a Subscriber, the tests returned are audio (subs, article) and Headline test (subs, frontpage).)
  28. Ammit assigns a variant. (Diagram: for a Subscriber, audio: control (subs, article); Headline test: variant (subs, frontpage).)
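The deck does not say how Ammit picks control or variant. One common approach, assumed here, is to hash the test name together with the user ID, which keeps each user's assignment stable across visits and independent between tests; `assign_variant` is a hypothetical name.

```python
import hashlib


def assign_variant(test_name: str, user_id: str,
                   groups: tuple = ("control", "variant")) -> str:
    """Deterministically assign a user to a group for one test.

    Hashing the test name with the user ID keeps the assignment stable across visits
    and independent of the user's group in other tests.
    """
    key = f"{test_name}:{user_id}".encode("utf-8")
    value = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return groups[value % len(groups)]


print(assign_variant("audio", "user-123"), assign_variant("Headline test", "user-123"))
```

Including the test name in the hash means being in control for audio says nothing about a user's group in the headline test, which matters when tests run in parallel swim lanes.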
  29. Preflight filters tests further. (Diagram: Subscriber => frontpage; audio: control (subs, article); Headline test: variant (subs, frontpage).)
  30. Preflight filters tests further. (Diagram: Subscriber => frontpage; only Headline test: variant (subs, frontpage) is passed to the Frontpage app.)
  31. The test and variant come from Preflight. The Frontpage app asks: Headline test on? User in test? In variant? (Diagram: Preflight passes Headline test: variant (subs, frontpage) to the Frontpage app.)
  32. The test is always turned on. (Diagram: Headline test on? User in test? In variant? Preflight passes Headline test: variant (subs, frontpage) to the Frontpage app.)
  33. Each test starts when there are 2 headlines for 1 story. (Diagram: User in test? In variant? 2 x headlines? Elasticsearch.)
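Slides 31 to 33 list the checks the Frontpage app makes before showing a headline. This sketch assumes the Preflight assignments arrive as a mapping from test name to group, and that a story's headlines (held in Elasticsearch in the FT's setup) are given as a list with the original first; the `headlineTest` key, `Story` type and `headline_to_show` function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Story:
    story_id: str
    headlines: list[str]  # original headline first, alternative (if any) second


def headline_to_show(story: Story, assignments: dict[str, str]) -> str:
    """Choose a headline given the test groups Preflight passed down (hypothetical shape)."""
    group = assignments.get("headlineTest")  # hypothetical test key
    # The test is always on, so the checks are: is the user in the test,
    # does this story have 2 headlines, and is the user in the variant?
    if group is None or len(story.headlines) < 2:
        return story.headlines[0]
    return story.headlines[1] if group == "variant" else story.headlines[0]


story = Story("story-456", ["Original headline", "Alternative headline"])
print(headline_to_show(story, {"headlineTest": "variant"}))  # Alternative headline
```

Because the test switch never changes, each story's test effectively begins the moment an editor adds a second headline.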
  34. FAQs: Why is it called Ammit? Can I test the headlines? How long will it take? Is there traffic available for my headline test? Can my test overlap? Could you just quickly check my headline test is OK? Which variant is winning?
  35. How you should run a frequentist test: estimate the duration of the test based on how much data you’d need to reach statistical significance; collect data; when the test duration is over, check the results: significant or not significant.
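Once the planned duration is over, a CTR comparison like this is commonly checked with a two-proportion z-test. The deck does not name the test the FT uses; this is a standard choice, sketched with the standard library only, and the click and user counts are hypothetical.

```python
from __future__ import annotations

import math
from statistics import NormalDist


def two_proportion_z_test(clicks_a: int, users_a: int,
                          clicks_b: int, users_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in CTR between B and A."""
    ctr_a = clicks_a / users_a
    ctr_b = clicks_b / users_b
    # Under the null hypothesis both variants share one CTR, estimated from the pooled data.
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (ctr_b - ctr_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical result, checked only after the planned duration is over.
z, p = two_proportion_z_test(1_000, 10_000, 1_100, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}", "significant" if p < 0.05 else "not significant")
```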
  36. FAQs: Why is it called Ammit? Can I test this? How long will it take? Is there traffic available for my test? Can my test overlap? Could you just quickly check my headline test is OK? Which variant is winning? Which variant won?
  37. An event fires when the user sees the variant headline: eventType: pageView, timestamp: 15/5/2018 08:55, userID: user-123
  38. An event fires when the user clicks the variant headline: eventType: click, timestamp: 15/5/2018 08:56, userID: user-123, storyID: story-456 (shown after the pageView event from slide 37).
  39. Where do these events go? (Diagram: the pageView and click events flowing into Redshift.)
  40. Replace Redshift with the smaller, faster Volt. (Diagram: the same pageView and click events flowing into Volt.)
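The events on slides 37 and 38 are enough to compute click-through rate per variant, which is the kind of query the events store (Redshift, then Volt) has to answer. This sketch assumes each event also carries the user's variant, a hypothetical `variant` field not shown on the slides, and counts distinct users to match the definition on slide 2.

```python
from __future__ import annotations

from collections import defaultdict

# Shaped like the slide 37/38 events (timestamps omitted), plus a hypothetical "variant" field.
events = [
    {"eventType": "pageView", "userID": "user-123", "variant": "B"},
    {"eventType": "click", "userID": "user-123", "storyID": "story-456", "variant": "B"},
    {"eventType": "pageView", "userID": "user-124", "variant": "B"},
    {"eventType": "pageView", "userID": "user-125", "variant": "A"},
]


def ctr_by_variant(events: list[dict], story_id: str) -> dict[str, float]:
    viewers: dict[str, set[str]] = defaultdict(set)
    clickers: dict[str, set[str]] = defaultdict(set)
    for event in events:
        if event["eventType"] == "pageView":
            viewers[event["variant"]].add(event["userID"])
        elif event["eventType"] == "click" and event.get("storyID") == story_id:
            clickers[event["variant"]].add(event["userID"])
    # Distinct users who saw the headline and clicked, over distinct users who saw it.
    return {variant: len(clickers[variant] & users) / len(users)
            for variant, users in viewers.items()}


print(ctr_by_variant(events, "story-456"))  # {'B': 0.5, 'A': 0.0}
```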
  41. B

  42. FAQs: Why is it called Ammit? Can I test this? How long will it take? Is there traffic available for my test? Can my test overlap? Could you just quickly check my headline test is OK? Which variant is winning? Which variant won? Was it close to significance?
  43. You have to wait X weeks for a test and then... (Diagram: estimate the duration of the test based on how much data you’d need to reach statistical significance; collect data; when the test duration is over, check the results: significant or not significant.)
  44. B performed better than A: if there were really no difference between A and B and we ran this experiment 100 times, a difference at least as large as the one observed would occur by random chance on fewer than 5 of those occasions.
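A quick way to see what the 5% threshold means is to simulate experiments in which A and B have exactly the same true click-through rate: roughly 5 in 100 of them still come out significant at p < 0.05. The CTR, sample size and number of runs below are hypothetical, and the pooled two-proportion z-test is a standard choice, not necessarily the FT's.

```python
import math
import random
from statistics import NormalDist


def p_value(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> float:
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (clicks_b / n_b - clicks_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def aa_false_positive_rate(true_ctr: float = 0.10, users_per_variant: int = 1_000,
                           runs: int = 1_000, alpha: float = 0.05, seed: int = 42) -> float:
    """Share of simulated A/A experiments (no real difference) that still look significant."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(runs):
        clicks_a = sum(rng.random() < true_ctr for _ in range(users_per_variant))
        clicks_b = sum(rng.random() < true_ctr for _ in range(users_per_variant))
        if p_value(clicks_a, users_per_variant, clicks_b, users_per_variant) < alpha:
            significant += 1
    return significant / runs


print(aa_false_positive_rate())  # roughly 0.05
```

The p-value therefore describes how surprising the data would be if there were no difference; it is not the probability that B's win was a fluke.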
  45. FAQs: Why is it called Ammit? Can I test the headlines? How long will it take? Is there traffic available for my headline test? Can my test overlap? Could you just quickly check my headline test is OK? Which variant is winning? Which variant won? Was it close to significance?
  46–51. A B (six image-only slides, each labelled A and B)