A/B Testing From The Ground Up (Nov 2017)

maltzj
November 06, 2017

Transcript

  1. Jonathan Maltz (maltz@yelp.com, @maltzj): A/B Testing From The Ground Up

  2. Yelp’s Mission: Connecting people with great local businesses.

  3. Hi! I’m Maltz! • Android at Yelp! • Now: Building experimentation systems • Previously: Full-stack and data work at Eat24.

  4. What Will We Talk About Today?

  5. A/B Testing

  6. Specifically: • What are A/B tests? • Why run A/B tests? • What infrastructure is needed to A/B test? • What can you buy vs. build? • What pitfalls should you watch out for?

  7. But First...

  8. Why A/B Test?

  9. [Figure: a "Click me!" button and a chart of clicks over time]

  10. [Figure: a "Click me!" button and a chart of clicks over time]

  11. We should keep the button Red!

  12. But wait...

  13. [Figure: a "Click me!" button and a chart of clicks over time]

  14. (no text)

  15. [Figure: two "Click me!" buttons, each shown to 50% of users]

  16. (no text)

  17. Inform The Hippo!

  18. Inform The Hippo! (Highest Paid Person’s Opinion)

  19. (no text)

  20. [Diagram: the Bucketing System sends allocation information to Your App; the app logs on-device events into its Metrics Collection Pipeline (on device), which forwards them to the Metrics Service]

  21. [Same architecture diagram as slide 20]

  22. Bucketing System

  23. At a high level: Identifier → Randomization function (probably a hash) → Bucket

  24. [Figure: a 0 to 100 range split into Green: 50% and Red: 50%; each user lands at get_bucket(id + salt) % 100, e.g. get_bucket("my_exp163") % 100, get_bucket(...419) % 100, get_bucket(...802) % 100]

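The bucketing scheme on slides 23-24 can be sketched in Python. This is a minimal sketch, not Yelp's implementation: `get_bucket` follows the slide's name, SHA-256 is an assumed choice of hash, and `assign_cohort` and its (name, percent) cohort format are hypothetical.

```python
import hashlib


def get_bucket(key: str) -> int:
    """Map a string to a stable, well-distributed non-negative integer.

    Assumption: SHA-256 stands in for the slide's "randomization function
    (probably a hash)". Python's built-in hash() is unsuitable because it is
    randomized per process for strings.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign_cohort(user_id: str, salt: str, cohorts: list[tuple[str, int]]) -> str:
    """Place a user in a cohort using contiguous ranges over buckets 0..99.

    `cohorts` is an ordered list of (name, percent) pairs summing to 100;
    [("green", 50), ("red", 50)] maps buckets 0-49 to green and 50-99 to red.
    """
    if sum(pct for _, pct in cohorts) != 100:
        raise ValueError("cohort percentages must sum to 100")
    bucket = get_bucket(user_id + salt) % 100
    upper = 0
    for name, pct in cohorts:
        upper += pct
        if bucket < upper:
            return name
    raise AssertionError("unreachable when percentages sum to 100")


if __name__ == "__main__":
    cohorts = [("green", 50), ("red", 50)]
    for user in ["user-1", "user-2", "user-3"]:
        print(user, assign_cohort(user, "button_color_exp", cohorts))
```

Because the hash is stable, a given user and salt always map to the same cohort, across devices and app restarts, with no per-user state to store.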
  25. (no text)

  26. Common Pitfalls [Figure: users placed on a 0 to 1 scale by get_bucket(id + salt) (e.g. get_bucket("my_exp163"), get_bucket(...419), get_bucket(...802)), ramped over time: Green 100% → Green 80% / Red 20% → Green 50% / Red 50% → Red 100%]

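  27. Common Pitfalls cont... [Figure: ramping two arms over time, with users placed on a 0 to 1 scale by get_bucket(id + salt) (e.g. get_bucket(...261), get_bucket(...591), get_bucket(...812)): Green 100% → Green 90% / Yellow 5% / Red 5% → Green 80% / Yellow 10% / Red 10%]. With contiguous segments, growing Yellow pushes Red’s boundary, so some users who were in Yellow end up in Red.
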
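  28. Common Pitfalls cont... [Figure: the same ramp using no-op buffers, with users placed by get_bucket(id + salt) (e.g. get_bucket(...311), get_bucket(...451), get_bucket(...723)): Green 100% → Green 50% / Red 5% / Red-Buffer (no-op) 20% / Yellow 5% / Yellow-Buffer (no-op) 20% → Green 50% / Red 25% / Yellow 25%]. Each arm grows into its own buffer, so no user switches between Red and Yellow.
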
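The pitfall on slides 27-28 can be checked mechanically by comparing each bucket's cohort before and after a ramp. The sketch below works directly on buckets 0-99; the segment order within each layout, the helper names, and treating buffer users as status-quo users excluded from analysis are assumptions, not details from the talk.

```python
def cohort_for_bucket(bucket: int, layout: list[tuple[str, int]]) -> str:
    """Return the segment covering `bucket` (0..99) in an ordered (name, percent) layout."""
    upper = 0
    for name, pct in layout:
        upper += pct
        if bucket < upper:
            return name
    raise ValueError("layout percentages must sum to 100")


def arm_switches(old, new, arms):
    """List (bucket, before, after) for users who would move from one arm to a different arm."""
    moved = []
    for bucket in range(100):
        before = cohort_for_bucket(bucket, old)
        after = cohort_for_bucket(bucket, new)
        if before in arms and after in arms and before != after:
            moved.append((bucket, before, after))
    return moved


# Slide 27: contiguous segments, each arm ramped from 5% to 10%.
NAIVE_BEFORE = [("green", 90), ("yellow", 5), ("red", 5)]
NAIVE_AFTER = [("green", 80), ("yellow", 10), ("red", 10)]

# Slide 28: each arm is followed by a no-op buffer it can grow into.
# Assumption: buffer users see the status quo (green) and are excluded from analysis.
BUFFERED_BEFORE = [("green", 50), ("red", 5), ("red_buffer", 20),
                   ("yellow", 5), ("yellow_buffer", 20)]
BUFFERED_AFTER = [("green", 50), ("red", 25), ("yellow", 25)]

ARMS = {"yellow", "red"}

if __name__ == "__main__":
    print("naive switches:", arm_switches(NAIVE_BEFORE, NAIVE_AFTER, ARMS))
    print("buffered switches:", arm_switches(BUFFERED_BEFORE, BUFFERED_AFTER, ARMS))
```

With contiguous segments, the ramp moves buckets 90-94 from Yellow to Red, contaminating both arms; with the buffered layout, no bucket changes arm, and only buffer users (who were never analyzed) join an arm.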
  29. Don’t Forget The Salt!

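  30. Common Pitfalls cont... [Figure: three experiments bucketed with get_bucket(id) and no salt: Experiment 1 (Status Quo 50%, Cohort A 50%), Experiment 2 (Status Quo 50%, Cohort B 50%), Experiment 3 (Status Quo 50%, Cohort A 50%)]. Without a salt, a user’s bucket is identical in every experiment, so the same users land in the treated half of each one.
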
  31. getBucket(id + exp_name)
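A quick way to see why slide 31 adds the experiment name to the key: measure how often the same user lands in Cohort A of two unrelated 50/50 experiments. This sketch assumes SHA-256 as the hash; `in_cohort_a` and `overlap` are hypothetical helpers.

```python
import hashlib


def get_bucket(key: str) -> int:
    # Assumption: SHA-256 as the stable randomization function.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def in_cohort_a(user_id: str, exp_name: str, salted: bool) -> bool:
    """50/50 split: buckets 50..99 go to Cohort A, 0..49 to the status quo."""
    key = user_id + exp_name if salted else user_id
    return get_bucket(key) % 100 >= 50


def overlap(exp_1: str, exp_2: str, salted: bool, n_users: int = 10_000) -> float:
    """Fraction of users who are in Cohort A of both experiments."""
    both = sum(
        in_cohort_a(f"user{i}", exp_1, salted) and in_cohort_a(f"user{i}", exp_2, salted)
        for i in range(n_users)
    )
    return both / n_users


if __name__ == "__main__":
    print("unsalted:", overlap("experiment_1", "experiment_3", salted=False))
    print("salted:  ", overlap("experiment_1", "experiment_3", salted=True))
```

Unsalted, the overlap equals the whole Cohort A share (about 50% of users), because every experiment sees the same bucket. Salting with the experiment name drops it to roughly 25%, which is what independent 50/50 splits should give.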

  32. How To Implement It?

  33. Option 1: Config File + Buckets Endpoint

  34. Option 1: Config File + Buckets Endpoint [Diagram: YAML File, Backend, Mobile App]

  35. Option 1: Config File + Buckets Endpoint [Diagram: the Mobile App requests its cohorts from the Backend, sending its id]

  36. Option 1: Config File + Buckets Endpoint [Diagram: the Backend parses the YAML file and retrieves the cohorts]

  37. (no text)

  38. Option 1: Config File + Buckets Endpoint. Pros • Easy to implement • Easy to understand. Cons • Cohorts may not load in time ◦ There are some ways to work around this • Hard to run experiments at startup (e.g. onboarding) • Hard to handle complex conditions

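Option 1 boils down to a backend handler that reads the experiment config and returns a user's cohorts, plus a client-side fallback for when that response hasn't arrived yet. The sketch assumes the YAML file has already been parsed into a dict; the config shape, experiment names, and function names are hypothetical, and bucketing reuses the salted `get_bucket(id + exp_name)` idea.

```python
import hashlib
from typing import Optional

# Hypothetical experiment config, as it might look after a YAML file such as
#   button_color:
#     cohorts: [[status_quo, 50], [green, 50]]
# has been parsed (e.g. with a YAML library) into plain Python data.
EXPERIMENTS = {
    "button_color": {"cohorts": [["status_quo", 50], ["green", 50]]},
    "search_ranking": {"cohorts": [["status_quo", 90], ["new_ranker", 10]]},
}


def get_bucket(key: str) -> int:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def cohort_for(user_id: str, exp_name: str, cohorts) -> str:
    bucket = get_bucket(user_id + exp_name) % 100  # salted with the experiment name
    upper = 0
    for name, pct in cohorts:
        upper += pct
        if bucket < upper:
            return name
    raise ValueError(f"cohorts for {exp_name!r} must sum to 100")


def handle_cohorts_request(user_id: str, config=EXPERIMENTS) -> dict[str, str]:
    """Backend side: answer "Request Cohorts w/ id" with every experiment's cohort."""
    return {name: cohort_for(user_id, name, exp["cohorts"]) for name, exp in config.items()}


def cohort_on_device(cached: Optional[dict[str, str]], exp_name: str,
                     default: str = "status_quo") -> str:
    """App side: fall back to the status quo if the cohorts haven't loaded yet."""
    if cached is None:
        return default
    return cached.get(exp_name, default)
```

The status-quo fallback is one way to live with the "cohorts may not load in time" con; it also shows why startup experiments are hard, since the first screens may render before the response arrives.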
  39. Option 2: Parameter Experiments

  40. Option 2: Parameter Experiments [Diagram: Configuration, Experimentation System, Mobile App]

  41. Option 2: Parameter Experiments [Diagram: the Mobile App requests a parameter from the Experimentation System]

  42. Option 2: Parameter Experiments [Diagram: the Experimentation System parses the Configuration and retrieves the parameter’s value]

  43. (no text)

  44. Option 2: Parameter Experiments. Pros • Easy to implement on clients ◦ Just one call! • Handles complex conditions well. Cons • Be mindful of how fast parameters resolve • Many parameters to transfer if they are sent from the server

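With parameter experiments, the client asks for a value by name and the experimentation system decides which experiment and cohort supply it. A minimal server-side sketch, assuming a hypothetical config format in which each experiment lists the parameters it controls; `resolve_param` and the parameter names are illustrative, not any specific product's API.

```python
import hashlib

# Hypothetical experimentation-system configuration: each experiment lists the
# parameters it controls and the value each cohort should see.
CONFIG = {
    "button_color_exp": {
        "cohorts": [("control", 50), ("treatment", 50)],
        "params": {"search.button.color": {"control": "red", "treatment": "green"}},
    },
}


def get_bucket(key: str) -> int:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def resolve_param(param: str, user_id: str, config=CONFIG):
    """Return the value of `param` for this user.

    The app only ever asks for a parameter by name; which experiment controls
    it, and how cohorts are assigned, stays inside the experimentation system.
    """
    for exp_name, exp in config.items():
        values = exp["params"].get(param)
        if values is None:
            continue  # this experiment doesn't control the parameter
        bucket = get_bucket(user_id + exp_name) % 100
        upper = 0
        for cohort, pct in exp["cohorts"]:
            upper += pct
            if bucket < upper:
                return values[cohort]
    raise KeyError(f"no experiment configures {param!r}")
```

Because the client never sees cohort logic, conditions such as "only users in a given market" can be added on the server without shipping a new app version.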
  45. Other stuff you’ll want: • Defaults • Override capability • Whitelisting ◦ Employees only

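Defaults, overrides, and employee whitelisting can be layered on top of whatever resolves parameters. A sketch of one possible precedence order (override, then whitelist check, then experiment value, then default); `ParamClient` and its fields are hypothetical, and the precedence is a design choice rather than something the talk specifies.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class ParamClient:
    # Values shipped with the app; used whenever nothing else applies.
    defaults: dict[str, Any]
    # Hypothetical hook into the experimentation system; returns None if the
    # parameter can't be resolved (not loaded yet, network error, ...).
    resolve: Callable[[str, str], Optional[Any]]
    # Forced values, e.g. set by QA or a developer from a debug menu.
    overrides: dict[str, Any] = field(default_factory=dict)
    # Parameters whose experiments are whitelisted to employees only.
    employee_only: set[str] = field(default_factory=set)

    def get(self, param: str, user_id: str, is_employee: bool) -> Any:
        if param in self.overrides:
            return self.overrides[param]
        if param in self.employee_only and not is_employee:
            return self.defaults[param]
        value = self.resolve(param, user_id)
        return self.defaults[param] if value is None else value
```

Putting overrides first lets QA force any value regardless of cohort, and falling back to shipped defaults keeps the app usable when the experimentation system is unreachable.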
  46. Options to reuse: Traditional bucketing • Optimizely • Mixpanel • Apptimize. Parameter-based • Firebase Remote Config • PlanOut SDKs

  47. On-Device Logging

  48. [Same architecture diagram as slide 20]

  49. (no text)

  50. Main Principles

  51. 1. Define Domain Events + Delegate Formatting

  52. [Diagram: an EventManager connected to Services 1-5, handling a search.bar.text.update event]

  53. [Same diagram, highlighting where common metadata (e.g. timestamps) gets attached]

  54. [Same diagram, highlighting where event-specific metadata (e.g. the search term) gets attached]

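The slides above describe a single EventManager that receives domain events, attaches common metadata such as timestamps, lets each event supply its own specific metadata, and fans the result out to services. A minimal Python sketch of that shape; the class names, the `metadata()` method, and representing services as callables are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class SearchBarTextUpdate:
    """A domain event: it describes what happened, not how any service formats it."""
    search_term: str

    name = "search.bar.text.update"  # plain class attribute, not a dataclass field

    def metadata(self) -> dict[str, Any]:
        # Event-specific metadata (e.g. the search term) comes from the event itself.
        return {"search_term": self.search_term}


class EventManager:
    """Single entry point for analytics; fans each event out to registered services."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._services: list[Callable[[dict[str, Any]], None]] = []

    def register(self, service: Callable[[dict[str, Any]], None]) -> None:
        self._services.append(service)

    def log(self, event) -> None:
        payload = {
            "event": event.name,
            "timestamp": self._clock(),  # common metadata is attached once, here
            **event.metadata(),
        }
        for service in self._services:
            service(dict(payload))  # each service formats its own copy


if __name__ == "__main__":
    manager = EventManager()
    manager.register(print)
    # A service that wants underscores instead of dots converts the name itself.
    manager.register(lambda p: print({**p, "event": p["event"].replace(".", "_")}))
    manager.log(SearchBarTextUpdate(search_term="tacos"))
```

Calling code only constructs domain events; adding, removing, or reformatting for a service touches the registration, not every call site.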
  55. 2. Developers Own Analytic Definitions

  56. Product Manager to Engineer: “Let’s track this as search_bar_text_update.”

  57. Engineer: “Okay!”

  58. Product Manager: “Let’s track this as search_bar_text_update.” Engineer: “Okay!”

  59. Product Manager to Engineer: “We need to know when the user updated the search bar text.”

  60. Product Manager: “We need to know when the user updated the search bar text.” Engineer: “Search for search.bar.text.update.”

  61. 3. Keep Documentation Close To Code

  62. A possible workflow: Add/update docs → Machine-readable documentation (YAML, JSON Schema, etc.) → Codegen (JavaPoet) → Analytic definitions → Publish to an internal Maven repo → Your App

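The workflow above generates typed analytic definitions from machine-readable docs with JavaPoet. As an illustration of the same idea, the sketch below emits Python source for a dataclass from a hypothetical schema dict; the schema shape and `generate_event_class` are assumptions, and a real pipeline would read YAML or JSON Schema files and publish the generated code as a library.

```python
import keyword

# Hypothetical machine-readable definition (in practice a YAML or JSON Schema file
# kept next to the code). Types are written as Python type names for simplicity.
SCHEMA = {
    "name": "search.bar.text.update",
    "description": "The user changed the text in the search bar.",
    "params": {"search_term": "str"},
}


def class_name(event_name: str) -> str:
    """search.bar.text.update -> SearchBarTextUpdate"""
    return "".join(part.capitalize() for part in event_name.split("."))


def generate_event_class(schema: dict) -> str:
    """Emit source code for a typed event class, carrying the docs into its docstring."""
    lines = [
        "from dataclasses import dataclass",
        "",
        "",
        "@dataclass(frozen=True)",
        f"class {class_name(schema['name'])}:",
        f'    """{schema["description"]}"""',
    ]
    for param, type_name in schema["params"].items():
        if not param.isidentifier() or keyword.iskeyword(param):
            raise ValueError(f"invalid parameter name: {param!r}")
        lines.append(f"    {param}: {type_name}")
    lines.append(f"    name = {schema['name']!r}")  # class attribute, not a field
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_event_class(SCHEMA))
```

Because the definitions are generated from the docs, an event cannot be logged with a name or parameter the documentation doesn't describe.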
  63. Options to reuse: • Segment • mParticle

  64. Event Deliverability

  65. [Same architecture diagram as slide 20]

  66. Mainly A Problem If You Want Internal Collection

  67. Mainly A Problem If You Want Internal Collection (You’ll want that eventually, though)

  68. At a High Level [Diagram: Your App’s Analytics Queue holds Search.list.view (1) and Search.item.click.1 (2); Your Backend]

  69. At a High Level [Diagram: the app sends (1) and (2) to Your Backend]

  70. At a High Level [Diagram: Your Backend returns a successful response]

  71. At a High Level [Diagram: the Analytics Queue is now empty]

  72. At a High Level [Diagram: the Analytics Queue again holds Search.list.view (1) and Search.item.click.1 (2)]

  73. At a High Level [Diagram: the app sends (1) and (2) to Your Backend]

  74. At a High Level [Diagram: while (1) and (2) are being sent, Business.view (3) is added to the queue]

  75. At a High Level [Diagram: Your Backend returns a successful response]

  76. At a High Level [Diagram: only Business.view (3) remains in the Analytics Queue]

  77. At a High Level [Diagram: the queue holds (1), (2), and (3); the app sends (1) and (2)]

  78. At a High Level [Diagram: (1), (2), and (3) are all still in the Analytics Queue]

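The queue slides hinge on one rule: after a successful send, remove only the events that were sent, since new ones may have been queued in the meantime. A minimal in-memory sketch; `AnalyticsQueue` and its methods are hypothetical, and a real app would persist the queue (e.g. with Tape or SQLite) rather than keep it in memory.

```python
from collections import deque
from itertools import islice


class AnalyticsQueue:
    """In-memory stand-in for an on-device analytics queue."""

    def __init__(self) -> None:
        self._events: deque[str] = deque()
        self._in_flight = 0  # how many events at the head of the queue are being sent

    def add(self, event: str) -> None:
        self._events.append(event)

    def begin_send(self, max_batch: int = 50) -> list[str]:
        if self._in_flight:
            raise RuntimeError("a batch is already in flight")
        batch = list(islice(self._events, max_batch))
        self._in_flight = len(batch)
        return batch

    def on_success(self) -> None:
        # Drop exactly the sent prefix; anything added since stays queued.
        for _ in range(self._in_flight):
            self._events.popleft()
        self._in_flight = 0

    def on_failure(self) -> None:
        # Keep everything so the same events are retried later.
        self._in_flight = 0

    def snapshot(self) -> list[str]:
        return list(self._events)
```

Clearing the whole queue on success would silently drop Business.view (3) in the scenario on slides 74-76; tracking how many events are in flight avoids that.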
  79. (no text)

  80. It’s All About Tradeoffs

  81. Things to Consider: • How many analytics events are you losing? • How important are the events you’re losing? • What’s the cost of getting those events back? ◦ Mostly engineering time/complexity

  82. Your building blocks: • Analytics channels • Structured flat files • JobManagers • SQLite databases • Tape queues

  83. Example! [Diagram: the Event Manager receives Search.list.view]

  84. Example! [Diagram: the Event Manager writes Search.list.view to a Tape Queue]

  85. Example! [Diagram: the Event Manager checks whether the Tape Queue is over the flush threshold]

  86. Example! [Diagram: the Tape Queue’s contents (including Search.list.view) are copied and cleared, then handed to the JobManager]

  87. Example! [Diagram: the JobManager creates a persistent job with those contents]

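The flow on slides 83-87 (write to the queue, check the flush threshold, copy and clear, hand the copy to a persistent job) can be sketched with in-memory stand-ins. Here a list plays the role of the Tape queue and `JobManager` is a hypothetical class that only mimics retry-until-success; neither reflects the real Android libraries' APIs.

```python
from typing import Callable


class JobManager:
    """Stand-in for a persistent job queue: a job is retried until its send succeeds."""

    def __init__(self) -> None:
        self.pending: list[list[str]] = []

    def add_job(self, payload: list[str]) -> None:
        self.pending.append(payload)

    def run_pending(self, send: Callable[[list[str]], bool]) -> None:
        # Jobs whose send fails stay pending for a later retry.
        self.pending = [payload for payload in self.pending if not send(payload)]


class EventManager:
    def __init__(self, jobs: JobManager, flush_threshold: int = 20) -> None:
        self.queue: list[str] = []  # stand-in for a persistent Tape queue
        self.jobs = jobs
        self.flush_threshold = flush_threshold

    def log(self, event: str) -> None:
        self.queue.append(event)                      # 1. write to queue
        if len(self.queue) >= self.flush_threshold:   # 2. over flush threshold?
            self.flush()

    def flush(self) -> None:
        if not self.queue:
            return
        contents = list(self.queue)                   # 3. copy + clear contents
        self.queue.clear()
        self.jobs.add_job(contents)                   # 4. persistent job owns the copy
```

Handing a copy to the job decouples delivery from logging: the queue is immediately free for new events, and retries happen on the job's own schedule.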
  88. What does Yelp do? • Flush analytics every 30s + 20 analytics • Always send analytics via JobManager • Flush analytics when the app goes into the background ◦ Can be detected easily with Architecture Components! • It’s not perfect, but it works well enough

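Slide 88's policy can be expressed as a single decision function. This sketch reads "every 30s + 20 analytics" as "flush when 30 seconds have passed or 20 events are pending, whichever comes first", which is an interpretation; `FlushPolicy` is hypothetical. On Android, the "app went to the background" signal could come from Architecture Components' `ProcessLifecycleOwner`.

```python
from dataclasses import dataclass


@dataclass
class FlushPolicy:
    max_age_s: float = 30.0  # flush if this long has passed since the last flush...
    max_events: int = 20     # ...or once this many events are pending

    def should_flush(self, pending: int, seconds_since_last_flush: float,
                     app_backgrounded: bool) -> bool:
        if pending == 0:
            return False  # nothing to send
        if app_backgrounded:
            return True   # the process may be killed soon; don't sit on events
        return pending >= self.max_events or seconds_since_last_flush >= self.max_age_s
```

The count limit bounds how much data a crash can lose during heavy use, while the time limit keeps events from lingering when the user is idle.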
  89. Analysis

  90. [Same architecture diagram as slide 20]

  91. Two Main Questions

  92. 1. Does Funnel Conversion Increase?

  93. What’s a funnel? Home → Search Page → Order Start → Order Complete

  94. What’s a funnel? Home: 100k users

  95. What’s a funnel? Home: 100k users → Search Page: 80k users (80%)

  96. What’s a funnel? Home: 100k users → Search Page: 80k users (80%) → Order Start: 10k users (12.5%)

  97. What’s a funnel? Home: 100k users → Search Page: 80k users (80%) → Order Start: 10k users (12.5%) → Order Complete: 5k users (50%). Each percentage is the share of the previous step’s users who reach the next step.

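The funnel arithmetic is easy to reproduce: each step's conversion is its users divided by the previous step's users. A small sketch using the slide's numbers (`funnel_report` is a hypothetical helper); for an A/B test you would compute it per cohort and compare step by step.

```python
def funnel_report(steps: list[tuple[str, int]]) -> list[tuple[str, int, float, float]]:
    """For each step: (name, users, conversion from previous step, conversion from top)."""
    top = steps[0][1]
    report = []
    previous = None
    for name, users in steps:
        if previous is None:
            step_rate = 1.0  # the first step is the baseline
        else:
            step_rate = users / previous if previous else 0.0
        report.append((name, users, step_rate, users / top))
        previous = users
    return report


STEPS = [("Home", 100_000), ("Search Page", 80_000),
         ("Order Start", 10_000), ("Order Complete", 5_000)]

if __name__ == "__main__":
    for name, users, step_rate, overall in funnel_report(STEPS):
        print(f"{name:<15}{users:>8,}  from previous: {step_rate:6.1%}  from top: {overall:6.1%}")
```

It yields step conversions of 80%, 12.5%, and 50%, and 5% of Home visitors reach Order Complete.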
  98. (no text)

  99. 2. Do Users (click/order/watch) More?

  100. Lots of options here

  101. Looking for a decent catch-all?
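For "do users click more?", one common option (not necessarily the catch-all the talk has in mind) is Welch's t-test on a per-user metric such as clicks per user. A standard-library sketch; the normal approximation for the p-value is an assumption that only holds for large cohorts, and for small ones a library routine such as `scipy.stats.ttest_ind(..., equal_var=False)` is more appropriate.

```python
import math
from statistics import mean, variance


def welch_test(control: list[float], treatment: list[float]) -> tuple[float, float, float]:
    """Compare a per-user metric between two cohorts.

    Returns (difference in means, Welch's t statistic, two-sided p-value).
    Assumption: cohorts are large (hundreds of users or more), so the p-value
    uses a normal approximation to the t distribution.
    Raises ZeroDivisionError if both cohorts have zero variance.
    """
    diff = mean(treatment) - mean(control)
    std_err = math.sqrt(variance(control) / len(control) +
                        variance(treatment) / len(treatment))
    t = diff / std_err
    p_value = math.erfc(abs(t) / math.sqrt(2))  # P(|Z| >= |t|) for Z ~ N(0, 1)
    return diff, t, p_value
```

Welch's variant does not assume equal variances in the two cohorts, which matters when a treatment changes how spread out user behavior is, not just its average.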

  102. Eventually You’ll Need To Join This Data

  103. Option A: BigQuery Export

  104. Option A: BigQuery Export. Pros • Comes out of the box with Firebase • Lets you join information to get answers. Cons • Can’t be backfilled • Still limited by Firebase • You’ll probably need to write scripts to join your data

  105. Option B: Internal Metrics

  106. Option B: Internal Metrics. Pros • Maximum power + flexibility • Can backfill • Can execute arbitrary queries with just SQL. Cons • You need to maintain the whole pipeline yourself • High cost to reach feature parity with third-party services

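Whichever option you pick, the core step is joining allocation data (who was in which cohort) with event data. With internal metrics this is plain SQL; the sketch runs it against an in-memory SQLite database via Python's `sqlite3`. The table names, columns, and sample rows are hypothetical, and a production query would typically also count only events that happened after the user was allocated.

```python
import sqlite3


def build_example_db() -> sqlite3.Connection:
    """Tiny in-memory stand-in for exported allocation and event tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE allocations (user_id TEXT, experiment TEXT, cohort TEXT);
        CREATE TABLE events (user_id TEXT, name TEXT);
    """)
    conn.executemany("INSERT INTO allocations VALUES (?, ?, ?)", [
        ("u1", "button_color", "red"), ("u2", "button_color", "red"),
        ("u3", "button_color", "green"), ("u4", "button_color", "green"),
    ])
    conn.executemany("INSERT INTO events VALUES (?, ?)", [
        ("u1", "order.complete"), ("u3", "order.complete"),
        ("u4", "order.complete"), ("u4", "order.complete"),  # duplicate counts once
    ])
    return conn


def conversions_by_cohort(conn: sqlite3.Connection, experiment: str, event_name: str):
    """Users per cohort, and how many of them fired `event_name` at least once."""
    query = """
        SELECT a.cohort,
               COUNT(DISTINCT a.user_id) AS users,
               COUNT(DISTINCT e.user_id) AS converted
        FROM allocations AS a
        LEFT JOIN events AS e
               ON e.user_id = a.user_id AND e.name = ?
        WHERE a.experiment = ?
        GROUP BY a.cohort
        ORDER BY a.cohort
    """
    return conn.execute(query, (event_name, experiment)).fetchall()


if __name__ == "__main__":
    print(conversions_by_cohort(build_example_db(), "button_color", "order.complete"))
```

On the sample data, green has 2 of 2 users converting and red 1 of 2; the `LEFT JOIN` keeps non-converting users in the denominator, and `COUNT(DISTINCT ...)` counts u4's duplicate event once.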
  107. 3 Things to Take Home • A/B testing helps build a data-driven culture, so even if you don’t do it perfectly, you still get many benefits • Know the basics of statistical experiments before you start A/B testing • Keep engineers involved in analytic definition + analysis

  108. Resources. Papers + General Reference on Experiments: • Exp-Platform.com ◦ A/B Testing At Scale • Twitter Unified Logging • SOLID Analytics With RxJava. Tools on Statistics: • A Concise Guide to Statistics • Causal Inference In Statistics • Optimizely Sample Size Calculator

  109. Thanks! • My email: maltz@yelp.com • My website: maltzj.com • My Twitter: @maltzj

  110. We're Hiring! www.yelp.com/careers/

  111. Questions?

  112. @YelpEngineering fb.com/YelpEngineers engineeringblog.yelp.com github.com/yelp