2016 - Or Weizman - A/B Testing: Harder than just a color change

PyBay
August 21, 2016

Description
Is your Product Manager asking you to test out different text or button colors? Not sure where to start? This talk presents a methodology and two case studies from Yelp's Transaction Platform on how to properly run an experiment and get the best result. Learn how to run a simple button color experiment, avoid pitfalls, test, and analyze the results with confidence. Statistical confidence!

Abstract
A/B testing is a common practice for websites... but where do you begin? This data-driven approach lets you launch experiments and features with confidence. So how do you prepare, launch, and analyze an A/B experiment? How do you know how long to keep it running? And which metrics should you track?

This talk presents a procedure developed for running an A/B experiment, from planning the task and understanding the key metrics to analyzing the results. We will cover both a simple and a more complex case study, which help illustrate the challenges involved in running experiments.

This talk covers a topic that enables developers to make more data-driven decisions but has not been covered at PyCon. By providing case studies as motivation and a concrete procedure for implementing A/B testing, this talk aims to excite the audience. Yelp runs experiments on many different parts of the product, and the Transaction Platform team has gained unique experience running experiments with limited traffic, which will be discussed in the talk.

Bio
Or Weizman is an engineer on Yelp's Transaction Platform team, which enables users to transact with Yelp's extensive set of businesses through many third-party providers.

https://youtu.be/7SA3a_AXA1g

Transcript

  1. 15%

  2. How?
     • Generate Hypothesis
     • Gather Necessary Data
     • Implementation and Testing
     • Roll out Experiment
     • Analyze Results

  3. • Key Metric
     • Secondary Metrics
     User lands on Biz page → User clicks on Start Order → User adds items → User completes order
     User lands on Biz page → User clicks on Start Order

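The key and secondary metrics on this slide are steps in the order funnel, so the analysis comes down to comparing step-to-step conversion rates between cohorts. A minimal sketch of that computation is below; the step names mirror the slide, while the function name and counts are made up for illustration.

    # Sketch (not the talk's code): funnel conversion rates from per-step
    # event counts for one cohort. Counts below are hypothetical.
    FUNNEL_STEPS = [
        'lands_on_biz_page',
        'clicks_start_order',
        'adds_items',
        'completes_order',
    ]

    def conversion_rates(step_counts):
        """Return conversion from the first funnel step to each later step."""
        base = step_counts[FUNNEL_STEPS[0]]
        return {step: step_counts[step] / base for step in FUNNEL_STEPS[1:]}

    # Example: hypothetical counts for the status quo cohort.
    status_quo_counts = {
        'lands_on_biz_page': 10000,
        'clicks_start_order': 2100,
        'adds_items': 1500,
        'completes_order': 900,
    }
    print(conversion_rates(status_quo_counts))
    # {'clicks_start_order': 0.21, 'adds_items': 0.15, 'completes_order': 0.09}
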
  4. Math Time!
     • Minimum Detectable Effect of 5%
     • Baseline conversion
     • Statistical power (1 - β) of 95%
     • Significance level α of 5%
     307,967 users per variation

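The 307,967 figure presumably comes from a standard two-proportion sample-size calculation driven by the four inputs above. The sketch below shows the mechanics; the slide does not show the baseline conversion rate, so the baseline used here is an assumed placeholder and the output will not reproduce 307,967 exactly.

    # Standard two-proportion sample-size calculation (sketch, not the talk's
    # exact code). The baseline is an assumed placeholder value.
    import math

    from scipy.stats import norm

    def users_per_variation(baseline, relative_mde, alpha=0.05, power=0.95):
        """Approximate users needed per variation for a two-sided test."""
        p1 = baseline
        p2 = baseline * (1 + relative_mde)   # smallest effect worth detecting
        z_alpha = norm.ppf(1 - alpha / 2)    # significance level alpha
        z_beta = norm.ppf(power)             # statistical power 1 - beta
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

    # Example with an assumed 3% baseline conversion and a 5% relative MDE.
    print(users_per_variation(baseline=0.03, relative_mde=0.05))
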
  5. Kwargs are your friends, not food

     self.button_color_cohort = kwargs.get('button_color_cohort', 'status_quo')

     if self.button_color_cohort == 'status_quo':
         button_color = 'orange'
     else:
         button_color = 'green'

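A useful property of the kwargs.get default above is that call sites which predate the experiment keep working unchanged: anyone who doesn't pass the cohort falls back to the status quo. Below is a self-contained sketch of the pattern; the StartOrderButton class name and the call sites are assumptions for illustration, not the talk's actual code.

    # Hypothetical wrapper around the slide's snippet.
    class StartOrderButton(object):
        def __init__(self, **kwargs):
            # Callers outside the experiment pass nothing and get the status
            # quo, which is what makes the kwargs default safe to ship.
            self.button_color_cohort = kwargs.get('button_color_cohort', 'status_quo')

        @property
        def button_color(self):
            if self.button_color_cohort == 'status_quo':
                return 'orange'
            else:
                return 'green'

    assert StartOrderButton().button_color == 'orange'   # old call site, no kwarg
    assert StartOrderButton(button_color_cohort='status_quo').button_color == 'orange'
    assert StartOrderButton(button_color_cohort='enabled').button_color == 'green'
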
  6. Steady
     • If possible, roll out in this order:
       ◦ Internally for testing
       ◦ To a small percentage of traffic
       ◦ To the predetermined traffic percentage
         ▪ 50/50 in our case

  7. Sharing the Pool
     • Swimlanes are a way to pre-partition traffic
     • Experiments are assigned to lanes
     • Allows for control

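The talk doesn't show how swimlanes are implemented, but the idea on this slide can be sketched as deterministic bucketing of users into a fixed number of lanes, with each experiment restricted to its own lane. Everything below (lane count, experiment names, hashing scheme) is an assumption for illustration, not Yelp's implementation.

    # Assumed illustration of swimlanes: users hash into fixed lanes, and each
    # experiment only sees its own lane, so concurrent experiments cannot
    # contaminate each other's traffic.
    import hashlib

    NUM_LANES = 4
    LANE_TO_EXPERIMENT = {0: 'button_color', 1: 'checkout_copy'}  # lanes 2-3 held in reserve

    def lane_for_user(user_id):
        digest = hashlib.sha256(str(user_id).encode('utf-8')).hexdigest()
        return int(digest, 16) % NUM_LANES

    def experiment_for_user(user_id):
        """Return the experiment this user may be enrolled in, or None."""
        return LANE_TO_EXPERIMENT.get(lane_for_user(user_id))

The control comes at a cost hinted at in the abstract: each experiment now runs on only a fraction of total traffic, which lengthens the time needed to reach the sample size computed above.
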
  8. Testing gets harder
     • Experiments can now have another condition
     • Previous if statement breaks
     • Complexity -> more cases

  9. Testing gets harder

     Before:
     if self.button_color_cohort == 'status_quo':
         button_color = 'orange'
     else:
         button_color = 'green'

     After:
     if self.button_color_cohort == 'enabled':
         button_color = 'green'
     else:
         button_color = 'orange'

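The fix branches on the value that explicitly enables the experiment: once swimlanes exist, the cohort can take a value other than the two the original if statement expected (for example, a user who isn't in this experiment's lane at all), and checking == 'status_quo' would wrongly show those users the green button. Below is a sketch of a parametrized test covering that extra case; the 'unassigned' value and the helper function are assumptions, not the talk's actual code.

    # Sketch of a parametrized test for the extra cohort case.
    import pytest

    def button_color_for_cohort(cohort):
        if cohort == 'enabled':
            return 'green'
        else:
            return 'orange'

    @pytest.mark.parametrize('cohort, expected_color', [
        ('status_quo', 'orange'),
        ('enabled', 'green'),
        ('unassigned', 'orange'),  # e.g. a user outside the experiment's swimlane
    ])
    def test_button_color(cohort, expected_color):
        assert button_color_for_cohort(cohort) == expected_color
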
  10. Summary
      • Understand how to do A/B experiments
      • Harder, but doable!
        ◦ Complexity can be managed
        ◦ Swimlanes are our solution
      • After just 3 experiments we saw a 15% increase in sales