
Advanced A/B Testing

Aviran Mordo
November 24, 2014

A/B testing is a well-known methodology for conducting experiments in production, but running it at a large scale raises many challenges at the organizational and operational levels.

At Wix we have been practicing continuous delivery for over 4 years. Conducting A/B tests and writing feature toggles are at the core of our development process. However, doing so at a large scale, with over 1,000 experiments every month, poses many challenges and affects everyone in the company: developers, product managers, QA, marketing and management.

In this talk we will explain the lifecycle of an experiment, some of the challenges we faced, and the effects on our development process:

How an experiment begins its life
How an experiment is defined
How do you let non-technical people control the experiment while preventing mistakes
How an experiment goes live, and what the lifecycle of an experiment is from beginning to end
What is the difference between client and server experiments
How do you keep the user experience consistent and avoid confusing users
How does it affect the development process
How can QA test an environment that changes every 9 minutes
How can support help users when every user may be part of a different set of experiments
How can we find whether an experiment is causing errors when there are millions of permutations [at least 2^(number of active experiments)]
What are the effects of always having multiple experiments on system architecture
What are the development patterns when working with A/B tests

At Wix we have developed our 3rd-generation experiment system, PETRI, which is now open source. It helps us maintain some order in a chaotic system that keeps changing. We will also explain how PETRI works and the patterns for conducting experiments with minimal effect on performance and user experience.


Transcript

  1. Experimenting on Humans Aviran Mordo Head of Back-end Engineering @aviranm

    www.linkedin.com/in/aviran www.aviransplace.com Sagy Rozman Back-end Guild master www.linkedin.com/in/sagyrozman @sagyrozman
  2. None
  3. Wix In Numbers •  Over 55M users + 1M new

    users/month •  Static storage is >1.5PB of data •  3 data centers + 3 clouds (Google, Amazon, Azure) •  1.5B HTTP requests/day •  900 people work at Wix, ~300 of whom are in R&D
  4. 1542 (A/B Tests in 3 months)

  5. •  Basic A/B testing •  Experiment driven development •  PETRI

    – Wix’s 3rd generation open source experiment system •  Challenges and best practices •  How to (code samples) Agenda
  6. 11:31 A/B Test

  7. To B or NOT to B? A B

  8. Home page results 
 (How many registered)

  9. Experiment Driven Development

  10. This is the Wix editor

  11. Our gallery manager What can we improve?

  12. Is this better?

  13. Don’t be a loser

  14. Product Experiments Toggles & Reporting Infrastructure

  15. How do you know what is running?

  16. If I “know” it is better, do I really need

    to test it? Why so many?
  17. None
  18. The theory: Sign-up → Choose Template → Edit site → Publish → Premium

  19. Result = Fail

  20. Intent matters

  21. •  EVERY new feature is A/B tested •  We open

    the new feature to a % of users ◦  Measure success ◦  If it is better, we keep it ◦  If worse, we check why and improve •  If flawed, the impact is limited to a % of our users Conclusion
  22. Start with 50% / 50% ?

  23. None
  24. • New code can have bugs • Conversion can drop • Usage can

    drop • Unexpected cross test dependencies Sh*t happens (Test could fail)
  25. Minimize affected users (in case of failure)

    Gradual exposure (percentage of…): •  Company employees •  User roles •  Language •  GEO •  Browser •  User-agent •  OS •  Any other criteria you have (extendable; see the sketch below) •  All users
  26. • First time visitors = Never visited wix.com • New registered users

    = Untainted users Not all users are equal
  27. We need that feature …and failure is not an option

  28. Defensive Testing

  29. Adding a mobile view

  30. First trial failed 
 
 Performance had to be improved

  31. Halting the test results in loss of data.

    What can we do about it?
  32. Solution – Pause the experiment! •  Maintain NEW experience for

    already exposed users •  No additional users will be exposed to the NEW feature
  33. PETRI’s pause implementation • Use cookies to persist assignment ◦  If

    the user changes browser, the assignment is unknown • Server-side persistence solves this ◦  You pay in performance & scalability
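A minimal sketch of the pause behavior described above, assuming hypothetical CookieStore and AssignmentStore types rather than PETRI's actual implementation:

    // While an experiment is paused, only users with an existing assignment
    // keep the NEW experience; no additional users are exposed.
    class PausedExperimentResolver {
        private final CookieStore cookies;          // per-browser persistence (cheap)
        private final AssignmentStore serverStore;  // cross-browser (costs a lookup)

        PausedExperimentResolver(CookieStore cookies, AssignmentStore serverStore) {
            this.cookies = cookies;
            this.serverStore = serverStore;
        }

        String resolve(String experimentKey, String userId) {
            String fromCookie = cookies.get(experimentKey);
            if (fromCookie != null) return fromCookie;           // seen in this browser
            String fromServer = serverStore.find(experimentKey, userId);
            if (fromServer != null) return fromServer;           // seen in another browser
            return "old";                                        // paused: no new exposure
        }
    }

    interface CookieStore { String get(String experimentKey); }
    interface AssignmentStore { String find(String experimentKey, String userId); }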
  34. Decision: •  Keep feature •  Improve code & resume experiment •  Drop feature ◦  Keep backwards compatibility for exposed users forever? ◦  Migrate users to another equivalent feature ◦  Drop it altogether (users lose data/work)
  35. The road to success

  36. •  Numbers look good but sample size is small • 

    We need more data! •  Expand Reaching statistical significance
 [Chart: Test Group (B) expands 25% → 50% → 75% → 100% while Control Group (A) shrinks 75% → 50% → 25% → 0%]
  37. Keep user experience consistent Control Group (A) Test Group (B)

  38. •  Signed-in user (Editor) ◦  Test group assignment is determined

    by the user ID ◦  Guarantees toss persistency across browsers •  Anonymous user (Home page) ◦  Test group assignment is randomly determined ◦  Cannot guarantee a persistent experience if the user changes browser •  11% of Wix users use more than one desktop browser Keeping persistent UX
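To illustrate why an ID-based toss survives a browser switch, here is a sketch of deterministic assignment by hashing the user ID (illustrative only, not PETRI's actual algorithm):

    import java.util.zip.CRC32;

    class GroupAssigner {
        // Returns "B" for the given exposure percentage of users, "A" otherwise.
        // The same (experimentKey, userId) pair always lands in the same bucket,
        // on any browser; anonymous users lack a stable ID, so their random toss
        // is only as persistent as the cookie that stores it.
        static String assign(String experimentKey, String userId, int percentB) {
            CRC32 crc = new CRC32();
            crc.update((experimentKey + "|" + userId).getBytes());
            long bucket = crc.getValue() % 100;   // stable bucket in [0, 100)
            return bucket < percentB ? "B" : "A";
        }
    }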
  39. There is MORE than one

  40. Possible states >= 2^(# of active experiments)

    # of active experiments → possible # of states: 10 → 1,024; 20 → 1,048,576; 30 → 1,073,741,824. Wix has ~200 active experiments = 2^200 ≈ 1.606938 × 10^60 possible states.
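The last figure is easy to verify; a quick sketch with BigInteger:

    import java.math.BigInteger;

    class StateCount {
        public static void main(String[] args) {
            // Each active experiment is on or off per user, so ~200 experiments
            // yield 2^200 possible combinations of states.
            BigInteger states = BigInteger.valueOf(2).pow(200);
            System.out.println(states.doubleValue());  // ≈ 1.6069380442589903E60
        }
    }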
  41. A/B testing introduces complexity

  42. •  Override options (URL parameters, cookies, headers…) •  Near real

    time user BI tools •  Integrated developer tools in the product Support tools
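A sketch of how such overrides could be resolved, assuming a hypothetical Request abstraction and parameter naming (not PETRI's actual API): an explicit URL parameter beats a cookie, which beats the normal assignment, so a support engineer can reproduce exactly what a user in any test group sees.

    class OverrideResolver {
        // Hypothetical precedence: URL parameter > cookie > regular assignment.
        String resolve(Request req, String experimentKey, String normalAssignment) {
            String fromUrl = req.queryParam("petri_ovr_" + experimentKey);
            if (fromUrl != null) return fromUrl;
            String fromCookie = req.cookie("petri_ovr_" + experimentKey);
            if (fromCookie != null) return fromCookie;
            return normalAssignment;
        }
    }

    interface Request {
        String queryParam(String name);
        String cookie(String name);
    }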
  43. Define → Code → Experiment → Expand → Merge code → Close

  44. •  Spec = Experiment template (in the code) ◦  Define

    test groups ◦  Mandatory limitations (filters, user types) ◦  Scope = Group of related experiments (usually by product) •  Why is it needed ◦  Type safety ◦  Preventing human errors (typos, user types) ◦  Controlled by the developer (developer knows about the context) ◦  Conducting experiments in batch Define spec
  45. Spec code snippet

    public class ExampleSpecDefinition extends SpecDefinition {
      @Override
      protected ExperimentSpecBuilder customize(ExperimentSpecBuilder builder) {
        return builder
            .withOwner("OWNERS_EMAIL_ADDRESS")
            .withScopes(aScopeDefinitionForAllUserTypes("SOME_SCOPE"))
            .withTestGroups(asList("Group A", "Group B"));
      }
    }
  46. Conducting an experiment •  Experiment = an “if” statement in the code

    final String result = laboratory.conductExperiment(key, fallback, new StringConverter());
    if (result.equals("group a")) {
      // execute group a's logic
    } else if (result.equals("group b")) {
      // execute group b's logic
    }
    // if conducting the experiment fails, the fallback value is returned;
    // in that case you would usually execute the 'old' logic
  47. Upload spec •  Upload the specs to the Petri server ◦  Enables defining an experiment instance

    {
      "creationDate" : "2014-01-09T13:11:26.846Z",
      "updateDate" : "2014-01-09T13:11:26.846Z",
      "scopes" : [
        { "name" : "html-editor", "onlyForLoggedInUsers" : true },
        { "name" : "html-viewer", "onlyForLoggedInUsers" : false }
      ],
      "testGroups" : [ "old", "new" ],
      "persistent" : true,
      "key" : "clientExperimentFullFlow1",
      "owner" : ""
    }
  48. Start new experiment (limited population)

  49. Manage experiment states

  50. 1.  Convert A/B Test to Feature Toggle (100% ON) 2. 

    Merge the code 3.  Close the experiment 4.  Remove experiment instance Ending successful experiment
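As a sketch of step 2 (“Merge the code”), building on the slide 46 snippet, with a hypothetical GalleryFeature class and the "new"/"old" group names from the slide 47 spec:

    class GalleryFeature {
        void render() {
            // Before merging, the experiment (a toggle at 100% ON by now) guarded both paths:
            //   String result = laboratory.conductExperiment(key, fallback, new StringConverter());
            //   if (result.equals("new")) { newLogic(); } else { oldLogic(); }
            // After merging, the lookup and the losing branch are simply deleted:
            newLogic();
        }

        void newLogic() { /* the winning implementation, now the only path */ }
    }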
  51. • Define spec • Use Petri client to conduct experiment in the

    code (defaults to old) • Sync spec • Open experiment • Manage experiment state • End experiment Experiment lifecycle
  52. Petri is more than just an A/B test framework: feature toggles, A/B tests, personalization, internal testing, continuous deployment, Jira integration, experiments, dynamic configuration, QA, automated testing
  53. •  Expose features internally to company employees •  Enable continuous

    deployment with feature toggles •  Select assignment by sites (not only by users) •  Automatic selection of winning group* •  Exposing feature to #n of users* •  Integration with Jira * Planned feature Other things we (will) do with Petri
  54. Petri is now an open source project https://github.com/wix/petri

  55. Q&A Aviran Mordo Head of Back-end Engineering @aviranm www.linkedin.com/in/aviran www.aviransplace.com

    https://github.com/wix/petri http://goo.gl/L7pHnd Sagy Rozman Back-end Guild master www.linkedin.com/in/sagyrozman @sagyrozman
  56. Credits http://upload.wikimedia.org/wikipedia/commons/b/b2/Fiber_optics_testing.jpg http://goo.gl/nEiepT https://www.flickr.com/photos/ilo_oli/2421536836 https://www.flickr.com/photos/dexxus/5791228117 http://goo.gl/SdeJ0o https://www.flickr.com/photos/112923805@N05/15005456062 https://www.flickr.com/photos/wiertz/8537791164 https://www.flickr.com/photos/laenulfean/5943132296 https://www.flickr.com/photos/torek/3470257377

    https://www.flickr.com/photos/i5design/5393934753 https://www.flickr.com/photos/argonavigo/5320119828
  57. •  Modeled experiment lifecycle •  Open source (developed using TDD

    from day 1) •  Running at scale on production •  No deployment necessary •  Both back-end and front-end experiments •  Flexible architecture Why Petri
  58. [Architecture diagram: PETRI Server, your app with the Laboratory client, DB, logs]