Advanced A/B Testing

Experimenting on Humans
Aviran Mordo, Head of Back-end Engineering
@aviranm
www.linkedin.com/in/aviran
www.aviransplace.com
Sagy Rozman, Back-end Guild master
@sagyrozman
www.linkedin.com/in/sagyrozman

Wix In Numbers
•  Over 55M users + 1M new users/month
•  Static storage is >1.5 PB of data
•  3 data centers + 3 clouds (Google, Amazon, Azure)
•  1.5B HTTP requests/day
•  900 people work at Wix, of which ~300 in R&D

1542 (A/B Tests in 3 months)

Agenda
•  Basic A/B testing
•  Experiment driven development
•  PETRI – Wix’s 3rd generation open source experiment system
•  Challenges and best practices
•  How to (code samples)

11:31 A/B Test

To B or NOT to B? A B

Home page results   (How many registered)

Experiment Driven Development

This is the Wix editor

Our gallery manager
What can we improve?

Is this better?

Don’t be a loser

Product Experiments Toggles & Reporting Infrastructure

How do you know what is running?

Why so many?
If I “know” it is better, do I really need to test it?

The theory
Sign-up → Choose Template → Edit site → Publish → Premium

Result = Fail

Intent matters

Conclusion
•  EVERY new feature is A/B tested
•  We open the new feature to a % of users
   ◦  Measure success
   ◦  If it is better, we keep it
   ◦  If worse, we check why and improve
•  If flawed, the impact is limited to that % of our users

Start with 50% / 50% ?

Sh*t happens (Test could fail)
•  New code can have bugs
•  Conversion can drop
•  Usage can drop
•  Unexpected cross test dependencies

Minimize affected users (in case of failure)
Gradual exposure (percentage of…)
•  Language
•  GEO
•  Browser
•  User-agent
•  OS
•  Company employees
•  User roles
•  Any other criteria you have (extendable)
•  All users
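The criteria above compose naturally as predicates that a visitor must pass before the exposure percentage is applied. The following is a minimal sketch in plain Java under that assumption, not PETRI's implementation; `Visitor`, `ExposureFilter`, and all of their members are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical visitor attributes (not PETRI's data model). */
final class Visitor {
    final String language;
    final String country;
    final boolean companyEmployee;

    Visitor(String language, String country, boolean companyEmployee) {
        this.language = language;
        this.country = country;
        this.companyEmployee = companyEmployee;
    }
}

/** An experiment is offered only to visitors that pass every filter. */
final class ExposureFilter {
    private final List<Predicate<Visitor>> filters;

    ExposureFilter(List<Predicate<Visitor>> filters) {
        this.filters = filters;
    }

    static Predicate<Visitor> languageIs(String language) {
        return v -> v.language.equals(language);
    }

    static Predicate<Visitor> countryIs(String country) {
        return v -> v.country.equals(country);
    }

    static Predicate<Visitor> employeesOnly() {
        return v -> v.companyEmployee;
    }

    /** Returns true only if no filter rejects the visitor. */
    boolean isEligible(Visitor visitor) {
        for (Predicate<Visitor> filter : filters) {
            if (!filter.test(visitor)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, `new ExposureFilter(Arrays.asList(ExposureFilter.languageIs("en"), ExposureFilter.employeesOnly()))` would restrict a risky first rollout to English-speaking company employees; new criteria are added as further predicates.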

Not all users are equal
•  First time visitors = Never visited wix.com
•  New registered users = Untainted users

We need that feature …and failure is not an option

Defensive Testing

Adding a mobile view

First trial failed. Performance had to be improved.

Halting the test results in loss of data.
What can we do about it?

Solution – Pause the experiment!
•  Maintain NEW experience for already exposed users
•  No additional users will be exposed to the NEW feature

PETRI’s pause implementation
•  Use cookies to persist assignment
   ◦  If the user changes browser, the assignment is unknown
•  Server side persistence solves this
   ◦  You pay in performance & scalability
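The pause semantics described above can be sketched as follows. This is a hypothetical illustration, not PETRI's code: the class `PausableExperiment`, its `Group` enum, and the `bucket` parameter are assumptions made for this example, and the persisted group stands in for whatever was read back from the cookie or server-side store.

```java
import java.util.Optional;

/** Hypothetical sketch of pause semantics (not PETRI's implementation). */
final class PausableExperiment {
    enum Group { OLD, NEW }

    private final int percentNew; // share of users (0-100) exposed to NEW
    private boolean paused;

    PausableExperiment(int percentNew) {
        this.percentNew = percentNew;
    }

    void pause() {
        paused = true;
    }

    void resume() {
        paused = false;
    }

    /**
     * @param persisted group read back from a cookie or server-side store, if any
     * @param bucket    the user's bucket in [0, 100)
     */
    Group assign(Optional<Group> persisted, int bucket) {
        if (persisted.isPresent()) {
            return persisted.get(); // already exposed users keep their experience
        }
        if (paused) {
            return Group.OLD; // no additional users see NEW while paused
        }
        return bucket < percentNew ? Group.NEW : Group.OLD;
    }
}
```

The trade-off from the slide shows up in where `persisted` comes from: a cookie is cheap but is lost when the user switches browsers, while a server-side lookup survives that at the cost of an extra read per request.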

Decision
•  Keep feature
•  Drop feature
   ◦  Keep backwards compatibility for exposed users forever?
   ◦  Migrate users to another equivalent feature
   ◦  Drop it altogether (users lose data/work)
•  Improve code & resume experiment

The road to success

Reaching statistical significance
•  Numbers look good but sample size is small
•  We need more data!
•  Expand
Test Group (B):    25% → 50% → 75% → 100%
Control Group (A): 75% → 50% → 25% → 0%
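One common way to decide whether "numbers look good" is more than noise is a two-proportion z-test on the conversion rates of the two groups. The sketch below uses that standard textbook test as a stand-in; the deck does not say which test Wix uses, and the class and method names are hypothetical.

```java
/** Two-proportion z-test; a standard textbook test, not necessarily what Wix uses. */
final class ConversionSignificance {

    /** z-score for the difference between the conversion rates of B and A. */
    static double zScore(long conversionsA, long usersA, long conversionsB, long usersB) {
        double rateA = (double) conversionsA / usersA;
        double rateB = (double) conversionsB / usersB;
        // Pooled rate under the null hypothesis that A and B convert equally.
        double pooled = (double) (conversionsA + conversionsB) / (usersA + usersB);
        double stdErr = Math.sqrt(pooled * (1 - pooled) * (1.0 / usersA + 1.0 / usersB));
        return (rateB - rateA) / stdErr;
    }

    /** Two-sided test at the 5% level: significant when |z| > 1.96. */
    static boolean isSignificant(double z) {
        return Math.abs(z) > 1.96;
    }
}
```

With made-up numbers, 10 of 100 conversions in A against 15 of 100 in B looks like a large lift but is not significant, while the same rates at 10,000 users per group (1,000 against 1,500) are. That is the reason for expanding the test before calling a winner.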

Keep user experience consistent Control Group (A) Test Group (B)

Keeping persistent UX
•  Signed-in user (Editor)
   ◦  Test group assignment is determined by the user ID
   ◦  Guarantees toss persistence across browsers
•  Anonymous user (Home page)
   ◦  Test group assignment is randomly determined
   ◦  Cannot guarantee a persistent experience if the user changes browser
•  11% of Wix users use more than one desktop browser
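Deriving the toss from the user ID can be sketched with a stable hash. The hashing scheme below (CRC32 over the experiment key and user ID, reduced to 100 buckets) is an assumption chosen for illustration and is not PETRI's actual algorithm; `UserIdToss` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical sketch; the hashing scheme is an assumption, not PETRI's algorithm. */
final class UserIdToss {

    /** Maps (experiment key, user ID) to a stable bucket in [0, 100). */
    static int bucket(String experimentKey, String userId) {
        CRC32 crc = new CRC32();
        crc.update((experimentKey + ":" + userId).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }

    /** Users whose bucket is below the exposure percentage are in test group B. */
    static boolean inTestGroup(String experimentKey, String userId, int percentB) {
        return bucket(experimentKey, userId) < percentB;
    }
}
```

Because the bucket depends only on the key and the user ID, the same signed-in user gets the same group in any browser. Comparing the bucket against a threshold also keeps the experience consistent during expansion: a user who is in group B at 25% is still in group B at 50%.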

There is MORE than one

# of active experiments    Possible # of states
10                         1,024
20                         1,048,576
30                         1,073,741,824

Possible states >= 2^(# experiments)
Wix has ~200 active experiments: 2^200 ≈ 1.606938e+60
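The figures above follow from treating each experiment as having at least two groups, so n independent experiments yield at least 2^n combinations. A small sketch to reproduce them exactly (the class and method names are hypothetical):

```java
import java.math.BigInteger;

final class ExperimentStates {

    /** Lower bound on system states: each experiment has at least two groups. */
    static BigInteger minStates(int activeExperiments) {
        return BigInteger.valueOf(2).pow(activeExperiments);
    }
}
```

`ExperimentStates.minStates(200)` is a 61-digit number, which matches the 1.606938e+60 quoted for roughly 200 active experiments.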

A/B testing introduces complexity

Support tools
•  Override options (URL parameters, cookies, headers…)
•  Near real time user BI tools
•  Integrated developer tools in the product
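An override lets a developer or QA engineer force a specific group regardless of the toss. The sketch below shows one way such a lookup might work; the parameter name, the cookie name, and the precedence (URL parameter before cookie) are all assumptions made for illustration and do not describe PETRI's actual override mechanism.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical override lookup; names and precedence are assumptions. */
final class ExperimentOverride {
    static final String URL_PARAM = "experiment_override"; // assumed parameter name
    static final String COOKIE = "experiment_override";    // assumed cookie name

    /** Returns the forced group, if any; the URL parameter wins over the cookie. */
    static Optional<String> resolve(Map<String, String> urlParams, Map<String, String> cookies) {
        String fromUrl = urlParams.get(URL_PARAM);
        if (fromUrl != null) {
            return Optional.of(fromUrl);
        }
        return Optional.ofNullable(cookies.get(COOKIE));
    }
}
```

When `resolve` returns an empty value, the caller falls through to the regular assignment.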

Define → Code → Experiment → Expand → Merge code → Close

Define spec
•  Spec = Experiment template (in the code)
   ◦  Define test groups
   ◦  Mandatory limitations (filters, user types)
   ◦  Scope = Group of related experiments (usually by product)
•  Why is it needed
   ◦  Type safety
   ◦  Preventing human errors (typos, user types)
   ◦  Controlled by the developer (developer knows about the context)
   ◦  Conducting experiments in batch

Spec code snippet

    public class ExampleSpecDefinition extends SpecDefinition {
        @Override
        protected ExperimentSpecBuilder customize(ExperimentSpecBuilder builder) {
            return builder
                .withOwner("OWNERS_EMAIL_ADDRESS")
                .withScopes(aScopeDefinitionForAllUserTypes("SOME_SCOPE"))
                .withTestGroups(asList("Group A", "Group B"));
        }
    }

Conducting experiment
•  Experiment = “If” statement in the code

    final String result = laboratory.conductExperiment(key, fallback, new StringConverter());
    if (result.equals("group a")) {
        // execute group a's logic
    } else if (result.equals("group b")) {
        // execute group b's logic
    }
    // in case conducting the experiment failed - the fallback value is returned
    // in this case you would usually execute the 'old' logic

Upload spec
•  Upload the specs to Petri server
   ◦  Enables defining an experiment instance

    {
      "creationDate" : "2014-01-09T13:11:26.846Z",
      "updateDate" : "2014-01-09T13:11:26.846Z",
      "scopes" : [
        { "name" : "html-editor", "onlyForLoggedInUsers" : true },
        { "name" : "html-viewer", "onlyForLoggedInUsers" : false }
      ],
      "testGroups" : [ "old", "new" ],
      "persistent" : true,
      "key" : "clientExperimentFullFlow1",
      "owner" : ""
    }

Start new experiment (limited population)

Manage experiment states

Ending successful experiment
1.  Convert A/B Test to Feature Toggle (100% ON)
2.  Merge the code
3.  Close the experiment
4.  Remove experiment instance

Experiment lifecycle
•  Define spec
•  Use Petri client to conduct experiment in the code (defaults to old)
•  Sync spec
•  Open experiment
•  Manage experiment state
•  End experiment
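The "manage experiment state" step can be modeled as a small state machine. The sketch below is hypothetical: the state names and allowed transitions are inferred from the pause, resume, and end steps described in this deck, and they are not PETRI's actual model.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical lifecycle model; state names are illustrative, not PETRI's. */
enum ExperimentState {
    OPEN, PAUSED, ENDED;

    /** States reachable from this one in a single step. */
    Set<ExperimentState> next() {
        switch (this) {
            case OPEN:
                return EnumSet.of(PAUSED, ENDED);
            case PAUSED:
                return EnumSet.of(OPEN, ENDED);
            default:
                return EnumSet.noneOf(ExperimentState.class); // ENDED is terminal
        }
    }

    boolean canMoveTo(ExperimentState target) {
        return next().contains(target);
    }
}
```

Making the transitions explicit means an ended experiment cannot be reopened by accident; a new experiment instance has to be opened from the spec instead.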

Petri is more than just an A/B test framework
•  Feature toggle
•  A/B Test
•  Personalization
•  Internal testing
•  Continuous deployment
•  Jira integration
•  Experiments
•  Dynamic configuration
•  QA
•  Automated testing

Other things we (will) do with Petri
•  Expose features internally to company employees
•  Enable continuous deployment with feature toggles
•  Select assignment by sites (not only by users)
•  Automatic selection of winning group*
•  Exposing feature to #n of users*
•  Integration with Jira
* Planned feature

Petri is now an open source project https://github.com/wix/petri

Q&A
Aviran Mordo, Head of Back-end Engineering
@aviranm
www.linkedin.com/in/aviran
www.aviransplace.com
Sagy Rozman, Back-end Guild master
@sagyrozman
www.linkedin.com/in/sagyrozman
https://github.com/wix/petri
http://goo.gl/L7pHnd

Credits
http://upload.wikimedia.org/wikipedia/commons/b/b2/Fiber_optics_testing.jpg
http://goo.gl/nEiepT
https://www.flickr.com/photos/ilo_oli/2421536836
https://www.flickr.com/photos/dexxus/5791228117
http://goo.gl/SdeJ0o
https://www.flickr.com/photos/112923805@N05/15005456062
https://www.flickr.com/photos/wiertz/8537791164
https://www.flickr.com/photos/laenulfean/5943132296
https://www.flickr.com/photos/torek/3470257377
https://www.flickr.com/photos/i5design/5393934753
https://www.flickr.com/photos/argonavigo/5320119828

Why Petri
•  Modeled experiment lifecycle
•  Open source (developed using TDD from day 1)
•  Running at scale in production
•  No deployment necessary
•  Both back-end and front-end experiments
•  Flexible architecture

PETRI Server, Your app, Laboratory, DB, Logs
