Advanced A/B Testing - Speaker Deck

Slide 1

Slide 1 text

Experimenting on Humans
Aviran Mordo, Head of Back-end Engineering
@aviranm, www.linkedin.com/in/aviran, www.aviransplace.com
Sagy Rozman, Back-end Guild master
@sagyrozman, www.linkedin.com/in/sagyrozman

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Wix In Numbers
•  Over 55M users + 1M new users/month
•  Static storage is >1.5 PB of data
•  3 data centers + 3 clouds (Google, Amazon, Azure)
•  1.5B HTTP requests/day
•  900 people work at Wix, of whom ~300 are in R&D

Slide 4

Slide 4 text

1542 (A/B Tests in 3 months)

Slide 5

Slide 5 text

Agenda
•  Basic A/B testing
•  Experiment driven development
•  PETRI – Wix’s 3rd generation open source experiment system
•  Challenges and best practices
•  How to (code samples)

Slide 6

Slide 6 text

11:31 A/B Test

Slide 7

Slide 7 text

To B or NOT to B? A B

Slide 8

Slide 8 text

Home page results (how many registered)

Slide 9

Slide 9 text

Experiment Driven Development

Slide 10

Slide 10 text

This is the Wix editor

Slide 11

Slide 11 text

Our gallery manager
What can we improve?

Slide 12

Slide 12 text

Is this better?

Slide 13

Slide 13 text

Don’t be a loser

Slide 14

Slide 14 text

Product Experiments Toggles & Reporting Infrastructure

Slide 15

Slide 15 text

How do you know what is running?

Slide 16

Slide 16 text

Why so many?
If I “know” it is better, do I really need to test it?

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

The theory
Sign-up → Choose Template → Edit site → Publish → Premium

Slide 19

Slide 19 text

Result = Fail

Slide 20

Slide 20 text

Intent matters

Slide 21

Slide 21 text

Conclusion
•  EVERY new feature is A/B tested
•  We open the new feature to a % of users
   ○  Measure success
   ○  If it is better, we keep it
   ○  If worse, we check why and improve
•  If flawed, the impact is limited to a % of our users

Slide 22

Slide 22 text

Start with 50% / 50% ?

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

Sh*t happens (Test could fail)
•  New code can have bugs
•  Conversion can drop
•  Usage can drop
•  Unexpected cross test dependencies

Slide 25

Slide 25 text

Minimize affected users (in case of failure)
Gradual exposure (percentage of…)
•  Language
•  GEO
•  Browser
•  User-agent
•  OS
•  Company employees
•  User roles
•  Any other criteria you have (extendable)
•  All users
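
The criteria above can be read as a chain of filters that a visitor must pass before being exposed. A minimal sketch, assuming a hypothetical `Visitor` model and `ExposureFilters` helper (neither is Petri's actual API):

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical visitor model; field names are illustrative, not Petri's.
final class Visitor {
    final String language;
    final String country;
    final String browser;
    final boolean companyEmployee;

    Visitor(String language, String country, String browser, boolean companyEmployee) {
        this.language = language;
        this.country = country;
        this.browser = browser;
        this.companyEmployee = companyEmployee;
    }
}

final class ExposureFilters {
    // A visitor may be exposed only when every configured filter accepts them.
    static boolean eligible(Visitor visitor, List<Predicate<Visitor>> filters) {
        return filters.stream().allMatch(filter -> filter.test(visitor));
    }

    public static void main(String[] args) {
        List<Predicate<Visitor>> filters = List.of(
            v -> v.language.equals("en"),   // Language
            v -> v.country.equals("US"),    // GEO
            v -> v.companyEmployee);        // Company employees
        Visitor employee = new Visitor("en", "US", "Chrome", true);
        Visitor outsider = new Visitor("en", "US", "Chrome", false);
        System.out.println(eligible(employee, filters)); // true
        System.out.println(eligible(outsider, filters)); // false
    }
}
```

Adding a new criterion is then just one more predicate in the list, which is one way to get the "extendable" property.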

Slide 26

Slide 26 text

Not all users are equal
•  First time visitors = Never visited wix.com
•  New registered users = Untainted users

Slide 27

Slide 27 text

We need that feature
…and failure is not an option

Slide 28

Slide 28 text

Defensive Testing

Slide 29

Slide 29 text

Adding a mobile view

Slide 30

Slide 30 text

First trial failed
Performance had to be improved

Slide 31

Slide 31 text

Halting the test results in loss of data.
What can we do about it?

Slide 32

Slide 32 text

Solution – Pause the experiment!
•  Maintain NEW experience for already exposed users
•  No additional users will be exposed to the NEW feature

Slide 33

Slide 33 text

PETRI’s pause implementation
•  Use cookies to persist assignment
   ○  If the user changes browsers, the assignment is unknown
•  Server side persistence solves this
   ○  You pay in performance & scalability
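
The pause rule can be sketched as follows. This is an illustration under stated assumptions, not Petri's implementation: `PausableExperiment`, the `"old"`/`"new"` group names, and the 0..99 bucket are all hypothetical, and the persisted value stands in for whatever the cookie (or server-side store) holds.

```java
import java.util.Optional;

final class PausableExperiment {
    private final int percentExposed; // share of new visitors who get "new"
    private boolean paused = false;

    PausableExperiment(int percentExposed) {
        this.percentExposed = percentExposed;
    }

    void pause() {
        paused = true;
    }

    // persisted: group previously stored for this visitor (e.g. in a cookie), if any
    // bucket:    value in [0, 100) drawn for this visitor
    String assign(Optional<String> persisted, int bucket) {
        if (persisted.isPresent()) {
            return persisted.get(); // already exposed users keep their experience
        }
        if (paused) {
            return "old";           // no additional users are exposed while paused
        }
        return bucket < percentExposed ? "new" : "old";
    }
}
```

With a cookie as the only store, a visitor who switches browsers arrives with an empty `persisted` value, which is exactly the case the slide flags as unknown.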

Slide 34

Slide 34 text

Decision
•  Keep feature
•  Drop feature
   ○  Keep backwards compatibility for exposed users forever?
   ○  Migrate users to another equivalent feature
   ○  Drop it altogether (users lose data/work)
•  Improve code & resume experiment

Slide 35

Slide 35 text

The road to success

Slide 36

Slide 36 text

Reaching statistical significance
•  Numbers look good but sample size is small
•  We need more data!
•  Expand
Control Group (A): 75% → 50% → 25% → 0%
Test Group (B): 25% → 50% → 75% → 100%
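
Why a small sample is not enough can be shown with a pooled two-proportion z-test. The slides do not say which statistical test Wix uses, so treat this as a generic sketch; the `Significance` class and the 1.96 threshold (two-sided, 95%) are assumptions made for the example.

```java
final class Significance {
    // z statistic for the difference between two conversion rates (pooled estimate).
    static double zScore(long convA, long usersA, long convB, long usersB) {
        double rateA = (double) convA / usersA;
        double rateB = (double) convB / usersB;
        double pooled = (double) (convA + convB) / (usersA + usersB);
        double stdErr = Math.sqrt(pooled * (1 - pooled) * (1.0 / usersA + 1.0 / usersB));
        return (rateB - rateA) / stdErr;
    }

    // Two-sided test at the 95% level.
    static boolean significantAt95(long convA, long usersA, long convB, long usersB) {
        return Math.abs(zScore(convA, usersA, convB, usersB)) > 1.96;
    }

    public static void main(String[] args) {
        // Same observed rates (10% vs 12%), different sample sizes.
        System.out.println(significantAt95(10, 100, 12, 100));         // false
        System.out.println(significantAt95(1000, 10000, 1200, 10000)); // true
    }
}
```

The observed lift is identical in both calls; only the larger sample makes it distinguishable from noise, which is the reason to expand the test group.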

Slide 37

Slide 37 text

Keep user experience consistent Control Group (A) Test Group (B)

Slide 38

Slide 38 text

Keeping persistent UX
•  Signed-in user (Editor)
   ○  Test group assignment is determined by the user ID
   ○  Guarantees toss persistency across browsers
•  Anonymous user (Home page)
   ○  Test group assignment is randomly determined
   ○  Cannot guarantee a persistent experience if the user changes browsers
•  11% of Wix users use more than one desktop browser
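
Assignment by user ID can be made deterministic by hashing the ID into a bucket. A minimal sketch, assuming a CRC32 hash and a hypothetical `TossById` helper; the slides do not describe Petri's actual hashing scheme.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class TossById {
    // Maps (experiment key, user ID) to a stable bucket in [0, 100).
    static int bucket(String experimentKey, String userId) {
        CRC32 crc = new CRC32();
        crc.update((experimentKey + ":" + userId).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }

    // Users whose bucket falls below the exposure percentage get the test group.
    static String group(String experimentKey, String userId, int percentInTestGroup) {
        return bucket(experimentKey, userId) < percentInTestGroup ? "B" : "A";
    }
}
```

Because the bucket depends only on the key and the user ID, the same user gets the same group on any browser, and raising the percentage only moves users from A to B, never back.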

Slide 39

Slide 39 text

There is MORE than one

Slide 40

Slide 40 text

# of active experiments    Possible # of states
10                         1,024
20                         1,048,576
30                         1,073,741,824

Possible states >= 2^(# of experiments)
Wix has ~200 active experiments: 2^200 ≈ 1.606938e+60 possible states
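
The figures follow from treating each experiment as having at least two groups. A short worked check, using a hypothetical `ExperimentStates` helper:

```java
import java.math.BigInteger;

final class ExperimentStates {
    // Lower bound on combined states when every experiment has at least two groups.
    static BigInteger minStates(int activeExperiments) {
        return BigInteger.valueOf(2).pow(activeExperiments);
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 20, 30, 200}) {
            System.out.println(n + " experiments -> " + minStates(n) + " states");
        }
    }
}
```

For 200 experiments the result has 61 digits, which is why testing every combination is not feasible.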

Slide 41

Slide 41 text

A/B testing introduces complexity

Slide 42

Slide 42 text

Support tools
•  Override options (URL parameters, cookies, headers…)
•  Near real time user BI tools
•  Integrated developer tools in the product
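
An override lets a developer or QA engineer force a group regardless of the toss. A minimal sketch, assuming a made-up `override.<experimentKey>` request parameter; Petri's real parameter, cookie, and header names are not given in the slides.

```java
import java.util.Map;
import java.util.Optional;

final class OverrideResolver {
    // Hypothetical parameter prefix, chosen for this example only.
    static final String OVERRIDE_PREFIX = "override.";

    // Returns the forced group when the request carries an override, else the tossed group.
    static String resolve(String experimentKey, Map<String, String> requestParams, String tossedGroup) {
        return Optional.ofNullable(requestParams.get(OVERRIDE_PREFIX + experimentKey))
                .orElse(tossedGroup);
    }
}
```

For example, a request carrying `override.newGallery=new` would see the new gallery even when the toss placed that user in the old group.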

Slide 43

Slide 43 text

Define → Code → Experiment → Expand → Merge code → Close

Slide 44

Slide 44 text

Define spec
•  Spec = Experiment template (in the code)
   ○  Define test groups
   ○  Mandatory limitations (filters, user types)
   ○  Scope = Group of related experiments (usually by product)
•  Why is it needed
   ○  Type safety
   ○  Preventing human errors (typos, user types)
   ○  Controlled by the developer (developer knows about the context)
   ○  Conducting experiments in batch

Slide 45

Slide 45 text

Spec code snippet

    public class ExampleSpecDefinition extends SpecDefinition {

        // Sets the owner, scope, and test groups for this spec
        @Override
        protected ExperimentSpecBuilder customize(ExperimentSpecBuilder builder) {
            return builder
                .withOwner("OWNERS_EMAIL_ADDRESS")
                .withScopes(aScopeDefinitionForAllUserTypes("SOME_SCOPE"))
                .withTestGroups(asList("Group A", "Group B"));
        }
    }

Slide 46

Slide 46 text

Conducting experiment
•  Experiment = “If” statement in the code

    final String result = laboratory.conductExperiment(key, fallback, new StringConverter());
    if (result.equals("group a")) {
        // execute group a's logic
    } else if (result.equals("group b")) {
        // execute group b's logic
    }
    // in case conducting the experiment failed - the fallback value is returned
    // in this case you would usually execute the 'old' logic
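
The fallback contract described in the comments can be illustrated in isolation. This is a stand-in, not Petri's `Laboratory`: `SafeConduct` and its `Supplier`-based toss are assumptions made so the example runs on its own.

```java
import java.util.function.Supplier;

final class SafeConduct {
    // Returns the tossed group, or the fallback when the toss fails or yields nothing.
    static String conductOrFallback(Supplier<String> toss, String fallback) {
        try {
            String result = toss.get();
            return result != null ? result : fallback;
        } catch (RuntimeException e) {
            return fallback; // a failing experiment system must not break the product
        }
    }

    public static void main(String[] args) {
        System.out.println(conductOrFallback(() -> "group b", "group a"));
        System.out.println(conductOrFallback(() -> {
            throw new IllegalStateException("experiment server unreachable");
        }, "group a"));
    }
}
```

Choosing the old behaviour as the fallback means an outage in the experiment system degrades to the known-good path.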

Slide 47

Slide 47 text

Upload spec
•  Upload the specs to Petri server
   ○  Enables defining an experiment instance

    {
      "creationDate" : "2014-01-09T13:11:26.846Z",
      "updateDate" : "2014-01-09T13:11:26.846Z",
      "scopes" : [
        { "name" : "html-editor", "onlyForLoggedInUsers" : true },
        { "name" : "html-viewer", "onlyForLoggedInUsers" : false }
      ],
      "testGroups" : [ "old", "new" ],
      "persistent" : true,
      "key" : "clientExperimentFullFlow1",
      "owner" : ""
    }

Slide 48

Slide 48 text

Start new experiment (limited population)

Slide 49

Slide 49 text

Manage experiment states

Slide 50

Slide 50 text

Ending successful experiment
1.  Convert A/B Test to Feature Toggle (100% ON)
2.  Merge the code
3.  Close the experiment
4.  Remove experiment instance

Slide 51

Slide 51 text

Experiment lifecycle
•  Define spec
•  Use Petri client to conduct experiment in the code (defaults to old)
•  Sync spec
•  Open experiment
•  Manage experiment state
•  End experiment
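
The lifecycle can be modeled as a small state machine. The state names and allowed transitions below are assumptions inferred from the slides (open, pause, resume, end), not Petri's actual model.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical lifecycle states for one experiment instance.
enum ExperimentState {
    DEFINED, OPEN, PAUSED, ENDED;

    // States that may legally follow this one.
    Set<ExperimentState> next() {
        switch (this) {
            case DEFINED: return EnumSet.of(OPEN);
            case OPEN:    return EnumSet.of(PAUSED, ENDED);
            case PAUSED:  return EnumSet.of(OPEN, ENDED); // resume or give up
            default:      return EnumSet.noneOf(ExperimentState.class);
        }
    }

    boolean canMoveTo(ExperimentState target) {
        return next().contains(target);
    }
}
```

Rejecting illegal transitions, such as reopening an ended experiment, is one way to keep experiment management predictable when many experiments run at once.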

Slide 52

Slide 52 text

Petri is more than just an A/B test framework
Feature toggle, A/B Test, Personalization, Internal testing, Continuous deployment, Jira integration, Experiments, Dynamic configuration, QA, Automated testing

Slide 53

Slide 53 text

Other things we (will) do with Petri
•  Expose features internally to company employees
•  Enable continuous deployment with feature toggles
•  Select assignment by sites (not only by users)
•  Automatic selection of winning group*
•  Exposing feature to #n of users*
•  Integration with Jira
* Planned feature

Slide 54

Slide 54 text

Petri is now an open source project https://github.com/wix/petri

Slide 55

Slide 55 text

Q&A
Aviran Mordo, Head of Back-end Engineering
@aviranm, www.linkedin.com/in/aviran, www.aviransplace.com
Sagy Rozman, Back-end Guild master
@sagyrozman, www.linkedin.com/in/sagyrozman
https://github.com/wix/petri
http://goo.gl/L7pHnd

Slide 57

Slide 57 text

Why Petri
•  Modeled experiment lifecycle
•  Open source (developed using TDD from day 1)
•  Running at scale in production
•  No deployment necessary
•  Both back-end and front-end experiments
•  Flexible architecture

Slide 58

Slide 58 text

PETRI Server Your app Laboratory DB Logs