Overlapping Experim - Speaker Deck

Slide 1

Slide 1 text

Overlapping Experiment Infrastructure @nid90, @sriharisriraman @nilenso

Slide 2

Slide 2 text

Sections
• About the paper, and the problem statement
• Basic entities and terms in experimentation
• Domains, Layers and Experiments
• Diversion types, Conditions
• Launch Layers
• Logic flow for a request
• An alternate approach

Slide 3

Slide 3 text

About the paper
• 2010
• KDD’10, July 25–28, 2010, Washington, DC, USA
• http://research.google.com/pubs/pub36500.html
• Used at Google since ~2007
• Previous papers do not cover scaling an experiment infrastructure and the overall experimentation environment to support running more experiments more quickly and more robustly

Slide 4

Slide 4 text

Why
• Traffic is precious; traffic is not infinite (even for Google)
• Not 100s or 1000s of sessions; probably millions
• Difficult to get statistically significant results in a reasonable timeframe
• Multiple, Multivariate, Fast, Ramp-ups

Slide 5

Slide 5 text

Terminology

Slide 6

Slide 6 text

Basic entities and terms
• Treatment / Bucket
• Controlled experiments
• A/B testing
• Website Testing
• MultiVariable Testing
• Multifactorial testing

Slide 7

Slide 7 text

Parameters
• Parameters are variables with a set of possible values
• Examples:
  • search algorithm
  • ad background color
  • autocomplete on/off
  • number of results in search page
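To make the idea of parameters concrete, here is a minimal sketch of a parameter registry with default values that a treatment can override. The parameter names and default values are hypothetical, chosen only to mirror the examples on this slide; they are not taken from the paper.

```python
# Hypothetical parameter registry: names and defaults are illustrative only.
DEFAULTS = {
    "search_algorithm": "baseline",
    "ad_background_color": "yellow",
    "autocomplete_enabled": True,
    "results_per_page": 10,
}


def apply_overrides(defaults, overrides):
    """Return a new parameter set with the treatment's overrides applied."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Reject typos early instead of silently ignoring them.
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {**defaults, **overrides}


treated = apply_overrides(DEFAULTS, {"results_per_page": 20})
```

The defaults dictionary is left untouched, so a request that is not in any experiment keeps seeing the baseline values.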

Slide 8

Slide 8 text

Two extremes
• Single Layer: Request belongs to at most one experiment
• Multi Factorial: Request belongs to N simultaneous experiments

Slide 9

Slide 9 text

The cookie-mod-bucket “binaries” flow

Slide 10

Slide 10 text

Two extremes
• Traffic is split, meaning we need more time to attain statistical significance
• Causes downstream binary starvation
• Treatments might conflict, e.g. blue text on blue background
• Parameters change often; it is difficult to change one experiment without affecting another

Slide 11

Slide 11 text

Domains, Layers, and Experiments
The building blocks

Slide 12

Slide 12 text

• Domains: model the splitting of traffic
• Layers: model the grouping of parameters

Slide 13

Slide 13 text

Domains contain Layers

Slide 14

Slide 14 text

Domains contain Layers
• We can change a lot of parameters in D1
• In D2, we change parameters more clinically, by grouping them in layers

Slide 15

Slide 15 text

Layers contain Domains
This allows for further division of traffic and enables allotting an experiment only the portion of traffic that is relevant to it.

Slide 16

Slide 16 text

Layers contain Experiments
An “Experiment” in Google’s terminology is what statistical literature calls a “Treatment”. Traffic is split between treatments.
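The containment relationships on the last few slides can be sketched as a small data model. This is only an illustration under stated assumptions: the class and field names are hypothetical, buckets are assumed to be integer ranges, and nesting domains inside layers is left out for brevity. The rule that layers in one domain must not share parameters is my reading of the paper rather than something stated on these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    name: str
    buckets: range      # slice of the layer's traffic given to this experiment
    overrides: dict     # parameter name -> value used for this treatment


@dataclass
class Layer:
    name: str
    parameters: frozenset                      # parameters grouped in this layer
    experiments: list = field(default_factory=list)

    def add(self, exp):
        # An experiment may only touch parameters owned by its layer.
        if not set(exp.overrides) <= self.parameters:
            raise ValueError(f"{exp.name} overrides parameters outside {self.name}")
        # Experiments in the same layer must not share traffic.
        for other in self.experiments:
            if set(other.buckets) & set(exp.buckets):
                raise ValueError(f"{exp.name} overlaps {other.name}")
        self.experiments.append(exp)


@dataclass
class Domain:
    name: str
    buckets: range                             # slice of traffic owned by the domain
    layers: list = field(default_factory=list)

    def add(self, layer):
        # Assumed rule: a parameter belongs to at most one layer per domain.
        for other in self.layers:
            if other.parameters & layer.parameters:
                raise ValueError(f"{layer.name} shares parameters with {other.name}")
        self.layers.append(layer)
```

Because each layer owns a disjoint set of parameters, a request can be in one experiment per layer without two treatments fighting over the same parameter.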

Slide 17

Slide 17 text

Diversions and Conditions

Slide 18

Slide 18 text

Diversion
`f` needs to be pure for stickiness reasons, i.e., given the same cookie and layer-id, it returns the same result every time, so a user keeps seeing the same behavior.
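A minimal sketch of such a pure diversion function follows. The function name `divert`, the use of SHA-256, and the 1000-bucket count are assumptions made for illustration; the slides only require that the result depend on nothing but the cookie and the layer-id.

```python
import hashlib

NUM_BUCKETS = 1000  # assumed bucket count


def divert(cookie: str, layer_id: str) -> int:
    """Map (cookie, layer_id) to a bucket using no state other than its inputs."""
    key = f"{layer_id}:{cookie}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


first = divert("cookie-abc", "layer-1")
second = divert("cookie-abc", "layer-1")
```

Mixing the layer-id into the hash is a design choice: it lets the same cookie land in unrelated buckets in different layers, which keeps assignments in one layer from lining up with those in another.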

Slide 19

Slide 19 text

Diversion types
• Random: roll a die
• Cookie mod: cookie is random => cookie % 1000 is random
• Cookie day: (cookie + day) % 1000
• User ID: higher level of stickiness
Order: User-ID > Cookie mod > Cookie day > Random
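The four diversion types can be sketched side by side as follows. This is an illustration, not the paper's implementation: the helper names, the hashing scheme, and the 1000-bucket count are assumptions, and string keys are hashed instead of taking a numeric cookie modulo 1000 directly.

```python
import hashlib
import random
from datetime import date

NUM_BUCKETS = 1000  # assumed bucket count


def _bucket(key: str) -> int:
    """Deterministically hash a string key into a bucket."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def user_id_bucket(user_id: str) -> int:
    # Stickiest: follows a signed-in user across browsers and devices.
    return _bucket(f"user:{user_id}")


def cookie_mod_bucket(cookie: str) -> int:
    # Sticky for as long as the browser keeps the same cookie.
    return _bucket(f"cookie:{cookie}")


def cookie_day_bucket(cookie: str, day: date) -> int:
    # Sticky within one day; the assignment may change on the next day.
    return _bucket(f"cookie-day:{cookie}:{day.isoformat()}")


def random_bucket(rng=random) -> int:
    # No stickiness: every request rolls a die.
    return rng.randrange(NUM_BUCKETS)
```

Only `random_bucket` is impure; the other three return the same bucket whenever they are called with the same inputs.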

Slide 20

Slide 20 text

Conditions
Domain-based allocation after diversion
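One way to read this slide is that a request is first diverted to a bucket and only then checked against the experiment's conditions. The sketch below assumes that reading. The dictionary layout, the `assign` function, and the language-based condition are all hypothetical.

```python
def assign(request, bucket, experiment):
    """Return the experiment name if the request qualifies, else None."""
    # Step 1: diversion has already produced `bucket`; check ownership first.
    if bucket not in experiment["buckets"]:
        return None
    # Step 2: only then evaluate the condition on the request itself.
    if not experiment["condition"](request):
        return None
    return experiment["name"]


# Hypothetical experiment that only applies to Japanese-language traffic.
japanese_ui = {
    "name": "japanese_ui",
    "buckets": range(0, 50),
    "condition": lambda req: req.get("language") == "ja",
}

result = assign({"language": "ja"}, 10, japanese_ui)
```

A request that lands in the experiment's buckets but fails the condition is simply not assigned, so its buckets are not handed to another experiment.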

Slide 21

Slide 21 text

Conditions
Test (“Canary”) new code with live traffic, and ramp up to a full release based on datacenters or machines.
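A machine-based ramp-up could be sketched as below. The function name, the hash-based ranking, and the machine names are assumptions for illustration; the slide only says that the canary is conditioned on datacenters or machines.

```python
import hashlib
import math


def canary_machines(machines, fraction):
    """Pick a stable subset of machines that serves the new code."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # Rank machines by a hash of their name so the order is stable and arbitrary.
    ranked = sorted(machines, key=lambda m: hashlib.sha256(m.encode("utf-8")).hexdigest())
    count = math.ceil(len(ranked) * fraction)
    return set(ranked[:count])


fleet = [f"machine-{i}" for i in range(20)]   # hypothetical machine names
stage_1 = canary_machines(fleet, 0.25)
stage_2 = canary_machines(fleet, 0.5)
```

Because the ranking does not depend on the fraction, every machine in an earlier stage stays in all later stages, so ramping up never moves a machine back to the old code.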

Slide 22

Slide 22 text

Logic flow
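The slide for this section is a diagram that did not survive as text. As a rough sketch of how the earlier pieces could fit together for a single request, assuming one pass over a flat list of layers (nested domains, launch layers, and the ordering of diversion types are left out): divert the request in each layer, find the experiment that owns the bucket, check its condition, and apply its overrides. All names and the dictionary layout are hypothetical.

```python
import hashlib

NUM_BUCKETS = 1000  # assumed bucket count


def divert(cookie, layer_id):
    """Pure cookie-based diversion, salted with the layer id."""
    digest = hashlib.sha256(f"{layer_id}:{cookie}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def resolve_parameters(request, defaults, layers):
    """Return (final parameter values, names of experiments the request is in)."""
    params = dict(defaults)
    assigned = []
    for layer in layers:
        bucket = divert(request["cookie"], layer["id"])
        for exp in layer["experiments"]:
            # At most one experiment per layer can own a given bucket.
            if bucket in exp["buckets"] and exp["condition"](request):
                params.update(exp["overrides"])
                assigned.append(exp["name"])
                break
    return params, assigned


layers = [
    {"id": "results", "experiments": [
        {"name": "more_results", "buckets": range(0, NUM_BUCKETS),
         "condition": lambda req: True,
         "overrides": {"results_per_page": 20}},
    ]},
    {"id": "ads", "experiments": [
        {"name": "blue_ads", "buckets": range(0, NUM_BUCKETS),
         "condition": lambda req: req.get("language") == "en",
         "overrides": {"ad_background_color": "blue"}},
    ]},
]
defaults = {"results_per_page": 10, "ad_background_color": "yellow"}
params, assigned = resolve_parameters({"cookie": "abc", "language": "en"}, defaults, layers)
```

In this usage both experiments claim all buckets of their layer, so the request ends up in one experiment per layer and receives both overrides.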

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

Alternate Approach

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

Other important aspects
• Education – it is easy to design an experiment wrong
• Coverage – when users are actually part of the treatment
• Scheduling – scheduling within the nested blocks
• Confidence Level, Confidence Intervals, A/A tests
• Reporting, Accuracy
• Testing

Slide 28

Slide 28 text

Great Material
• Overlapping Experiment Infrastructure: More, Better, Faster Experimentation (Google)
• A/B testing @ Internet Scale (LinkedIn, Bing, Google)
• Controlled experiments on the web: survey and practical guide
• D. Cox and N. Reid. The theory of the design of experiments, 2000
• Netflix Experimentation Platform
• Online Experimentation at Microsoft
• Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO (Microsoft)