A/B Testing Got You Elected Mister President

@samphippen

Should I make this change?

Users
A group: 50%    B group: 50%
Site change     Old site

Measure some metric

Do maths on the two groups

Profit

Lemme show you my favourite A/B test

Also some videos

+$60 million

Protips

Same user always sees same version
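
One way to get this stickiness is to derive the variant from a hash of the user's id, so no extra state has to be stored. The sketch below is only an illustration: `variant_for`, the experiment name, and the choice of MD5 are assumptions, and it is not necessarily how the split gem does it.

```ruby
require 'digest'

# Hypothetical helper: deterministically maps a user to a variant.
# Hashing the experiment name together with the user id means the same
# user always lands in the same group for a given experiment.
def variant_for(user_id, experiment, variants = %w[a b])
  digest = Digest::MD5.hexdigest("#{experiment}:#{user_id}")
  # Use the first 32 bits of the digest as an integer bucket number.
  variants[digest[0, 8].to_i(16) % variants.size]
end

first_visit  = variant_for(42, 'button_face')
second_visit = variant_for(42, 'button_face')
puts first_visit == second_visit
```

Including the experiment name in the hash keeps a user's group in one experiment independent of their group in another.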

Caching

Roughly same performance

Also for feature flagging

A super lightning fast guide on how to do it
and what it looks like

gem 'split'

require 'split/dashboard'

run Rack::URLMap.new \
  "/" => YourApp::Application,
  "/split" => Split::Dashboard.new

<% ab_test("experiment_name", "a", "b") do |c| %>
  <a href="/win" class="btn <%= c %>"> Get points? </a>
<% end %>

What it looks like

https://github.com/andrew/split

How to interpret the results

Stats time

Confidence Value

A confidence of 0.95 (p < 0.05) is used in medical trials

Common mistake: Assumption of normality

This will probably work for you

How to design the experiment

Step 1: Clearly state your hypothesis

Example: I will get more donations if our button is
Jimmy Wales's face

Formally: Null hypothesis: there will be no increase in donations
if we use Jimmy Wales's face

Formally: Positive hypothesis: there will be an increase in donations
if we use Jimmy Wales's face

Step 2: Pick a statistical test

Example: difference of proportions (the standard A/B test)

http://stattrek.com/hypothesis-test/difference-in-proportions.aspx
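
For reference, a common form of the difference-of-proportions test is the pooled, one-sided z-test sketched below. The symbols are assumptions introduced here, not taken from the slides: $x_A, x_B$ are the conversions and $n_A, n_B$ the users in each group, with A being the new variant.

```latex
H_0:\; p_A \le p_B
\qquad
H_1:\; p_A > p_B

\hat{p}_A = \frac{x_A}{n_A},
\qquad
\hat{p}_B = \frac{x_B}{n_B},
\qquad
\hat{p} = \frac{x_A + x_B}{n_A + n_B}

z = \frac{\hat{p}_A - \hat{p}_B}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}
```

The confidence that A is better is then $\Phi(z)$, the standard normal CDF evaluated at $z$.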

Step 3: Decide an experiment length (number of days)

Example: we get 200 hits a day, so let's test for
15 days to collect 3,000 hits

Alternatively: a fixed sample size. Stop after 10,000 users
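
The two ways of fixing the experiment size are related by simple arithmetic. A minimal sketch, assuming steady daily traffic; `days_needed` is a hypothetical helper, not part of any library:

```ruby
# Hypothetical helper: whole days needed to reach a target sample size,
# assuming traffic stays constant at hits_per_day.
def days_needed(target_sample, hits_per_day)
  (target_sample.to_f / hits_per_day).ceil
end

puts days_needed(3_000, 200)   # 15
puts days_needed(10_000, 200)  # 50
```

Either way, the stopping point is chosen before the experiment starts.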

Step 4: Split

Half the users get Jimmy Wales's face; half the users
get whatever the button was before

Step 5: Inspect results and analyse

Let’s talk about analysis

Let’s work two examples (one null, one positive)

                     With Jimmy   Without Jimmy
Users in test        100          100
Users that clicked   27           18

Confidence = 93.6%. Too low, at a 95% threshold, to conclude that
this is better
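
The 93.6% figure can be reproduced with a one-sided, pooled two-proportion z-test. The slides do not say exactly which variant of the test was used, so treat that choice as an assumption; the function names below are also made up for this sketch.

```ruby
# Standard normal CDF, written in terms of the error function.
def normal_cdf(z)
  0.5 * (1.0 + Math.erf(z / Math.sqrt(2.0)))
end

# One-sided, pooled two-proportion z-test (an assumed choice of test).
# Returns the confidence that variant A's true rate exceeds variant B's.
def confidence_a_beats_b(clicks_a, users_a, clicks_b, users_b)
  rate_a = clicks_a.to_f / users_a
  rate_b = clicks_b.to_f / users_b
  pooled = (clicks_a + clicks_b).to_f / (users_a + users_b)
  std_error = Math.sqrt(pooled * (1.0 - pooled) * (1.0 / users_a + 1.0 / users_b))
  normal_cdf((rate_a - rate_b) / std_error)
end

confidence = confidence_a_beats_b(27, 100, 18, 100)
puts format('%.1f%%', confidence * 100) # prints 93.6%
```

This relies on the normal approximation, which is reasonable here because each group has well over a handful of clicks and non-clicks.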

Common mistake: sample size

                     With Jimmy   Without Jimmy
Users in test        1000         1000
Users that clicked   270          180

99.9% confidence. High enough for us to declare this better
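
Why the same click rates give such different answers can be seen from how the test statistic grows with sample size. The sketch below assumes a pooled, one-sided two-proportion z-test with equal group sizes $n$; the symbols are introduced here for illustration.

```latex
z = \frac{\hat{p}_A - \hat{p}_B}{\sqrt{\dfrac{2\,\hat{p}\,(1 - \hat{p})}{n}}}
  = \left(\hat{p}_A - \hat{p}_B\right)\sqrt{\frac{n}{2\,\hat{p}\,(1 - \hat{p})}}
  \;\propto\; \sqrt{n}

\hat{p}_A = 0.27,\quad \hat{p}_B = 0.18,\quad \hat{p} = 0.225:
\qquad
n = 100 \;\Rightarrow\; z \approx 1.52,
\qquad
n = 1000 \;\Rightarrow\; z \approx 4.82
```

Ten times the users multiplies $z$ by $\sqrt{10} \approx 3.16$, which moves the confidence $\Phi(z)$ from about 0.936 to above 0.999.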

Confounding factors ARE bad

This is hard stuff. I hope you understood :) Ask
me questions: @samphippen
