A/B Testing Got You Elected Mister President
Penelope Phippen
April 06, 2013
Transcript
A/B Testing Got You Elected Mister President
@samphippen
Should I make this change?
Users
A group: 50% -> Site change
B group: 50% -> Old site
Measure some metric
Do maths on the two groups
???
Profit
Lemme show you my favourite A/B test
Also some videos
+$60 million
Protips
Same user always sees same version
Caching
Roughly same performance
Also for feature flagging
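One way to make sure the same user always sees the same version is to derive the variant from a hash of the user id instead of a fresh random draw. The sketch below is only an illustration: `variant_for` is a hypothetical helper, not part of the split gem (which has its own way of persisting a user's assignment), and it assumes you have a stable user id to hash.

```ruby
require 'digest'

# Hypothetical helper (not part of the split gem): pick a variant
# deterministically from the experiment name and a stable user id.
def variant_for(user_id, experiment, variants)
  # Hash the pair so each experiment buckets the same user independently.
  digest = Digest::MD5.hexdigest("#{experiment}:#{user_id}")
  # Read the first 8 hex digits as an integer and map it onto the variants.
  variants[digest[0, 8].to_i(16) % variants.size]
end

first  = variant_for(42, "experiment_name", %w[a b])
second = variant_for(42, "experiment_name", %w[a b])
puts first == second # the same user gets the same variant on every call
```

Because the result depends only on the inputs, it is also safe to cache a page per variant: a cached "a" page is always the right page for an "a" user.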
A super lightning fast guide on how to do it
and what it looks like
gem 'split'
# config.ru
require 'split/dashboard'

run Rack::URLMap.new \
  "/" => YourApp::Application,
  "/split" => Split::Dashboard.new
<% ab_test("experiment_name", "a", "b") do |c| %>
  <a href="/win" class="btn <%= c %>">
    Get points?
  </a>
<% end %>
What it looks like
https://github.com/andrew/split
How to interpret the results
Stats time
Confidence Value
95% confidence (p < 0.05) is the usual threshold in medical trials
Common mistake: Assumption of normality
This will probably work for you
How to design the experiment
Step 1: clearly state your hypothesis
Example: I will get more donations if our button is Jimmy Wales's face
Formally: Null hypothesis: there will be no increase in donations if we use Jimmy Wales's face
Formally: Positive (alternative) hypothesis: there will be an increase in donations if we use Jimmy Wales's face
Step 2: Pick a statistical test
Example: difference of proportions (the standard A/B test)
http://stattrek.com/hypothesis-test/difference-in-proportions.aspx
Step 3: Decide an experiment length (number of days)
Example: we get 200 hits a day, so let's test for 15 days to get 3000 hits
Alternatively: a fixed sample size. Stop after 10000 users
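The two ways of fixing the experiment length are the same arithmetic run in opposite directions. A minimal sketch, assuming traffic stays flat at 200 hits a day and that one hit counts as one user (`days_needed` is a hypothetical helper name):

```ruby
# Hypothetical helper: days needed to reach a fixed sample size
# at a constant daily traffic level, rounded up to whole days.
def days_needed(target_sample, hits_per_day)
  (target_sample.to_f / hits_per_day).ceil
end

puts days_needed(3000, 200)   # the 15-day example above
puts days_needed(10_000, 200) # the fixed 10000-user alternative
```

Either way, the stopping point is decided before the test starts, not after peeking at the results.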
Step 4: Split
Half the users get Jimmy Wales's face; half the users get whatever the button was before
Step 5: inspect results and analyse
Let’s talk about analysis
Let’s work two examples (one null, one positive)
                    With Jimmy   Without Jimmy
Users in test       100          100
Users that clicked  27           18
Confidence = 93.6%. That is below the 95% threshold, so we can't conclude that this is better
Common mistake: sample size
                    With Jimmy   Without Jimmy
Users in test       1000         1000
Users that clicked  270          180
99.9% confidence. High enough for us to declare this better
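The confidence figures in the two worked examples can be reproduced with a difference-of-proportions test. The sketch below assumes a one-sided, pooled two-proportion z-test under the normal approximation; the slides do not say exactly which variant was used, but this one gives 93.6% for the 100-user example. `confidence_b_beats_a` is a hypothetical helper name.

```ruby
# Hypothetical helper: one-sided, pooled two-proportion z-test.
# Returns the confidence (0..1) that group B's click rate is higher than A's.
def confidence_b_beats_a(clicks_a, users_a, clicks_b, users_b)
  rate_a = clicks_a.to_f / users_a
  rate_b = clicks_b.to_f / users_b
  # Under the null hypothesis both groups share one pooled click rate.
  pooled = (clicks_a + clicks_b).to_f / (users_a + users_b)
  std_err = Math.sqrt(pooled * (1 - pooled) * (1.0 / users_a + 1.0 / users_b))
  z = (rate_b - rate_a) / std_err
  # Standard normal CDF evaluated at z, written with the error function.
  0.5 * (1 + Math.erf(z / Math.sqrt(2)))
end

# A = without Jimmy, B = with Jimmy.
small = confidence_b_beats_a(18, 100, 27, 100)
large = confidence_b_beats_a(180, 1000, 270, 1000)

puts format("100 users per group: %.1f%%", small * 100) # 93.6%
puts "1000 users per group clears 99.9%: #{large > 0.999}"
```

The click rates are identical in both examples (27% against 18%). Only the sample size changes, and that alone moves the result from inconclusive to convincing.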
Confounding factors ARE bad
This is hard stuff, I hope you understood :) Ask me questions @samphippen