Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
A/B Testing Got You Elected Mister President
Search
Penelope Phippen
April 06, 2013
Technology
1
370
A/B Testing Got You Elected Mister President
Penelope Phippen
April 06, 2013
Tweet
Share
More Decks by Penelope Phippen
See All by Penelope Phippen
Introducing Rubyfmt
penelope_zone
0
600
How RSpec Works
penelope_zone
0
6.8k
Quick and easy browser testing using RSpec and Rails 5.1
penelope_zone
1
100
Teaching RSpec to play nice with Rails
penelope_zone
2
160
Little machines that eat strings
penelope_zone
1
120
What is processor (brighton ruby edition)
penelope_zone
0
140
What is processor?
penelope_zone
1
380
extremely defensive coding - rubyconf edition
penelope_zone
0
290
Agile, etc.
penelope_zone
2
250
Other Decks in Technology
See All in Technology
大規模ECサイトのあるバッチのパフォーマンスを改善するために僕たちのチームがしてきたこと
panda_program
1
390
Copilot 宇宙へ 〜生成AIで「専門データの壁」を壊す方法〜
nakasho
0
180
【社内勉強会】新年度からコーディングエージェントを使いこなす - 構造と制約で引き出すClaude Codeの実践知
nwiizo
22
11k
「コントロールの三分法」で考える「コト」への向き合い方 / phperkaigi2026
blue_goheimochi
0
140
開発チームとQAエンジニアの新しい協業モデル -年末調整開発チームで実践する【QAリード施策】-
qa
0
260
GitHub Copilot CLI で Azure Portal to Bicep
tsubakimoto_s
0
180
How to install a gem
indirect
0
1.3k
スピンアウト講座02_ファイル管理
overflowinc
0
1.2k
_Architecture_Modernization_から学ぶ現状理解から設計への道のり.pdf
satohjohn
2
750
脳が溶けた話 / Melted Brain
keisuke69
1
970
Phase04_ターミナル基礎
overflowinc
0
2.2k
スピンアウト講座04_ルーティン処理
overflowinc
0
1.1k
Featured
See All Featured
Efficient Content Optimization with Google Search Console & Apps Script
katarinadahlin
PRO
1
430
30 Presentation Tips
portentint
PRO
1
260
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
26
3.4k
How to train your dragon (web standard)
notwaldorf
97
6.6k
Into the Great Unknown - MozCon
thekraken
40
2.3k
Leo the Paperboy
mayatellez
4
1.5k
WCS-LA-2024
lcolladotor
0
500
DevOps and Value Stream Thinking: Enabling flow, efficiency and business value
helenjbeal
1
150
Keith and Marios Guide to Fast Websites
keithpitt
413
23k
The agentic SEO stack - context over prompts
schlessera
0
710
Statistics for Hackers
jakevdp
799
230k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
Transcript
A/B Testing Got you elected Mister President
@samphippen @samphippen
Should I make this change?
Users A group: 50% B group: 50% Site change Old
site
Measure some metric
Do maths on the two groups
???
Profit
Lemme show you my favourite A/B test
None
None
None
None
None
None
Also some videos
None
+$60 million
None
Protips
Same user always sees same version
Caching
Roughly same performance
Also for feature flagging
A super lightning fast guide on how to do it
and what it looks like
gem 'split'
require 'split/dashboard' run Rack::URLMap.new \ "/" => YourApp::Application, "/split" =>
Split::Dashboard.new
<% ab_test("experiment_name", "a", "b") do |c| %> <a href="/win" class="btn
<%= c %>"> Get points? </a> <% end %>
What it looks like
None
None
None
https://github.com/ andrew/split
How to interpret the results
Stats time
Confidence Value
P =0.95 is used in medical trials
Common mistake: Assumption of normality
None
This will probably work for you
How to design the experiment
Step 1: clearly state your hypothesis
Example: I will get more donations if our button is
jimmy wale’s face
Formally: Null Hypothesis: there will be no increase in donations
if we use jimmy wales face
Formally: positive Hypothesis: there will be an increase in donations
if we use jimmy wales face
Step 2: Pick a statistical test
Example: difference of proportions (the standard A/b test)
http://stattrek.com/ hypothesis-test/ difference-in- proportions.aspx
Step 3: Decide an experiment length (number of days)
Example: we get 200 hits a day, let’s test for
15 days for 3000 hits
Alternatively: A fixed sample size Stop after 10000 users
Step 4: Split
Half the users get jimmy wales face half the users
get whatever the button was before
Step 5: inspect results and analyse
Let’s talk about analysis
Let’s work two examples (one null, one positive)
With jimmy Without Jimmy Users in test 100 100 Users
that clicked 27 18
Confidence = 93.6% Too low at 95% to conclude that
this is better
common mistake: Sample size
With jimmy Without Jimmy Users in test 1000 1000 Users
that clicked 270 180
99.9% confidence High enough for us to declare this better
Confounding factors ARE bad
this is hard stuff I hope you understood :) ask
me questions @samphippen