Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
A/B Testing Got You Elected Mister President
Search
Penelope Phippen
April 06, 2013
Technology
1
360
A/B Testing Got You Elected Mister President
Penelope Phippen
April 06, 2013
Tweet
Share
More Decks by Penelope Phippen
See All by Penelope Phippen
Introducing Rubyfmt
penelope_zone
0
560
How RSpec Works
penelope_zone
0
6.6k
Quick and easy browser testing using RSpec and Rails 5.1
penelope_zone
1
82
Teaching RSpec to play nice with Rails
penelope_zone
2
140
Little machines that eat strings
penelope_zone
1
99
What is processor (brighton ruby edition)
penelope_zone
0
110
What is processor?
penelope_zone
1
350
extremely defensive coding - rubyconf edition
penelope_zone
0
260
Agile, etc.
penelope_zone
2
220
Other Decks in Technology
See All in Technology
claude codeでPrompt Engineering
iori0311
0
530
激動の時代、新卒エンジニアはAIツールにどう向き合うか。 [LayerX Bet AI Day Countdown LT Day1 ツールの選択]
tak848
0
610
今日からあなたもGeminiを好きになる
subaruhello
1
650
MCPに潜むセキュリティリスクを考えてみる
milix_m
1
880
東京海上日動におけるセキュアな開発プロセスの取り組み
miyabit
0
200
【CEDEC2025】LLMを活用したゲーム開発支援と、生成AIの利活用を進める組織的な取り組み
cygames
PRO
1
1.6k
Power Automate のパフォーマンス改善レシピ / Power Automate Performance Improvement Recipes
karamem0
0
270
Jitera Company Deck / JP
jitera
0
250
増え続ける脆弱性に立ち向かう: 事前対策と優先度づけによる 持続可能な脆弱性管理 / Confronting the Rise of Vulnerabilities: Sustainable Management Through Proactive Measures and Prioritization
nttcom
1
210
Ktor + Google Cloud Tasks/PubSub におけるOTel Messaging計装の実践
sansantech
PRO
1
330
株式会社島津製作所_研究開発(集団協業と知的生産)の現場を支える、OSS知識基盤システムの導入
akahane92
1
1.3k
ObsidianをLLM時代のナレッジベースに! クリッピング→Markdown→CLI連携の実践
srvhat09
7
9.8k
Featured
See All Featured
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
357
30k
Rails Girls Zürich Keynote
gr2m
95
14k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
Measuring & Analyzing Core Web Vitals
bluesmoon
7
530
VelocityConf: Rendering Performance Case Studies
addyosmani
332
24k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Raft: Consensus for Rubyists
vanstee
140
7k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
656
60k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
29
1.8k
BBQ
matthewcrist
89
9.8k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
7
760
Reflections from 52 weeks, 52 projects
jeffersonlam
351
21k
Transcript
A/B Testing Got you elected Mister President
@samphippen @samphippen
Should I make this change?
Users A group: 50% B group: 50% Site change Old
site
Measure some metric
Do maths on the two groups
???
Profit
Lemme show you my favourite A/B test
None
None
None
None
None
None
Also some videos
None
+$60 million
None
Protips
Same user always sees same version
Caching
Roughly same performance
Also for feature flagging
A super lightning fast guide on how to do it
and what it looks like
gem 'split'
require 'split/dashboard' run Rack::URLMap.new \ "/" => YourApp::Application, "/split" =>
Split::Dashboard.new
<% ab_test("experiment_name", "a", "b") do |c| %> <a href="/win" class="btn
<%= c %>"> Get points? </a> <% end %>
What it looks like
None
None
None
https://github.com/ andrew/split
How to interpret the results
Stats time
Confidence Value
P =0.95 is used in medical trials
Common mistake: Assumption of normality
None
This will probably work for you
How to design the experiment
Step 1: clearly state your hypothesis
Example: I will get more donations if our button is
jimmy wale’s face
Formally: Null Hypothesis: there will be no increase in donations
if we use jimmy wales face
Formally: positive Hypothesis: there will be an increase in donations
if we use jimmy wales face
Step 2: Pick a statistical test
Example: difference of proportions (the standard A/b test)
http://stattrek.com/ hypothesis-test/ difference-in- proportions.aspx
Step 3: Decide an experiment length (number of days)
Example: we get 200 hits a day, let’s test for
15 days for 3000 hits
Alternatively: A fixed sample size Stop after 10000 users
Step 4: Split
Half the users get jimmy wales face half the users
get whatever the button was before
Step 5: inspect results and analyse
Let’s talk about analysis
Let’s work two examples (one null, one positive)
With jimmy Without Jimmy Users in test 100 100 Users
that clicked 27 18
Confidence = 93.6% Too low at 95% to conclude that
this is better
common mistake: Sample size
With jimmy Without Jimmy Users in test 1000 1000 Users
that clicked 270 180
99.9% confidence High enough for us to declare this better
Confounding factors ARE bad
this is hard stuff I hope you understood :) ask
me questions @samphippen