$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
A/B Testing Got You Elected Mister President
Search
Penelope Phippen
April 06, 2013
Technology
1
360
A/B Testing Got You Elected Mister President
Penelope Phippen
April 06, 2013
Tweet
Share
More Decks by Penelope Phippen
See All by Penelope Phippen
Introducing Rubyfmt
penelope_zone
0
580
How RSpec Works
penelope_zone
0
6.7k
Quick and easy browser testing using RSpec and Rails 5.1
penelope_zone
1
93
Teaching RSpec to play nice with Rails
penelope_zone
2
150
Little machines that eat strings
penelope_zone
1
110
What is processor (brighton ruby edition)
penelope_zone
0
120
What is processor?
penelope_zone
1
370
extremely defensive coding - rubyconf edition
penelope_zone
0
270
Agile, etc.
penelope_zone
2
230
Other Decks in Technology
See All in Technology
翻訳・対話・越境で強いチームワークを作ろう! / Building Strong Teamwork through Interpretation, Dialogue, and Border-Crossing
ar_tama
4
1.6k
Design System Documentation Tooling 2025
takanorip
1
930
pmconf2025 - 他社事例を"自社仕様化"する技術_iRAFT法
daichi_yamashita
0
540
AI (LLM) を活用する上で必須級のMCPをAmazon Q Developerで学ぼう / 20251127 Ikuma Yamashita
shift_evolve
PRO
2
100
“決まらない”NSM設計への処方箋 〜ビットキーにおける現実的な指標デザイン事例〜 / A Prescription for "Stuck" NSM Design: Bitkey’s Practical Case Study
bitkey
PRO
1
360
一億総業務改善を支える社内AIエージェント基盤の要諦
yukukotani
8
2.8k
命名から始めるSpec Driven
kuruwic
3
830
【5分でわかる】セーフィー エンジニア向け会社紹介
safie_recruit
0
37k
Product Engineer
resilire
0
140
GitLab Duo Agent Platformで実現する“AI駆動・継続的サービス開発”と最新情報のアップデート
jeffi7
0
160
Docker, Infraestructuras seguras y Hardening
josejuansanchez
0
150
ページの可視領域を算出する方法について整理する
yamatai1212
0
160
Featured
See All Featured
How GitHub (no longer) Works
holman
316
140k
Designing Experiences People Love
moore
142
24k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
196
69k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
17k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
1.8k
Unsuck your backbone
ammeep
671
58k
GitHub's CSS Performance
jonrohan
1032
470k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
26
3.2k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
359
30k
StorybookのUI Testing Handbookを読んだ
zakiyama
31
6.4k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
659
61k
Transcript
A/B Testing Got you elected Mister President
@samphippen @samphippen
Should I make this change?
Users A group: 50% B group: 50% Site change Old
site
Measure some metric
Do maths on the two groups
???
Profit
Lemme show you my favourite A/B test
None
None
None
None
None
None
Also some videos
None
+$60 million
None
Protips
Same user always sees same version
Caching
Roughly same performance
Also for feature flagging
A super lightning fast guide on how to do it
and what it looks like
gem 'split'
require 'split/dashboard' run Rack::URLMap.new \ "/" => YourApp::Application, "/split" =>
Split::Dashboard.new
<% ab_test("experiment_name", "a", "b") do |c| %> <a href="/win" class="btn
<%= c %>"> Get points? </a> <% end %>
What it looks like
None
None
None
https://github.com/ andrew/split
How to interpret the results
Stats time
Confidence Value
P =0.95 is used in medical trials
Common mistake: Assumption of normality
None
This will probably work for you
How to design the experiment
Step 1: clearly state your hypothesis
Example: I will get more donations if our button is
jimmy wale’s face
Formally: Null Hypothesis: there will be no increase in donations
if we use jimmy wales face
Formally: positive Hypothesis: there will be an increase in donations
if we use jimmy wales face
Step 2: Pick a statistical test
Example: difference of proportions (the standard A/b test)
http://stattrek.com/ hypothesis-test/ difference-in- proportions.aspx
Step 3: Decide an experiment length (number of days)
Example: we get 200 hits a day, let’s test for
15 days for 3000 hits
Alternatively: A fixed sample size Stop after 10000 users
Step 4: Split
Half the users get jimmy wales face half the users
get whatever the button was before
Step 5: inspect results and analyse
Let’s talk about analysis
Let’s work two examples (one null, one positive)
With jimmy Without Jimmy Users in test 100 100 Users
that clicked 27 18
Confidence = 93.6% Too low at 95% to conclude that
this is better
common mistake: Sample size
With jimmy Without Jimmy Users in test 1000 1000 Users
that clicked 270 180
99.9% confidence High enough for us to declare this better
Confounding factors ARE bad
this is hard stuff I hope you understood :) ask
me questions @samphippen