Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
A/B Testing Got You Elected Mister President
Search
Penelope Phippen
April 06, 2013
Technology
1
360
A/B Testing Got You Elected Mister President
Penelope Phippen
April 06, 2013
Tweet
Share
More Decks by Penelope Phippen
See All by Penelope Phippen
Introducing Rubyfmt
penelope_zone
0
560
How RSpec Works
penelope_zone
0
6.6k
Quick and easy browser testing using RSpec and Rails 5.1
penelope_zone
1
81
Teaching RSpec to play nice with Rails
penelope_zone
2
130
Little machines that eat strings
penelope_zone
1
98
What is processor (brighton ruby edition)
penelope_zone
0
110
What is processor?
penelope_zone
1
350
extremely defensive coding - rubyconf edition
penelope_zone
0
260
Agile, etc.
penelope_zone
2
220
Other Decks in Technology
See All in Technology
Yamla: Rustでつくるリアルタイム性を追求した機械学習基盤 / Yamla: A Rust-Based Machine Learning Platform Pursuing Real-Time Capabilities
lycorptech_jp
PRO
4
170
Lambda Web Adapterについて自分なりに理解してみた
smt7174
5
140
2025-06-26_Lightning_Talk_for_Lightning_Talks
_hashimo2
2
110
高速なプロダクト開発を実現、創業期から掲げるエンタープライズアーキテクチャ
kawauso
1
160
5min GuardDuty Extended Threat Detection EKS
takakuni
0
180
fukabori.fm 出張版: 売上高617億円と高稼働率を陰で支えた社内ツール開発のあれこれ話 / 20250704 Yoshimasa Iwase & Tomoo Morikawa
shift_evolve
PRO
1
110
Amazon S3標準/ S3 Tables/S3 Express One Zoneを使ったログ分析
shigeruoda
5
590
KubeCon + CloudNativeCon Japan 2025 Recap Opening & Choose Your Own Adventureシリーズまとめ
mmmatsuda
0
230
Core Audio tapを使ったリアルタイム音声処理のお話
yuta0306
0
150
Zephyr RTOSを使った開発コンペに参加した件
iotengineer22
0
130
生成AI時代の開発組織・技術・プロセス 〜 ログラスの挑戦と考察 〜
itohiro73
1
370
AIとともに進化するエンジニアリング / Engineering-Evolving-with-AI_final.pdf
lycorptech_jp
PRO
0
140
Featured
See All Featured
Navigating Team Friction
lara
187
15k
Git: the NoSQL Database
bkeepers
PRO
430
65k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.4k
Site-Speed That Sticks
csswizardry
10
670
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
Designing for Performance
lara
609
69k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
GraphQLの誤解/rethinking-graphql
sonatard
71
11k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
29
9.5k
How to Ace a Technical Interview
jacobian
277
23k
Balancing Empowerment & Direction
lara
1
390
A designer walks into a library…
pauljervisheath
207
24k
Transcript
A/B Testing Got you elected Mister President
@samphippen @samphippen
Should I make this change?
Users A group: 50% B group: 50% Site change Old
site
Measure some metric
Do maths on the two groups
???
Profit
Lemme show you my favourite A/B test
None
None
None
None
None
None
Also some videos
None
+$60 million
None
Protips
Same user always sees same version
Caching
Roughly same performance
Also for feature flagging
A super lightning fast guide on how to do it
and what it looks like
gem 'split'
require 'split/dashboard' run Rack::URLMap.new \ "/" => YourApp::Application, "/split" =>
Split::Dashboard.new
<% ab_test("experiment_name", "a", "b") do |c| %> <a href="/win" class="btn
<%= c %>"> Get points? </a> <% end %>
What it looks like
None
None
None
https://github.com/ andrew/split
How to interpret the results
Stats time
Confidence Value
P =0.95 is used in medical trials
Common mistake: Assumption of normality
None
This will probably work for you
How to design the experiment
Step 1: clearly state your hypothesis
Example: I will get more donations if our button is
jimmy wale’s face
Formally: Null Hypothesis: there will be no increase in donations
if we use jimmy wales face
Formally: positive Hypothesis: there will be an increase in donations
if we use jimmy wales face
Step 2: Pick a statistical test
Example: difference of proportions (the standard A/b test)
http://stattrek.com/ hypothesis-test/ difference-in- proportions.aspx
Step 3: Decide an experiment length (number of days)
Example: we get 200 hits a day, let’s test for
15 days for 3000 hits
Alternatively: A fixed sample size Stop after 10000 users
Step 4: Split
Half the users get jimmy wales face half the users
get whatever the button was before
Step 5: inspect results and analyse
Let’s talk about analysis
Let’s work two examples (one null, one positive)
With jimmy Without Jimmy Users in test 100 100 Users
that clicked 27 18
Confidence = 93.6% Too low at 95% to conclude that
this is better
common mistake: Sample size
With jimmy Without Jimmy Users in test 1000 1000 Users
that clicked 270 180
99.9% confidence High enough for us to declare this better
Confounding factors ARE bad
this is hard stuff I hope you understood :) ask
me questions @samphippen