A/B Testing Got You Elected Mister President
Penelope Phippen
April 06, 2013
Technology
Transcript
A/B Testing Got You Elected Mister President
@samphippen
Should I make this change?
Users
A group (50%): site change
B group (50%): old site
Measure some metric
Do maths on the two groups
???
Profit
Lemme show you my favourite A/B test
Also some videos
+$60 million
Protips
Same user always sees same version
Caching
Roughly same performance
Also for feature flagging
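One common way to make sure the same user always sees the same version is to derive the group from a hash of the user's ID. The sketch below is an illustration only: `variant_for` is a hypothetical helper, not part of the split gem, and it assumes every user has a stable ID.

```ruby
require 'digest'

# Hypothetical helper (not from the split gem): deterministically maps a
# user to a variant, so repeat visits always land in the same group.
def variant_for(user_id, experiment, variants = %w[a b])
  # Hash the experiment name together with the user ID so that different
  # experiments bucket the same user independently.
  digest = Digest::MD5.hexdigest("#{experiment}:#{user_id}")
  # Use the first 32 bits of the digest as an integer and pick a variant.
  variants[digest[0, 8].to_i(16) % variants.size]
end

puts variant_for(42, "experiment_name")
puts variant_for(42, "experiment_name") # same variant as the line above
```

Because the assignment is a pure function of the ID, it needs no stored state and is safe to compute on any server.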
A super lightning fast guide on how to do it
and what it looks like
gem 'split'
require 'split/dashboard'

run Rack::URLMap.new \
  "/" => YourApp::Application,
  "/split" => Split::Dashboard.new
<% ab_test("experiment_name", "a", "b") do |c| %>
  <a href="/win" class="btn <%= c %>">
    Get points?
  </a>
<% end %>
What it looks like
https://github.com/andrew/split
How to interpret the results
Stats time
Confidence Value
95% confidence (p < 0.05) is used in medical trials
Common mistake: Assumption of normality
This will probably work for you
How to design the experiment
Step 1: clearly state your hypothesis
Example: I will get more donations if our button is Jimmy Wales's face
Formally: Null hypothesis: there will be no increase in donations if we use Jimmy Wales's face
Formally: Positive hypothesis: there will be an increase in donations if we use Jimmy Wales's face
Step 2: Pick a statistical test
Example: difference of proportions (the standard A/B test)
http://stattrek.com/hypothesis-test/difference-in-proportions.aspx
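For reference, a common textbook form of the difference-of-proportions statistic is the pooled two-proportion z-score. The symbols are not from the slides: $x_A, x_B$ are the number of conversions and $n_A, n_B$ the number of users in each group.

```latex
\hat{p}_A = \frac{x_A}{n_A}, \qquad
\hat{p}_B = \frac{x_B}{n_B}, \qquad
\hat{p} = \frac{x_A + x_B}{n_A + n_B}

z = \frac{\hat{p}_A - \hat{p}_B}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}
```

Under the null hypothesis $z$ is approximately standard normal, so the confidence is read off the normal distribution.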
Step 3: Decide an experiment length (number of days)
Example: we get 200 hits a day, so let's test for 15 days to get 3,000 hits
Alternatively: a fixed sample size. Stop after 10,000 users
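The two options are related by simple arithmetic: a fixed sample size implies a number of days at a given traffic level. A rough sketch, assuming traffic is steady and every hit is a distinct user (`days_needed` is a hypothetical helper):

```ruby
# Hypothetical helper: whole days needed to collect a target number of
# users, assuming steady traffic and one user per hit.
def days_needed(target_users, hits_per_day)
  (target_users.to_f / hits_per_day).ceil
end

puts days_needed(3_000, 200)  # 15 days, as in the example above
puts days_needed(10_000, 200) # 50 days for the fixed-sample alternative
```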
Step 4: Split
Half the users get Jimmy Wales's face; half the users get whatever the button was before
Step 5: inspect results and analyse
Let’s talk about analysis
Let’s work two examples (one null, one positive)
                    With Jimmy   Without Jimmy
Users in test       100          100
Users that clicked  27           18
Confidence = 93.6%. Too low, against a 95% threshold, to conclude that this is better
Common mistake: Sample size
                    With Jimmy   Without Jimmy
Users in test       1000         1000
Users that clicked  270          180
99.9% confidence. High enough for us to declare this better
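The slides do not say how these confidence values were computed. As a sketch, a one-sided pooled two-proportion z-test (an assumption, though it does reproduce the 93.6% figure) can be written in plain Ruby. `normal_cdf` and `confidence_a_beats_b` are hypothetical names.

```ruby
# Standard normal CDF, computed from the error function.
def normal_cdf(z)
  0.5 * (1 + Math.erf(z / Math.sqrt(2)))
end

# One-sided confidence that variant A converts better than variant B,
# using the pooled two-proportion z-test (an assumed choice of test).
def confidence_a_beats_b(clicks_a, users_a, clicks_b, users_b)
  p_a = clicks_a.to_f / users_a
  p_b = clicks_b.to_f / users_b
  # Pooled conversion rate under the null hypothesis of no difference.
  pooled = (clicks_a + clicks_b).to_f / (users_a + users_b)
  std_err = Math.sqrt(pooled * (1 - pooled) * (1.0 / users_a + 1.0 / users_b))
  normal_cdf((p_a - p_b) / std_err)
end

small = confidence_a_beats_b(27, 100, 18, 100)
large = confidence_a_beats_b(270, 1000, 180, 1000)

puts format("%.1f%%", small * 100) # 93.6%, below the 95% threshold
puts large > 0.999                 # true: the larger sample clears 99.9%
```

The conversion rates are identical in both examples (27% versus 18%); only the sample size changes, which is why the same difference fails the threshold with 100 users per group and passes it with 1000.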
Confounding factors ARE bad
This is hard stuff. I hope you understood :) Ask me questions: @samphippen