A/B Testing Got You Elected Mister President
Penelope Phippen
April 06, 2013
Transcript
A/B Testing Got You Elected Mister President
@samphippen
Should I make this change?
Users
A group: 50% -> Site change
B group: 50% -> Old site
Measure some metric
Do maths on the two groups
???
Profit
Lemme show you my favourite A/B test
Also some videos
+$60 million
Protips
Same user always sees same version
Caching
Roughly same performance
Also for feature flagging
A super lightning fast guide on how to do it
and what it looks like
gem 'split'
require 'split/dashboard'
run Rack::URLMap.new \
  "/" => YourApp::Application,
  "/split" => Split::Dashboard.new
<% ab_test("experiment_name", "a", "b") do |c| %>
  <a href="/win" class="btn <%= c %>"> Get points? </a>
<% end %>
What it looks like
https://github.com/andrew/split
How to interpret the results
Stats time
Confidence Value
95% confidence (p < 0.05) is the threshold used in medical trials
Common mistake: Assumption of normality
This will probably work for you
How to design the experiment
Step 1: clearly state your hypothesis
Example: I will get more donations if our button is Jimmy Wales's face
Formally: Null hypothesis: there will be no increase in donations if we use Jimmy Wales's face
Formally: Positive (alternative) hypothesis: there will be an increase in donations if we use Jimmy Wales's face
Step 2: Pick a statistical test
Example: difference of proportions (the standard A/B test)
http://stattrek.com/hypothesis-test/difference-in-proportions.aspx
Step 3: Decide an experiment length (number of days)
Example: we get 200 hits a day, so let's test for 15 days to collect 3,000 hits
Alternatively: a fixed sample size. Stop after 10,000 users
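The arithmetic behind both options can be sketched in a few lines of Ruby. This is only an illustration: `days_needed` is a hypothetical helper, and it assumes every hit is a distinct user, which real traffic rarely satisfies.

```ruby
# Hypothetical helper: how many whole days of traffic are needed
# to reach a target sample size. Assumes one hit == one distinct user.
def days_needed(target_users, hits_per_day)
  (target_users.to_f / hits_per_day).ceil
end

# Fixed length: 15 days at 200 hits a day.
puts 200 * 15                  # 3000

# Fixed sample size: stop after 10,000 users at 200 hits a day.
puts days_needed(10_000, 200)  # 50
```

At 200 hits a day, the fixed sample of 10,000 users takes 50 days, so the two options imply very different experiment lengths.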
Step 4: Split
Half the users get Jimmy Wales's face; the other half get whatever the button was before
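One way to get a 50/50 split where the same user always sees the same version is to hash a stable user identifier. The sketch below is a hand-rolled illustration, not how the split gem does it: `variant_for`, the experiment name, and the variant labels are all hypothetical.

```ruby
require 'digest/md5'

# Hypothetical helper (not part of the split gem): derive the variant
# from a hash of the experiment name and user id, so the assignment is
# stable across requests without storing anything.
def variant_for(user_id, experiment)
  digest = Digest::MD5.hexdigest("#{experiment}:#{user_id}")
  digest.to_i(16).even? ? "jimmy" : "control"
end

# The same user always lands in the same group.
puts variant_for(42, "donation_button") == variant_for(42, "donation_button")

# Over many users the groups come out roughly half and half.
counts = (1..10_000).group_by { |id| variant_for(id, "donation_button") }
puts counts.transform_values(&:size).inspect
```

Including the experiment name in the hashed string means a user's group in one experiment does not determine their group in another.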
Step 5: inspect results and analyse
Let’s talk about analysis
Let’s work two examples (one null, one positive)
                    With Jimmy   Without Jimmy
Users in test       100          100
Users that clicked  27           18
Confidence = 93.6%. Too low at the 95% threshold to conclude that this is better
common mistake: Sample size
                    With Jimmy   Without Jimmy
Users in test       1000         1000
Users that clicked  270          180
99.9% confidence: high enough for us to declare this better
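The two worked examples can be checked with a short Ruby sketch of a difference-of-proportions test. The slides do not say exactly which variant was used; the code assumes a one-sided z-test with a pooled standard error, which gives about 93.6% for the small sample. The method names are hypothetical.

```ruby
# Standard normal CDF, built on Ruby's Math.erf.
def standard_normal_cdf(z)
  0.5 * (1.0 + Math.erf(z / Math.sqrt(2.0)))
end

# Assumption: one-sided, pooled two-proportion z-test.
# Returns the confidence that the new version beats the old one.
def confidence_new_beats_old(clicks_new, users_new, clicks_old, users_old)
  rate_new = clicks_new.to_f / users_new
  rate_old = clicks_old.to_f / users_old
  pooled = (clicks_new + clicks_old).to_f / (users_new + users_old)
  std_err = Math.sqrt(pooled * (1 - pooled) * (1.0 / users_new + 1.0 / users_old))
  standard_normal_cdf((rate_new - rate_old) / std_err)
end

small = confidence_new_beats_old(27, 100, 18, 100)
large = confidence_new_beats_old(270, 1000, 180, 1000)

puts format("%.1f%%", small * 100)  # below the 95% threshold
puts large > 0.999                  # well above it
```

The click rates are identical in both examples (27% versus 18%); only the sample size changes, and that alone moves the result from inconclusive to conclusive.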
Confounding factors ARE bad
This is hard stuff. I hope you understood :) Ask me questions: @samphippen