Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
A/B Testing Got You Elected Mister President
Search
Penelope Phippen
April 06, 2013
Technology
1
360
A/B Testing Got You Elected Mister President
Penelope Phippen
April 06, 2013
Tweet
Share
More Decks by Penelope Phippen
See All by Penelope Phippen
Introducing Rubyfmt
penelope_zone
0
570
How RSpec Works
penelope_zone
0
6.7k
Quick and easy browser testing using RSpec and Rails 5.1
penelope_zone
1
86
Teaching RSpec to play nice with Rails
penelope_zone
2
150
Little machines that eat strings
penelope_zone
1
110
What is processor (brighton ruby edition)
penelope_zone
0
120
What is processor?
penelope_zone
1
360
extremely defensive coding - rubyconf edition
penelope_zone
0
270
Agile, etc.
penelope_zone
2
230
Other Decks in Technology
See All in Technology
Biz職でもDifyでできる! 「触らないAIワークフロー」を実現する方法
igarashikana
7
3.2k
NLPコロキウム20251022_超効率化への挑戦: LLM 1bit量子化のロードマップ
yumaichikawa
2
400
All About Sansan – for New Global Engineers
sansan33
PRO
1
1.2k
初めてのDatabricks Apps開発
taka_aki
1
280
Introduction to Sansan for Engineers / エンジニア向け会社紹介
sansan33
PRO
5
43k
Introdução a Service Mesh usando o Istio
aeciopires
1
280
GraphRAG グラフDBを使ったLLM生成(自作漫画DBを用いた具体例を用いて)
seaturt1e
1
120
Zephyr(RTOS)にEdge AIを組み込んでみた話
iotengineer22
1
320
オブザーバビリティと育てた ID管理・認証認可基盤の歩み / The Journey of an ID Management, Authentication, and Authorization Platform Nurtured with Observability
kaminashi
1
370
ソースを読む時の思考プロセスの例-MkDocs
sat
PRO
1
150
クラウドとリアルの融合により、製造業はどう変わるのか?〜クラスメソッドの製造業への取組と共に〜
hamadakoji
0
390
webpack依存からの脱却!快適フロントエンド開発をViteで実現する #vuefes
bengo4com
3
3.1k
Featured
See All Featured
It's Worth the Effort
3n
187
28k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
16
1.7k
Music & Morning Musume
bryan
46
6.9k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
36
6.1k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
140
34k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
1.7k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
249
1.3M
Product Roadmaps are Hard
iamctodd
PRO
55
11k
Why You Should Never Use an ORM
jnunemaker
PRO
59
9.6k
Git: the NoSQL Database
bkeepers
PRO
431
66k
Keith and Marios Guide to Fast Websites
keithpitt
411
23k
Building a Modern Day E-commerce SEO Strategy
aleyda
44
7.8k
Transcript
A/B Testing Got you elected Mister President
@samphippen @samphippen
Should I make this change?
Users A group: 50% B group: 50% Site change Old
site
Measure some metric
Do maths on the two groups
???
Profit
Lemme show you my favourite A/B test
None
None
None
None
None
None
Also some videos
None
+$60 million
None
Protips
Same user always sees same version
Caching
Roughly same performance
Also for feature flagging
A super lightning fast guide on how to do it
and what it looks like
gem 'split'
require 'split/dashboard' run Rack::URLMap.new \ "/" => YourApp::Application, "/split" =>
Split::Dashboard.new
<% ab_test("experiment_name", "a", "b") do |c| %> <a href="/win" class="btn
<%= c %>"> Get points? </a> <% end %>
What it looks like
None
None
None
https://github.com/ andrew/split
How to interpret the results
Stats time
Confidence Value
P =0.95 is used in medical trials
Common mistake: Assumption of normality
None
This will probably work for you
How to design the experiment
Step 1: clearly state your hypothesis
Example: I will get more donations if our button is
jimmy wale’s face
Formally: Null Hypothesis: there will be no increase in donations
if we use jimmy wales face
Formally: positive Hypothesis: there will be an increase in donations
if we use jimmy wales face
Step 2: Pick a statistical test
Example: difference of proportions (the standard A/b test)
http://stattrek.com/ hypothesis-test/ difference-in- proportions.aspx
Step 3: Decide an experiment length (number of days)
Example: we get 200 hits a day, let’s test for
15 days for 3000 hits
Alternatively: A fixed sample size Stop after 10000 users
Step 4: Split
Half the users get jimmy wales face half the users
get whatever the button was before
Step 5: inspect results and analyse
Let’s talk about analysis
Let’s work two examples (one null, one positive)
With jimmy Without Jimmy Users in test 100 100 Users
that clicked 27 18
Confidence = 93.6% Too low at 95% to conclude that
this is better
common mistake: Sample size
With jimmy Without Jimmy Users in test 1000 1000 Users
that clicked 270 180
99.9% confidence High enough for us to declare this better
Confounding factors ARE bad
this is hard stuff I hope you understood :) ask
me questions @samphippen