Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
A/B Testing Got You Elected Mister President
Search
Penelope Phippen
April 06, 2013
Technology
1
360
A/B Testing Got You Elected Mister President
Penelope Phippen
April 06, 2013
Tweet
Share
More Decks by Penelope Phippen
See All by Penelope Phippen
Introducing Rubyfmt
penelope_zone
0
540
How RSpec Works
penelope_zone
0
6.5k
Quick and easy browser testing using RSpec and Rails 5.1
penelope_zone
1
77
Teaching RSpec to play nice with Rails
penelope_zone
2
120
Little machines that eat strings
penelope_zone
1
84
What is processor (brighton ruby edition)
penelope_zone
0
94
What is processor?
penelope_zone
1
340
extremely defensive coding - rubyconf edition
penelope_zone
0
250
Agile, etc.
penelope_zone
2
210
Other Decks in Technology
See All in Technology
The key to VCP-VCF
mirie_sd
0
160
UI State設計とテスト方針
rmakiyama
4
940
デジタルアイデンティティ人材育成推進ワーキンググループ 翻訳サブワーキンググループ 活動報告 / 20250114-OIDF-J-EduWG-TranslationSWG
oidfj
0
110
Visual StudioとかIDE関連小ネタ話
kosmosebi
1
280
テストを書かないためのテスト/ Tests for not writing tests
sinsoku
1
150
TSKaigi 2024 の登壇から広がったコミュニティ活動について
tsukuha
0
180
ハイテク休憩
sat
PRO
2
190
実践! ソフトウェアエンジニアリングの価値の計測 ── Effort、Output、Outcome、Impact
nomuson
0
1.3k
DevFest 2024 Incheon / Songdo - Compose UI 조합 심화
wisemuji
0
250
OPENLOGI Company Profile
hr01
0
57k
Bring Your Own Container: When Containers Turn the Key to EDR Bypass/byoc-avtokyo2024
tkmru
0
330
DUSt3R, MASt3R, MASt3R-SfM にみる3D基盤モデル
spatial_ai_network
3
500
Featured
See All Featured
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
132
33k
Building an army of robots
kneath
302
44k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
28
9.2k
The Power of CSS Pseudo Elements
geoffreycrofte
74
5.4k
Designing for Performance
lara
604
68k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
44
6.9k
Embracing the Ebb and Flow
colly
84
4.5k
A better future with KSS
kneath
238
17k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
6
490
Imperfection Machines: The Place of Print at Facebook
scottboms
266
13k
YesSQL, Process and Tooling at Scale
rocio
170
14k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
97
17k
Transcript
A/B Testing Got you elected Mister President
@samphippen @samphippen
Should I make this change?
Users A group: 50% B group: 50% Site change Old
site
Measure some metric
Do maths on the two groups
???
Profit
Lemme show you my favourite A/B test
None
None
None
None
None
None
Also some videos
None
+$60 million
None
Protips
Same user always sees same version
Caching
Roughly same performance
Also for feature flagging
A super lightning fast guide on how to do it
and what it looks like
gem 'split'
require 'split/dashboard' run Rack::URLMap.new \ "/" => YourApp::Application, "/split" =>
Split::Dashboard.new
<% ab_test("experiment_name", "a", "b") do |c| %> <a href="/win" class="btn
<%= c %>"> Get points? </a> <% end %>
What it looks like
None
None
None
https://github.com/ andrew/split
How to interpret the results
Stats time
Confidence Value
P =0.95 is used in medical trials
Common mistake: Assumption of normality
None
This will probably work for you
How to design the experiment
Step 1: clearly state your hypothesis
Example: I will get more donations if our button is
jimmy wale’s face
Formally: Null Hypothesis: there will be no increase in donations
if we use jimmy wales face
Formally: positive Hypothesis: there will be an increase in donations
if we use jimmy wales face
Step 2: Pick a statistical test
Example: difference of proportions (the standard A/b test)
http://stattrek.com/ hypothesis-test/ difference-in- proportions.aspx
Step 3: Decide an experiment length (number of days)
Example: we get 200 hits a day, let’s test for
15 days for 3000 hits
Alternatively: A fixed sample size Stop after 10000 users
Step 4: Split
Half the users get jimmy wales face half the users
get whatever the button was before
Step 5: inspect results and analyse
Let’s talk about analysis
Let’s work two examples (one null, one positive)
With jimmy Without Jimmy Users in test 100 100 Users
that clicked 27 18
Confidence = 93.6% Too low at 95% to conclude that
this is better
common mistake: Sample size
With jimmy Without Jimmy Users in test 1000 1000 Users
that clicked 270 180
99.9% confidence High enough for us to declare this better
Confounding factors ARE bad
this is hard stuff I hope you understood :) ask
me questions @samphippen