Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
A/B Testing Got You Elected Mister President
Search
Penelope Phippen
April 06, 2013
Technology
1
350
A/B Testing Got You Elected Mister President
Penelope Phippen
April 06, 2013
Tweet
Share
More Decks by Penelope Phippen
See All by Penelope Phippen
Introducing Rubyfmt
penelope_zone
0
510
How RSpec Works
penelope_zone
0
6.3k
Quick and easy browser testing using RSpec and Rails 5.1
penelope_zone
1
75
Teaching RSpec to play nice with Rails
penelope_zone
2
120
Little machines that eat strings
penelope_zone
1
80
What is processor (brighton ruby edition)
penelope_zone
0
94
What is processor?
penelope_zone
1
340
extremely defensive coding - rubyconf edition
penelope_zone
0
240
Agile, etc.
penelope_zone
2
210
Other Decks in Technology
See All in Technology
JBUG岡山 #6 WordCamp男木島の チームビルディング
takeshifurusato
0
150
フルリモートワークはエンジニアの夢を叶えたか? #cm_odyssey
mamohacy
2
600
RAGのサービスをリリースして1年3ヶ月が経ちました
segavvy
4
920
Scaling Technical Excellence at 104: Evolution in AWS and Developer Empowerment
scotthsieh825
1
150
CEL(Common Expression Language)で書いた条件にマッチしたIAM Policyを見つける / iam-policy-finder
fujiwara3
0
710
E2Eテスト自動化プラットフォームにおけるAIの活用
shift_evolve
0
190
データベース研修 分析向けSQL入門【MIXI 24新卒技術研修】
mixi_engineers
PRO
0
110
DDDにおける認可の扱いとKotlinにおける実装パターン / authorization-for-ddd-and-kotlin-implement-pattern
urmot
4
390
簡単に始めるSnowflakeの機械学習
nayuts
1
190
スタートアップにおける組織設計とスクラムの長期戦略 / Scrum Fest Kanazawa 2024
yoshikiiida
13
3.6k
DevIO2024_レガシー運用からの脱却 -クラウド活用の実践事例とベストプラクティス-
jun2882
0
210
ABEMAにおけるLLMを用いたコンテンツベース推薦システム導入と効果検証
cyberagentdevelopers
PRO
1
730
Featured
See All Featured
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
155
14k
Learning to Love Humans: Emotional Interface Design
aarron
269
39k
Adopting Sorbet at Scale
ufuk
71
8.8k
Large-scale JavaScript Application Architecture
addyosmani
506
110k
Fantastic passwords and where to find them - at NoRuKo
philnash
42
2.7k
4 Signs Your Business is Dying
shpigford
178
21k
Why Our Code Smells
bkeepers
PRO
332
56k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
13
430
Understanding Cognitive Biases in Performance Measurement
bluesmoon
18
1.2k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
353
29k
The Cult of Friendly URLs
andyhume
75
5.9k
[RailsConf 2023] Rails as a piece of cake
palkan
35
4.4k
Transcript
A/B Testing Got you elected Mister President
@samphippen @samphippen
Should I make this change?
Users A group: 50% B group: 50% Site change Old
site
Measure some metric
Do maths on the two groups
???
Profit
Lemme show you my favourite A/B test
None
None
None
None
None
None
Also some videos
None
+$60 million
None
Protips
Same user always sees same version
Caching
Roughly same performance
Also for feature flagging
A super lightning fast guide on how to do it
and what it looks like
gem 'split'
require 'split/dashboard' run Rack::URLMap.new \ "/" => YourApp::Application, "/split" =>
Split::Dashboard.new
<% ab_test("experiment_name", "a", "b") do |c| %> <a href="/win" class="btn
<%= c %>"> Get points? </a> <% end %>
What it looks like
None
None
None
https://github.com/ andrew/split
How to interpret the results
Stats time
Confidence Value
P =0.95 is used in medical trials
Common mistake: Assumption of normality
None
This will probably work for you
How to design the experiment
Step 1: clearly state your hypothesis
Example: I will get more donations if our button is
jimmy wale’s face
Formally: Null Hypothesis: there will be no increase in donations
if we use jimmy wales face
Formally: positive Hypothesis: there will be an increase in donations
if we use jimmy wales face
Step 2: Pick a statistical test
Example: difference of proportions (the standard A/b test)
http://stattrek.com/ hypothesis-test/ difference-in- proportions.aspx
Step 3: Decide an experiment length (number of days)
Example: we get 200 hits a day, let’s test for
15 days for 3000 hits
Alternatively: A fixed sample size Stop after 10000 users
Step 4: Split
Half the users get jimmy wales face half the users
get whatever the button was before
Step 5: inspect results and analyse
Let’s talk about analysis
Let’s work two examples (one null, one positive)
With jimmy Without Jimmy Users in test 100 100 Users
that clicked 27 18
Confidence = 93.6% Too low at 95% to conclude that
this is better
common mistake: Sample size
With jimmy Without Jimmy Users in test 1000 1000 Users
that clicked 270 180
99.9% confidence High enough for us to declare this better
Confounding factors ARE bad
this is hard stuff I hope you understood :) ask
me questions @samphippen