Penelope Phippen
April 06, 2013

# A/B Testing Got You Elected Mister President

## Transcript

1. A/B Testing
Got you elected
Mister President

2. @samphippen

3. Should I make
this change?

4. Users
A group (50%): site change
B group (50%): old site

5. Measure some metric

6. Do maths on
the two
groups

7. ???

8. Profit

9. Lemme show you my
favourite A/B test

10. Also some videos

11. +\$60 million

12. Protips

13. Same user always sees
same version
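
Split keeps track of which version each user got for you. Just to show the idea, here's a minimal sketch of a different, stateless technique: hash the user id so the same user always lands in the same bucket. `variant_for` is a hypothetical helper (not part of the split gem), and it assumes you have a stable user id to hash.

```ruby
require "digest"

# Hypothetical helper, not part of the split gem: hash the experiment name
# and user id together so the same user always lands in the same bucket,
# while different experiments bucket users independently.
def variant_for(user_id, experiment, variants = %w[a b])
  digest = Digest::SHA256.hexdigest("#{experiment}:#{user_id}")
  # First 8 hex digits give a 32-bit integer; the modulo picks the variant.
  variants[digest[0, 8].to_i(16) % variants.size]
end

# Same user, same experiment: always the same answer.
puts variant_for(42, "donate_button") == variant_for(42, "donate_button") # => true

# Across many users the split comes out close to 50/50.
counts = Hash.new(0)
(1..10_000).each { |id| counts[variant_for(id, "donate_button")] += 1 }
p counts
```

Because the version depends only on the id, there's no stored state to lose. The trade-off is that you can't easily move one user to the other group mid-experiment.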

14. Caching

15. Roughly the same
performance

16. Also for feature flagging

17. A super lightning fast
guide on how to do it and
what it looks like

18. Add `gem 'split'` to your Gemfile

19. In `config.ru`, mount the dashboard:

```ruby
require 'split/dashboard'

run Rack::URLMap.new \
  "/" => YourApp::Application,
  "/split" => Split::Dashboard.new
```

20.

```erb
<% ab_test("experiment_name", "a", "b") do |c| %>
  Get points?
<% end %>
```

21. What it looks like

22. https://github.com/andrew/split

23. How to interpret the
results

24. Stats time

25. Confidence
Value

26. 95% confidence (p < 0.05)
is the threshold commonly used
in medical trials

27. Common mistake:
Assumption of normality

28. This will probably work
for you, as long as your
samples are big enough
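
A quick way to sanity-check the normality assumption before trusting the numbers. The "at least 10 clicks and 10 non-clicks in every group" rule below is a common textbook rule of thumb, not something from the talk, and `normal_approx_ok?` is a hypothetical helper name.

```ruby
# Rule-of-thumb check (an assumption, not a hard law): the normal
# approximation is usually considered reasonable when every group has
# at least `min` successes (clicks) and `min` failures (non-clicks).
def normal_approx_ok?(groups, min: 10)
  groups.all? { |clicks, users| clicks >= min && users - clicks >= min }
end

puts normal_approx_ok?([[27, 100], [18, 100]]) # => true
puts normal_approx_ok?([[3, 40], [5, 40]])     # => false: too few clicks
```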

29. How to design the
experiment

30. Step 1: clearly
state your
hypothesis

31. Example:
I will get more donations
if our button is Jimmy
Wales’s face

32. Formally:
Null hypothesis: there
will be no increase in
donations if we use
Jimmy Wales’s face

33. Formally:
Positive (alternative) hypothesis:
there will be an increase in
donations if we use
Jimmy Wales’s face
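
Written as symbols, the two hypotheses might look like this. $p_J$ (donation rate with Jimmy Wales’s face) and $p_0$ (donation rate with the old button) are names introduced here just for illustration:

$$H_0: p_J \le p_0 \qquad\qquad H_1: p_J > p_0$$

Stating it one-sided like this matches the question "is it better?" rather than "is it different?".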

34. Step 2: Pick a statistical
test

35. Example: difference of
proportions (the
standard A/B test)

36. http://stattrek.com/hypothesis-test/difference-in-proportions.aspx
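
Here's a sketch of the pooled, one-tailed version of the difference-of-proportions statistic, assuming $x$ is the number of users who clicked and $n$ the number of users in each group ($J$ = with Jimmy, $0$ = old button; the symbols are chosen for illustration):

$$\hat p_J = \frac{x_J}{n_J}, \qquad \hat p_0 = \frac{x_0}{n_0}, \qquad \hat p = \frac{x_J + x_0}{n_J + n_0}$$

$$z = \frac{\hat p_J - \hat p_0}{\sqrt{\hat p\,(1 - \hat p)\left(\frac{1}{n_J} + \frac{1}{n_0}\right)}}$$

The confidence is then $\Phi(z)$, the standard normal CDF at $z$. At the 95% threshold you'd call the new version better only when $z \ge 1.645$.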

37. Step 3: Decide an
experiment length
(number of days)

38. Example: we get 200 hits
a day, let’s test for 15
days for 3000 hits

39. Alternatively: A fixed
sample size
Stop after 10000 users
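
Picking a length is just a division rounded up. A minimal sketch, assuming steady daily traffic where every hit is a distinct user; `days_needed` is a hypothetical helper name:

```ruby
# Hypothetical helper: days needed to reach a target number of hits,
# given average daily traffic. Rounds up, since a partial day still
# has to be run in full.
def days_needed(target_hits, hits_per_day)
  target_hits.fdiv(hits_per_day).ceil
end

puts days_needed(3000, 200)   # => 15, the example above
puts days_needed(10_000, 200) # => 50 for the fixed-sample-size version
```

Remember each group only gets half of those hits, so 3000 hits means 1500 users per group.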

40. Step 4: Split

41. Half the users get Jimmy
Wales’s face;
half the users get
whatever the button
was before

42. Step 5: inspect results
and analyse

43. Let’s talk about analysis

44. Let’s work two examples
(one null, one positive)

45.

| | With Jimmy | Without Jimmy |
|---|---|---|
| Users in test | 100 | 100 |
| Users that clicked | 27 | 18 |

46. Confidence = 93.6%
Below our 95% threshold,
so we can’t conclude
that this is better
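
Here's a sketch that reproduces that 93.6% in plain Ruby. It assumes a one-tailed, pooled difference-of-proportions test: the numbers match, but that the slide was computed exactly this way is an assumption. `normal_cdf` and `confidence` are hypothetical helper names, not part of the split gem.

```ruby
# Standard normal CDF, built from Ruby's Math.erf.
def normal_cdf(z)
  0.5 * (1 + Math.erf(z / Math.sqrt(2)))
end

# One-tailed, pooled difference-of-proportions test.
# Returns the confidence that group A's click rate is higher than group B's.
def confidence(clicks_a, users_a, clicks_b, users_b)
  rate_a = clicks_a.fdiv(users_a)
  rate_b = clicks_b.fdiv(users_b)
  pooled = (clicks_a + clicks_b).fdiv(users_a + users_b)
  std_err = Math.sqrt(pooled * (1 - pooled) * (1.0 / users_a + 1.0 / users_b))
  normal_cdf((rate_a - rate_b) / std_err)
end

# A = with Jimmy, B = without Jimmy
c = confidence(27, 100, 18, 100)
puts format("%.1f%%", c * 100) # => 93.6%
puts c >= 0.95                 # => false, so we can't call it better yet
```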

47. Common mistake:
Sample size

48.

| | With Jimmy | Without Jimmy |
|---|---|---|
| Users in test | 1000 | 1000 |
| Users that clicked | 270 | 180 |

49. 99.9% confidence
High enough for us to
declare this better
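
Working the second example by hand, assuming the same one-tailed pooled test that gives 93.6% for the first one. The click rates are identical (27% vs 18%); only the number of users changed:

$$\hat p = \frac{270 + 180}{1000 + 1000} = 0.225$$

$$\mathrm{SE} = \sqrt{0.225 \times 0.775 \times \left(\frac{1}{1000} + \frac{1}{1000}\right)} \approx 0.0187$$

$$z = \frac{0.27 - 0.18}{0.0187} \approx 4.82, \qquad \Phi(4.82) > 0.999$$

Ten times the users shrinks the standard error by $\sqrt{10} \approx 3.16$ (from about 0.0591 to about 0.0187), which is exactly why sample size matters.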

50. Confounding
factors ARE
bad

51. this is hard stuff
I hope you understood :)
ask me questions
@samphippen