Slide 1

Slide 1 text

MARKETS, MECHANISMS, MACHINES
University of Virginia, Spring 2019
Class 6: Experiments
31 January 2019
cs4501/econ4559 Spring 2019
David Evans and Denis Nekipelov
https://uvammm.github.io

Slide 2

Slide 2 text

Project Grading
- Excellent job: met our expectations for this project and got what we hoped you would out of it.
- Reasonable: missed some things we hoped you would get, but a good effort and got most of what we wanted out of this.
- Some serious problems: seem to be missing key ideas, or sign of unacceptable effort; didn’t get what we think you should out of this.
Positive modifiers:

Slide 3

Slide 3 text

Unbounded Scale
Exceptional!
Better than we thought possible!
Breakthrough result, should be published in a top venue
Worthy of a Turing Award/Nobel Prize

Slide 4

Slide 4 text

Project 2
Due Tuesday
More opportunities for … than Project 1
Projects will get more and more open-ended, until the Final Project, where you will be able to select your own problem

Slide 5

Slide 5 text

Experiments

Slide 6

Slide 6 text

Current citation rates suggest that I am among the 10 scientists worldwide who are currently the most commonly cited, perhaps also the currently most-cited physician. This probably only proves that citation metrics are highly unreliable, since I estimate that I have been rejected over 1,000 times in my life. Regardless, I consider myself privileged to have learned and to continue to learn from interactions with students and young scientists (of all ages) from all over the world and I love to be constantly reminded that I know next to nothing.
PLOS Medicine, 2005

Slide 7

Slide 7 text

Why Most Research Findings Are False
\( R = \frac{\text{true relationships}}{\text{not true relationships}} \)
Genome study: 100 000 markers, ~10 associated with disease
\( R \approx 10^{-4} \)

Slide 8

Slide 8 text

Why Most Research Findings Are False
Genome study: 100 000 markers, ~10 associated with disease
\( R \approx 10^{-4} \)
\( R = \frac{\text{true relationships}}{\text{not true relationships}} \)

Slide 9

Slide 9 text

Study Outcomes
Pick true relationship, probability study finds it true: \( 1 - \beta \)
\( \beta \): “Type 2 error rate”, “false negative”
Pick false relationship, probability study finds it true: \( \alpha \)
\( \alpha \): “Type 1 error rate”, “false positive”

Slide 10

Slide 10 text

Positive Predictive Value
\( \mathrm{PPV} = \frac{\text{number of true positives}}{\text{total number of positive outcomes}} \)
\( \mathrm{PPV} = P(\text{relationship is true} \mid \text{experiment finds it true}) \)
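To make the definition concrete, here is a minimal sketch that combines it with the quantities from the earlier slides. It assumes the standard closed form PPV = (1 − β)R / ((1 − β)R + α), which follows from counting true and false positives per "not true" relationship tested. That formula and the function name are not in the slide text as extracted, so treat them as illustrative assumptions.

```python
def positive_predictive_value(r, alpha, beta):
    """PPV = (1 - beta) * R / ((1 - beta) * R + alpha).

    r     : ratio of true relationships to not-true relationships
    alpha : Type 1 error rate (false positive)
    beta  : Type 2 error rate (false negative)
    """
    true_positives = (1 - beta) * r   # expected per not-true relationship tested
    false_positives = alpha           # expected per not-true relationship tested
    return true_positives / (true_positives + false_positives)


# Genome-study numbers from the slides: ~10 true markers out of 100 000.
r = 10 / (100_000 - 10)
# alpha = 0.05 and beta = 0.20 are assumed here for illustration.
ppv = positive_predictive_value(r, alpha=0.05, beta=0.20)
print(f"R = {r:.2e}, PPV = {ppv:.4f}")
```

With these assumed error rates the PPV comes out well under 1%: almost every "finding" in such a study would be a false positive.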

Slide 11

Slide 11 text


Slide 12

Slide 12 text

Why is \( \alpha = 0.05 \)?

Slide 13

Slide 13 text

Why is \( \alpha = 0.05 \)?

Slide 14

Slide 14 text

https://xkcd.com/882/
https://xkcd.com/1478/

Slide 15

Slide 15 text

Mostly False Results in Practice

Slide 16

Slide 16 text


Slide 17

Slide 17 text

It's not just in medicine...

Slide 18

Slide 18 text

It's not just in medicine...
Thomas Herndon, UMass student who attempted a replication for an Econ class assignment

Slide 19

Slide 19 text

Web Experiments
- controlled experiments
- randomized experiments
- A/B tests
- split tests
- Control/Treatment
- …

Slide 20

Slide 20 text

Widespread Use and Value

Slide 21

Slide 21 text


Slide 22

Slide 22 text


Slide 23

Slide 23 text

A/B Test
[Diagram: Population of Users → All incoming requests → Splitter → Control (Existing System) or Treatment (Modified Behavior) → Logging → Log A / Log B → Analyzer]
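The splitter in this pipeline can be sketched in a few lines. The version below assumes hash-based assignment on a user ID, which is one common way to keep a user in the same variant across requests. The slide does not say how the splitter works, and `assign_variant`, the experiment name, and the user IDs are all hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str = "exp-1",
                   treatment_fraction: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment'."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "treatment" if bucket < treatment_fraction else "control"


# Stand-ins for Log A (control) and Log B (treatment).
logs = {"control": [], "treatment": []}
for i in range(10_000):
    user = f"user-{i}"
    logs[assign_variant(user)].append(user)

print({variant: len(users) for variant, users in logs.items()})
```

Hashing the experiment name together with the user ID means the same user can land in different variants in different experiments, while staying put within one experiment.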

Slide 24

Slide 24 text

Analyzing the Logs
[Diagram: Splitter → Control (Existing System) or Treatment (Modified Behavior) → Logging → Log A / Log B → Analyzer]
Overall Evaluation Criterion (OEC): quantitative measure of goal
Also called: Response, Evaluation Metric
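As a rough sketch of what the analyzer might do, the snippet below treats each log as a list of per-user OEC values and compares the two means with a normal-approximation z-test. The slide does not name a test, so the test itself is an assumption, and the revenue values are made up for illustration.

```python
import math
import statistics


def compare_oec(log_a, log_b):
    """Return (mean_a, mean_b, z) for the difference in mean OEC."""
    mean_a, mean_b = statistics.fmean(log_a), statistics.fmean(log_b)
    var_a, var_b = statistics.variance(log_a), statistics.variance(log_b)
    std_err = math.sqrt(var_a / len(log_a) + var_b / len(log_b))
    return mean_a, mean_b, (mean_b - mean_a) / std_err


# Made-up per-user revenue values, for illustration only.
log_a = [0, 0, 5, 0, 10, 0, 0, 5, 0, 0]    # control
log_b = [0, 5, 5, 0, 10, 0, 5, 5, 0, 10]   # treatment
mean_a, mean_b, z = compare_oec(log_a, log_b)
p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
print(f"control={mean_a:.2f} treatment={mean_b:.2f} z={z:.2f} p={p_value:.3f}")
```

Ten users per variant is far too few for the normal approximation to be trustworthy; the point is only the shape of the computation.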

Slide 25

Slide 25 text

Terminology
[Diagram: Splitter → Control (Existing System) or Treatment (Modified Behavior) → Logging → Log A / Log B → Analyzer]
Factor: variable controlled by experiment
Experimental Unit: entity over which metrics are calculated

Slide 26

Slide 26 text

Tracking Users
HTTP is Stateless
[Diagram: Client → Server]
HTTP request: GET / HTTP/1.1
HTTP request: GET /syllabus HTTP/1.1

Slide 27

Slide 27 text

Cookies
HTTP is Stateless
[Diagram: Client → Server]
HTTP request: GET / HTTP/1.1
HTTP request: GET / HTTP/1.1
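Because HTTP is stateless, the server needs something like a cookie to recognize that two requests come from the same browser. The sketch below uses Python's standard `http.cookies` module to show the server-side logic: read the `Cookie` header if there is one, otherwise mint an ID and hand it back as a `Set-Cookie` value. The cookie name `experiment_uid` and the helper function are hypothetical.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "experiment_uid"  # hypothetical cookie name


def get_or_assign_user_id(cookie_header: str):
    """Return (user_id, set_cookie_value_or_None) for one request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if COOKIE_NAME in cookie:
        return cookie[COOKIE_NAME].value, None   # returning visitor
    user_id = uuid.uuid4().hex                   # first visit: mint an ID
    response = SimpleCookie()
    response[COOKIE_NAME] = user_id
    response[COOKIE_NAME]["path"] = "/"
    return user_id, response[COOKIE_NAME].OutputString()


# First request carries no cookie; the second echoes the one the server set.
uid1, set_cookie = get_or_assign_user_id("")
uid2, nothing_new = get_or_assign_user_id(f"{COOKIE_NAME}={uid1}")
print(set_cookie)
```

The ID recovered from the cookie is what a splitter would use to keep a browser in the same variant across requests.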

Slide 28

Slide 28 text

Opening your Cookie Jar
chrome://settings/siteData?search=cookies

Slide 29

Slide 29 text

Firefox: Tools | Web Developer | Storage Inspector

Slide 30

Slide 30 text

Terminology
[Diagram: Splitter → Control (Existing System) or Treatment (Modified Behavior) → Logging → Log A / Log B → Analyzer]
Factor: variable controlled by experiment
Experimental Unit: entity over which metrics are calculated, e.g. user (browser client)

Slide 31

Slide 31 text


Slide 32

Slide 32 text


Slide 33

Slide 33 text

Challenge: find the non-ad content!

Slide 34

Slide 34 text

How Big an Experiment?

Slide 35

Slide 35 text

How Big an Experiment?
Strategy 1: How much can you spend?
\( n = \frac{\text{Total budget} - \text{fixed costs}}{\text{cost per data point}} \)
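The budget rule is plain arithmetic, sketched below. The dollar figures are invented for illustration, and rounding down to a whole number of data points is an assumption.

```python
def affordable_sample_size(total_budget, fixed_costs, cost_per_data_point):
    """n = (total budget - fixed costs) / cost per data point, rounded down."""
    if cost_per_data_point <= 0:
        raise ValueError("cost per data point must be positive")
    return max(0, int((total_budget - fixed_costs) // cost_per_data_point))


# Illustrative numbers only: $10,000 budget, $2,000 fixed costs, $0.50 per user.
n = affordable_sample_size(10_000, 2_000, 0.50)
print(n)
```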

Slide 36

Slide 36 text

How Big an Experiment?
Strategy 2: Needed statistical power
Statistical power: probability of rejecting a false null hypothesis. If the treatment causes a difference in the OEC, power is the probability that the difference will be detected.

Slide 37

Slide 37 text


Slide 38

Slide 38 text


Slide 39

Slide 39 text

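\( n = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{\left(\frac{\mu_0 - \mu_1}{\sigma}\right)^2} \)
\( \alpha = 0.05 \): \( z_{1-\alpha/2} = 1.9599\ldots \)
\( \beta = 0.20 \): \( z_{1-\beta} = 0.84\ldots \)
\( 2\,(z_{1-\alpha/2} + z_{1-\beta})^2 \approx 15.7 \approx 16 \)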
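The constant can be checked numerically with the standard library's `statistics.NormalDist`. This is a sketch under the usual reading of the formula (two-sided test, n counted per group); the function name is hypothetical.

```python
from statistics import NormalDist


def sample_size_per_group(alpha, beta, delta):
    """n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 / delta^2.

    delta is the effect size in standard-deviation units: (mu0 - mu1) / sigma.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return 2 * (z_alpha + z_beta) ** 2 / delta ** 2


# With delta = 1 the result is just the numerator constant.
constant = sample_size_per_group(alpha=0.05, beta=0.20, delta=1.0)
print(round(constant, 2))
```

With the unrounded z-values the constant lands near 15.7, which rounds up to the 16 used in the rule of thumb.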

Slide 40

Slide 40 text

Minimum Sample Size
\( n = \frac{16}{\Delta^2} \)
\( \Delta \) = sensitivity relative to standard deviation \( = \frac{\mu_0 - \mu_1}{\sigma} \)
van Belle’s “Rule of Thumb”

Slide 41

Slide 41 text

Example
5% of visitors in experimental population purchase
average purchase = $100, standard deviation = $40
OEC: revenue
How many users do we need for experiment to detect 10% change in revenue?
based on Kohavi et al., Controlled experiments on the web: survey and practical guide. 2008.

Slide 42

Slide 42 text

Example
5% of visitors in experimental population purchase
average purchase = $100, standard deviation = $40
OEC: revenue
How many users do we need for experiment to detect 10% change in revenue?
average spending \( = 0.05 \times \$100 + 0.95 \times \$0 = \$5.00 \)
\( \Delta = \text{sensitivity}/\text{std dev} = 0.1 \times \$5 / \$40 = 0.0125 \)
Detect 10% change with \( \alpha = 0.05 \), \( \beta = 0.20 \): \( n = 16/\Delta^2 = 102{,}400 \)
Detect 1% change with \( \alpha = 0.05 \), \( \beta = 0.20 \): \( n = 16/(0.01 \times \$5/\$40)^2 = 10.24\text{M} \)
based on Kohavi et al., Controlled experiments on the web: survey and practical guide. 2008.
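The arithmetic on this slide can be reproduced directly. The sketch below applies n = 16/Δ² with the slide's numbers ($5.00 average spending, $40 standard deviation); the function name is hypothetical.

```python
def rule_of_thumb_n(relative_change, mean, std_dev):
    """n = 16 / delta^2 with delta = relative_change * mean / std_dev."""
    delta = relative_change * mean / std_dev
    return 16 / delta ** 2


mean_spend = 0.05 * 100 + 0.95 * 0      # $5.00 average spending per visitor
n_10pct = rule_of_thumb_n(0.10, mean_spend, 40)
n_1pct = rule_of_thumb_n(0.01, mean_spend, 40)
print(round(n_10pct), round(n_1pct))
```

Because n scales with 1/Δ², detecting a change ten times smaller needs one hundred times as many users.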

Slide 43

Slide 43 text

Example
5% of visitors in experimental population purchase
average purchase = $100, standard deviation = $40
OEC: conversion rate (in place of revenue)
How many users do we need for experiment to detect 10% change in conversion rate?
assume conversion modeled as Bernoulli trial: standard dev \( = \sqrt{p(1-p)} = 0.22 \) for \( p = 0.05 \)
\( \Delta = \text{sensitivity}/\text{std dev} = 0.1 \times 0.05 / 0.22 \approx 0.022 \)
Conversion rate as OEC: \( n = 16/\Delta^2 = 33{,}057 \)
Revenue as OEC: \( n = 16/\Delta^2 = 102{,}400 \)
based on Kohavi et al., Controlled experiments on the web: survey and practical guide. 2008.
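The same calculation for conversion rate, as a sketch without intermediate rounding. The function name is hypothetical. The slide's 33,057 comes from rounding Δ to 0.022 before squaring; carrying full precision gives a somewhat smaller n.

```python
import math


def conversion_sample_size(p, relative_change):
    """n = 16 / delta^2, delta = relative_change * p / sqrt(p * (1 - p))."""
    std_dev = math.sqrt(p * (1 - p))          # Bernoulli standard deviation
    delta = relative_change * p / std_dev
    return std_dev, delta, 16 / delta ** 2


std_dev, delta, n = conversion_sample_size(p=0.05, relative_change=0.10)
print(f"std dev={std_dev:.4f} delta={delta:.4f} n={n:.0f}")
```

Unrounded, n is about 30,400. Either way the conversion-rate OEC needs roughly a third of the users that the revenue OEC does, since conversion has lower variance relative to its mean.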

Slide 44

Slide 44 text

A/A Test
[Diagram: Population of Users → All incoming requests → Splitter → Control (Existing System) or Control (Existing System) → Logging → Log A / Log A’ → Analyzer]
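One use of an A/A test is to check the pipeline itself: since both groups see the same system, the analyzer should report a "significant" difference only about α of the time. The simulation below is a sketch under made-up assumptions (Gaussian per-user values, a normal-approximation z-test, a fixed random seed); none of these come from the slide.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def z_test_p_value(a, b):
    """Two-sided p-value for the difference in means (normal approximation)."""
    std_err = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (fmean(b) - fmean(a)) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def aa_false_positive_rate(trials=400, users=500, alpha=0.05, seed=0):
    """Fraction of simulated A/A tests that report a significant difference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both groups are drawn from the same distribution (same system).
        a = [rng.gauss(5.0, 40.0) for _ in range(users)]
        b = [rng.gauss(5.0, 40.0) for _ in range(users)]
        if z_test_p_value(a, b) < alpha:
            hits += 1
    return hits / trials


rate = aa_false_positive_rate()
print(rate)
```

A rate far from α in a real A/A test would suggest a problem in the splitter, the logging, or the analysis rather than a real effect.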

Slide 45

Slide 45 text

Multi-Variable Tests (MVT)
Test multiple factors in one experiment:
- More efficient: test many factors at once on same population
- Estimate interactions between factors
Example interaction: F1 good, F2 no effect, F1 + F2 bad

Slide 46

Slide 46 text

“Full Factorial” MVT
[Diagram: All incoming requests → Splitter; K factors]
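In a full factorial design every combination of factor settings gets its own group. The sketch below assumes each of the K factors is binary (on/off), which gives 2^K groups, and reuses hash-based assignment to route users. Both choices are assumptions for illustration, and the function names are hypothetical.

```python
import hashlib
from itertools import product


def full_factorial_groups(k):
    """All 2**k on/off combinations of k binary factors."""
    return list(product((0, 1), repeat=k))


def assign_group(user_id, groups):
    """Hash a user ID onto one of the factor combinations."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return groups[int(digest, 16) % len(groups)]


groups = full_factorial_groups(3)
print(len(groups), assign_group("user-42", groups))
```

The number of groups doubles with each added factor, so the users available per group shrink quickly. That is the motivation for fractional designs.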

Slide 47

Slide 47 text

“Fractional Factorial” MVT
Plackett-Burman Design: each pair of factors appears same number of times
Group  F1  F2  F3
1      1   1   1
2      0   1   1
3      1   0   1
4      1   1   0

Slide 48

Slide 48 text

Experiments Gone Awry

Slide 49

Slide 49 text


Slide 50

Slide 50 text

\( n = \frac{16}{\Delta^2} \)
\( \Delta = \sqrt{16/689003} \approx 0.0048 \)
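Here the rule of thumb is run in reverse: given the sample size, solve for the smallest effect (in standard-deviation units) the experiment could be expected to detect. A minimal sketch, with a hypothetical function name:

```python
import math


def detectable_delta(n):
    """Invert n = 16 / delta^2 to get delta = sqrt(16 / n) = 4 / sqrt(n)."""
    return math.sqrt(16 / n)


delta = detectable_delta(689_003)
print(f"{delta:.4f}")
```

With a sample this large, effects of less than half a percent of a standard deviation become detectable, so a statistically significant result can still be tiny in practical terms.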

Slide 51

Slide 51 text


Slide 52

Slide 52 text


Slide 53

Slide 53 text


Slide 54

Slide 54 text


Slide 55

Slide 55 text


Slide 56

Slide 56 text


Slide 57

Slide 57 text

Charge
Project 2 is due Tuesday (Feb 5)
Next week: resource allocation, stable matching
B2B Marketers Should Stop A/B Testing in 2018