Class 6: Finding False Findings Fast and Furiously

David Evans
January 31, 2019

Class 6: Finding False Findings Fast
https://uvammm.github.io/class6

Markets, Mechanisms, and Machines
University of Virginia
cs4501/econ4559 Spring 2019
David Evans and Denis Nekipelov
https://uvammm.github.io/

Transcript

  1. MARKETS, MECHANISMS, MACHINES

    University of Virginia, Spring 2019
    Class 6: Experiments, 31 January 2019
    cs4501/econ4559 Spring 2019, David Evans and Denis Nekipelov
    https://uvammm.github.io
  2. Project Grading

    Excellent job – met our expectations for this project and got what we hoped you would out of it.
    Reasonable – missed some things we hoped you would get, but a good effort and got most of what we wanted out of this.
    Some serious problems – seem to be missing key ideas, or sign of unacceptable effort; didn’t get what we think you should out of this.
    Positive modifiers:
  3. Unbounded Scale

    Exceptional! Better than we thought possible!
    Breakthrough result, should be published in a top venue
    Worthy of a Turing Award/Nobel Prize
  4. Project 2 Due Tuesday

    More opportunities for … than Project 1
    Projects will get more and more open-ended, until the Final Project, where you will be able to select your own problem
  5. Current citation rates suggest that I am among the 10 scientists worldwide who are currently the most commonly cited, perhaps also the currently most-cited physician. This probably only proves that citation metrics are highly unreliable, since I estimate that I have been rejected over 1,000 times in my life. Regardless, I consider myself privileged to have learned and to continue to learn from interactions with students and young scientists (of all ages) from all over the world and I love to be constantly reminded that I know next to nothing.

    PLOS Medicine, 2005
  6. Why Most Research Findings Are False

    \( R = \dfrac{\text{true relationships}}{\text{not true relationships}} \)
    Genome study: 100 000 markers, ~10 associated with disease, so \( R \approx 10^{-4} \)
  7. Why Most Research Findings Are False

    Genome study: 100 000 markers, ~10 associated with disease, so \( R \approx 10^{-4} \)
    \( R = \dfrac{\text{true relationships}}{\text{not true relationships}} \)
  8. Study Outcomes

    Pick true relationship, probability study finds it true: \( 1 - \beta \), where \( \beta \) is the “Type 2 error rate” (“false negative”)
    Pick false relationship, probability study finds it true: \( \alpha \), the “Type 1 error rate” (“false positive”)
  9. Positive Predictive Value

    \( \mathrm{PPV} = \dfrac{\text{number of true positives}}{\text{total number of positive outcomes}} \)
    P(relationship is true | experiment finds it true) =
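
The definition above can be turned into a number once the ratio of true to not-true relationships (R), the Type 1 error rate (α), and the Type 2 error rate (β) are fixed. The following is a minimal sketch, assuming the closed form PPV = (1 − β)R / ((1 − β)R + α) from Ioannidis's 2005 PLOS Medicine paper, which follows from counting true and false positives per tested relationship. The function name `ppv` is hypothetical, and the α and β values are assumed for illustration; only R ≈ 10⁻⁴ comes from the genome-study slide.

```python
def ppv(R: float, alpha: float, beta: float) -> float:
    """Positive predictive value: true positives / all positive outcomes."""
    # Counts are expressed per "not true" relationship tested, so the
    # common factor cancels in the ratio.
    true_positives = (1 - beta) * R   # true relationships that the study finds
    false_positives = alpha           # not-true relationships the study "finds"
    return true_positives / (true_positives + false_positives)


# R is the genome-study ratio from the slide; alpha and beta are assumed values.
genome_ppv = ppv(R=1e-4, alpha=0.05, beta=0.20)
print(f"PPV = {genome_ppv:.4f}")
```

With so few true relationships among the candidates, false positives dominate the positive outcomes even at a conventional α.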
  10. 10

  11. 15

  12. It’s not just in medicine...

    Thomas Herndon, UMass student who attempted to replicate for Econ class assignment
  13. 20

  14. 21

  15. A/B Test

    Population of Users → all incoming requests → Splitter
    Splitter → Control (Existing System) → Logging → Log A
    Splitter → Treatment (Modified Behavior) → Logging → Log B
    Log A, Log B → Analyzer
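
The splitter stage can be sketched in a few lines. This is one possible approach, not something the slide specifies: it assumes assignment by hashing a user identifier, so the same user always lands in the same variant. The names `assign_variant`, `exp-1`, and the `user-N` identifiers are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str = "exp-1",
                   treatment_fraction: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment'."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    # Interpret the first 8 bytes as an integer and scale it to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_fraction else "control"


# The two lists stand in for Log A (control) and Log B (treatment).
logs = {"control": [], "treatment": []}
for i in range(10_000):
    uid = f"user-{i}"
    logs[assign_variant(uid)].append(uid)

print({variant: len(users) for variant, users in logs.items()})
```

Including the experiment name in the hash input means a different experiment reshuffles users instead of reusing the same split.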
  16. Analyzing the Logs

    Splitter → Control (Existing System) / Treatment (Modified Behavior) → Logging → Log A / Log B → Analyzer
    Overall Evaluation Criterion (OEC): quantitative measure of goal (also: Response, Evaluation Metric)
  17. Terminology

    Splitter → Control (Existing System) / Treatment (Modified Behavior) → Logging → Log A / Log B → Analyzer
    Factor: variable controlled by experiment
    Experimental Unit: entity over which metrics are calculated
  18. Tracking Users

    HTTP is Stateless
    Client → Server (HTTP): GET / HTTP/1.1
    Server → Client: <html ...>
    Client → Server (HTTP): GET /syllabus HTTP/1.1
    Server → Client: <html ...>
  19. Cookies

    HTTP is Stateless
    Client → Server (HTTP): GET / HTTP/1.1
    Server → Client: <html ...>
    Client → Server (HTTP): GET / HTTP/1.1
    Server → Client: <html ...>
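
Because HTTP is stateless, a server that wants to recognize a returning browser has to hand it an identifier and read it back on later requests. The sketch below illustrates that idea with Python's standard `http.cookies` module. The cookie name `uid` and the helper `identify_user` are hypothetical, and a real deployment would also set attributes such as expiry and security flags.

```python
import uuid
from http.cookies import SimpleCookie


def identify_user(cookie_header):
    """Return (user_id, set_cookie_header); the header is None for returning users."""
    cookie = SimpleCookie()
    if cookie_header:
        cookie.load(cookie_header)          # parse the request's Cookie header
    if "uid" in cookie:
        return cookie["uid"].value, None    # known browser: reuse its identifier
    user_id = uuid.uuid4().hex              # new browser: mint an identifier
    reply = SimpleCookie()
    reply["uid"] = user_id
    reply["uid"]["path"] = "/"
    return user_id, reply.output(header="Set-Cookie:")


first_id, set_cookie = identify_user(None)        # first visit: no Cookie header
second_id, _ = identify_user(f"uid={first_id}")   # browser sends the cookie back
print(set_cookie)
print(first_id == second_id)
```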
  20. Terminology

    Splitter → Control (Existing System) / Treatment (Modified Behavior) → Logging → Log A / Log B → Analyzer
    Factor: variable controlled by experiment
    Experimental Unit: entity over which metrics are calculated: user (browser client)
  21. 30

  22. 31

  23. How Big an Experiment?

    Strategy 1: How much can you spend?
    \( n = \dfrac{\text{Total budget} - \text{fixed costs}}{\text{cost per data point}} \)
  24. How Big an Experiment?

    Strategy 2: Needed statistical power
    Probability of rejecting false null hypothesis: if treatment causes a difference in OEC, probability it will be detected
  25. 36

  26. 37

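  27. \( n = \dfrac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{\left(\dfrac{\mu_0 - \mu_1}{\sigma}\right)^2} \)

    \( \alpha = 0.05 \): \( z_{1-\alpha/2} = 1.9599\ldots \)
    \( \beta = 0.20 \): \( z_{1-\beta} = 0.84\ldots \)
    \( 2\,(z_{1-\alpha/2} + z_{1-\beta})^2 \approx 15.7 \approx 16 \)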
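
The constant in the numerator can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library to get the normal quantiles for α = 0.05 and β = 0.20. The function name `sample_size` and the parameter `delta` (the difference in means divided by the standard deviation) are illustrative choices, not names from the slides.

```python
from statistics import NormalDist


def sample_size(delta: float, alpha: float = 0.05, beta: float = 0.20) -> float:
    """Sample size for a standardized difference delta = (mu_0 - mu_1) / sigma."""
    z = NormalDist().inv_cdf   # quantile function of the standard normal
    constant = 2 * (z(1 - alpha / 2) + z(1 - beta)) ** 2
    return constant / delta ** 2


z = NormalDist().inv_cdf
print(round(z(0.975), 4), round(z(0.80), 4))   # the two quantiles used above
print(round(sample_size(delta=1.0), 2))        # the constant itself (delta = 1)
```

Evaluated with unrounded quantiles the constant comes out a little under 16, which is why rounding up to 16 gives a slightly conservative sample size.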
  28. Minimum Sample Size

    \( n = \dfrac{16}{\Delta^2} \)
    \( \Delta \) = sensitivity relative to standard deviation \( = \dfrac{\mu_0 - \mu_1}{\sigma} \)
    van Belle’s “Rule of Thumb”
  29. Example

    5% of visitors in experimental population purchase; average purchase = $100, standard deviation = $40. OEC: revenue
    How many users do we need for experiment to detect 10% change in revenue?
    Based on Kohavi et al., Controlled experiments on the web: survey and practical guide. 2008.
  30. Example

    5% of visitors in experimental population purchase; average purchase = $100, standard deviation = $40. OEC: revenue
    average spending \( = 0.05 \times \$100 + 0.95 \times \$0 = \$5.00 \)
    How many users do we need for experiment to detect 10% change in revenue?
    \( \Delta = \text{sensitivity}/\text{std dev} = 0.1 \times \$5 / \$40 = 0.0125 \)
    Detect 10% change with \( \alpha = 0.05,\ \beta = 0.20 \): \( n = 16/\Delta^2 = 102{,}400 \)
    Detect 1% change with \( \alpha = 0.05,\ \beta = 0.20 \): \( n = 16/(0.01 \times \$5/\$40)^2 = 10.24\text{M} \)
    Based on Kohavi et al., Controlled experiments on the web: survey and practical guide. 2008.
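
The arithmetic in this example is easy to reproduce. The sketch below applies the rule of thumb n = 16/Δ² to the slide's numbers (5% purchase rate, $100 average purchase, $40 standard deviation); the function and variable names are hypothetical.

```python
def rule_of_thumb_n(delta: float) -> float:
    """n = 16 / delta^2, with delta the change to detect in standard deviations."""
    return 16 / delta ** 2


purchase_rate, purchase_amount, std_dev = 0.05, 100.0, 40.0
# 5% of visitors spend $100 and the other 95% spend $0.
average_spending = purchase_rate * purchase_amount + (1 - purchase_rate) * 0.0

for relative_change in (0.10, 0.01):
    delta = relative_change * average_spending / std_dev
    n = rule_of_thumb_n(delta)
    print(f"{relative_change:.0%} change: delta = {delta:.5f}, n = {n:,.0f}")
```

Because n scales with 1/Δ², detecting a change ten times smaller needs a hundred times as many users.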
  31. Example

    5% of visitors in experimental population purchase; average purchase = $100, standard deviation = $40. OEC: conversion rate (instead of revenue)
    Assume conversion modeled as Bernoulli trial: standard dev \( = \sqrt{p(1-p)} = 0.22 \) for \( p = 0.05 \)
    How many users do we need for experiment to detect 10% change in conversion rate?
    \( \Delta = \text{sensitivity}/\text{std dev} = 0.1 \times 0.05 / 0.22 = 0.022 \)
    \( n = 16/\Delta^2 = 33{,}057 \) for conversion rate, compared with \( n = 16/\Delta^2 = 102{,}400 \) for revenue
    Based on Kohavi et al., Controlled experiments on the web: survey and practical guide. 2008.
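
The conversion-rate version of the calculation is sensitive to rounding. This sketch computes the Bernoulli standard deviation and the rule-of-thumb sample size both without intermediate rounding and with Δ rounded to 0.022 as on the slide. Variable names are illustrative.

```python
from math import sqrt

p = 0.05                       # conversion rate from the slide
std_dev = sqrt(p * (1 - p))    # Bernoulli standard deviation, about 0.22
delta = 0.10 * p / std_dev     # 10% relative change, in standard deviations

n_unrounded = 16 / delta ** 2  # no intermediate rounding
n_slide = 16 / 0.022 ** 2      # same formula with delta rounded to 0.022

print(f"std dev = {std_dev:.4f}, delta = {delta:.4f}")
print(f"n (unrounded) = {n_unrounded:,.0f}, n (delta = 0.022) = {n_slide:,.0f}")
```

Without rounding, the requirement is about 30,400 users, somewhat below the figure obtained from the rounded Δ. Either way it is well under the 102,400 needed when revenue is the OEC.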
  32. A/A Test

    Population of Users → all incoming requests → Splitter
    Splitter → Control (Existing System) → Logging → Log A
    Splitter → Control (Existing System) → Logging → Log A’
    Log A, Log A’ → Analyzer
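
Since both arms of an A/A test run the same system, any difference the analyzer flags is a false positive, so roughly 5% of A/A tests should come out "significant" at a 5% significance level. The simulation below is a rough sketch of that check. It assumes normally distributed per-user values (mean 5, standard deviation 40, borrowed from the revenue example) and a two-sample z-test as the analyzer, neither of which is specified in the slides. All names are hypothetical.

```python
import random
from statistics import fmean, variance


def significant(log_a, log_b, z_crit=1.96):
    """Two-sample z-test on the means; True if the difference looks significant."""
    se = (variance(log_a) / len(log_a) + variance(log_b) / len(log_b)) ** 0.5
    return abs(fmean(log_a) - fmean(log_b)) / se > z_crit


def aa_false_positive_rate(trials=500, n=100, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both groups see the same system, so any detected difference
        # is a false positive.
        log_a = [rng.gauss(5.0, 40.0) for _ in range(n)]
        log_b = [rng.gauss(5.0, 40.0) for _ in range(n)]
        hits += significant(log_a, log_b)
    return hits / trials


rate = aa_false_positive_rate()
print(f"A/A tests flagged as significant: {rate:.1%}")
```

A flagged rate far from the nominal level would point to a problem in the splitter, the logging, or the analysis rather than in the system under test.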
  33. Multi-Variable Tests (MVT)

    Test multiple factors in one experiment:
    - More efficient – test many factors at once on same population
    - Estimate interactions between factors
    F1: good; F2: no effect; F1 + F2: bad
  34. “Fractional Factorial” MVT

    Plackett-Burman Design: each pair of factors appears same number of times

    Group  F1  F2  F3
    1      1   1   1
    2      0   1   1
    3      1   0   1
    4      1   1   0
  35. 48

  36. 50

  37. 51

  38. 52

  39. 53

  40. 54

  41. 55

  42. Charge

    Project 2 is due Tuesday (Feb 5)
    Next week: resource allocation, stable matching
    B2B Marketers Should Stop A/B Testing in 2018