Statistical Rethinking 2023 - Lecture 02

Richard McElreath

January 04, 2023

Transcript

  1. Statistical Rethinking
    2. The Garden of Forking Data
    2023

  2. What
    proportion of
    the surface is
    covered with
    water?

  3. How should we use the sample?
    How to produce a summary?
    How to represent uncertainty?

  4. Workflow
    (1) Define generative model of the sample
    (2) Define a specific estimand
    (3) Design a statistical way to produce estimate
    (4) Test (3) using (1)
    (5) Analyze sample, summarize

  5. Generative model of the globe
    Begin conceptually: How do the variables influence one
    another?
    [Book page excerpt, cropped on the slide]
    … number of globe tosses, N. This is chosen by the experimenter.
    … number of water points observed, W
    … number of land points observed, L
    … diagram that shows these four variables and connects some of them …
    causal influence. Let's start with a blank diagram and add the arrows …
    … add some arrows. Arrows in these diagrams indicate causal influence …
    [Diagram nodes]
    p: proportion of water
    N: number of tosses
    W: water observations
    L: land observations

  6. Generative model of the globe
    Begin conceptually: How do the variables influence one
    another?
    N influences W and L
    [Book page excerpt, cropped on the slide]
    … add some arrows. Arrows in these diagrams indicate causal influence …
    about what “causal influence” means here is to imagine changing …
    which other variables also change as a consequence. For example …
    of globe tosses N, then both W and L might change. But p would …
    and L, but not p. We draw that like this:
    [Diagram: nodes p, N, W, L; two arrows labeled "influence"]

  7. Generative model of the globe
    Begin conceptually: How do the variables influence one
    another?
    [Book page excerpt, cropped on the slide; page header: SMALL WORLDS AND LARGE WORLDS]
    … changing W and L directly, for example by manipulating the …
    on p or N. But changing p, by for example erasing a random continent …
    influence W and L, at least on average. So we need two more arrows …
    [Diagram: nodes p, N, W, L]
    … causal diagram of the globe tossing experiment. There are some …

  8. Generative model of the globe
    Generative assumptions: What do the arrows mean exactly?
    W,L = f(p, N)

  9. Workflow
    (1) Define generative model of the sample
    (2) Define a specific estimand
    (3) Design a statistical way to produce estimate
    (4) Test (3) using (1)
    (5) Analyze sample, summarize

  10. Bayesian data analysis
    For each possible explanation of the sample,
    Count all the ways the sample could happen.
    Explanations with more ways to produce the
    sample are more plausible.

  11. The Garden of Forking Data
    El jardín de los datos que se bifurcan

  12. For each possible proportion
    of water on the globe,
    Count all the ways the sample
    of tosses could happen.
    Proportions with more ways
    to produce the sample are
    more plausible.

  13. A Four-sided Globe
    [Figure: a four-sided (d4) globe with numbered sides]
    covered 25% by water

  14. Garden of Forking Data
    Observe:
    (1)
    (2)
    (3)
    (4)
    (5)
    Possible d4 globes:

  15. Garden of Forking Data
    Observe:
    (1)
    (2)
    (3)
    (4)
    (5)
    Possible d4 globes:
    25%

  16. First Possibility
    Figure 2.2

  17. Second Possibility
    Figure 2.2

  18. Third Possibility
    Figure 2.2

  19. Figure 2.2
    First Observation

  20. Figure 2.2
    Second Observation

  21. Figure 2.2
    Third Observation

  22. Figure 2.2
    3 Ways to see
    for 25% water
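
The count on this slide can be checked by brute force. The following is a minimal sketch, assuming each four-sided globe is represented as a vector of four side labels and that the observed sequence is W, L, W (the sequence named in the book excerpts elsewhere in the deck). The helper name `count_paths` is hypothetical and not from the lecture.

```r
# Brute-force garden of forking data (hypothetical helper, not from the slides).
# A globe is a vector of 4 side labels; a path picks one side per toss.
count_paths <- function(sides, observed) {
  # all 4^length(observed) possible sequences of sides
  paths <- expand.grid(rep(list(sides), length(observed)),
                       stringsAsFactors = FALSE)
  # keep only the paths that reproduce the observed sequence exactly
  matches <- apply(paths, 1, function(path) all(path == observed))
  sum(matches)
}

observed <- c("W", "L", "W")

# the five possible globes: 0, 1, 2, 3 or 4 water sides
globes <- lapply(0:4, function(k) c(rep("W", k), rep("L", 4 - k)))
ways <- sapply(globes, count_paths, observed = observed)
print(ways)
```

The globe with one water side (25% water) yields 3 matching paths out of 64, which is the count shown on this slide.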

  23. Garden of Forking Data
    Possible globes   Ways to produce
    (1)               ?
    (2)               3
    (3)               ?
    (4)               ?
    (5)               ?

  24. Garden of Forking Data
    Possible globes   Ways to produce
    (1)               0
    (2)               3
    (3)               ?
    (4)               ?
    (5)               ?

  25. Garden of Forking Data
    Possible globes   Ways to produce
    (1)               0
    (2)               3
    (3)               ?
    (4)               ?
    (5)               0

  26. (3)

  27. (4)

  28. Garden of Forking Data
    Possible globes   Ways to produce
    (1)               0
    (2)               3
    (3)               8
    (4)               9
    (5)               0

  29. Counts to plausibility
    Unglamorous basis of applied probability:
    Things that can happen more ways are more plausible.
    [Book page excerpt, cropped on the slide]
    Figure 2.3. After eliminating paths inconsistent with the sequence WLW,
    only 3 of the 64 paths remain.
    … up how many sequences, paths through the garden of forking data, could potentially …
    the first three observed samples …
    Possibility (p)   Ways to produce W L W
    0                 0 × 4 × 0 = 0
    0.25              1 × 3 × 1 = 3
    0.5               2 × 2 × 2 = 8
    0.75              3 × 1 × 3 = 9
    1                 4 × 0 × 4 = 0
    … that the number of ways to produce the data, for each possibility, can be computed …

  34. Updating
    Another draw from the bag:
    4
    [Book page excerpt, cropped on the slide]
    … see what happens. Here's the sample again, as a reminder:
    W L W W W L W L W
    The fourth observation is W. To update our previous counts for each possibility, we just
    need to multiply by the appropriate number of ways to see this single W. … Updating our table:
    Possibility (p)   Ways to produce W L W   Ways to produce W   Ways to produce W L W W
    0                 0 × 4 × 0 = 0           0                   0 × 0 = 0
    0.25              1 × 3 × 1 = 3           1                   3 × 1 = 3
    0.5               2 × 2 × 2 = 8           2                   8 × 2 = 16
    0.75              3 × 1 × 3 = 9           3                   9 × 3 = 27
    1                 4 × 0 × 4 = 0           4                   0 × 4 = 0
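
The update rule can be sketched in a few lines of R. This assumes the counts for the first three observations (W L W) are 0, 3, 8, 9, 0, as tabulated earlier in the deck, and that a four-sided globe with proportion p of water has 4p ways to show a single W. The variable names are illustrative.

```r
# Possible proportions of water on a four-sided globe
p <- c(0, 0.25, 0.5, 0.75, 1)

# Ways each possibility can produce the first three observations, W L W
prev_ways <- c(0, 3, 8, 9, 0)

# Ways each possibility can produce one more W: its number of water sides
ways_W <- 4 * p

# Updating multiplies the old counts by the counts for the new observation
new_ways <- prev_ways * ways_W
print(cbind(p, prev_ways, ways_W, new_ways))
```

Nothing about the earlier observations needs to be recomputed: the previous counts carry all the information needed for the next update.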

  41. The whole sample
    [Book page excerpt]
    We can keep applying this rule again and again, to update for each new observation. For all
    nine observations, the complete table is below. For each possibility, the total count is just the
    product of the number of ways to see W to the power of the number of times W was sampled
    and the number of ways to see L to the power of the number of times L was sampled. That is
    an awful thing to write down, but conceptually we just multiply the count each time by the
    number of ways to see the most recent observation.
    Possibility (p)   Observations
    0                 0 = 0^6 × 4^3
    0.25              27 = 1^6 × 3^3
    0.5               512 = 2^6 × 2^3
    0.75              729 = 3^6 × 1^3
    1                 0 = 4^6 × 0^3

  46. The whole sample
    Ways for p to produce W,L = (4p)^W × (4 - 4p)^L
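
To see that the closed-form expression agrees with observation-by-observation updating, the sketch below runs both on the nine-toss sample used in the deck. The loop and the names `ways_seq` and `ways_closed` are illustrative and not from the lecture.

```r
the_sample <- c("W","L","W","W","W","L","W","L","W")
p <- c(0, 0.25, 0.5, 0.75, 1)

# Sequential updating: start at 1 and multiply in each observation in turn
ways_seq <- rep(1, length(p))
for (obs in the_sample) {
  ways_obs <- if (obs == "W") 4 * p else 4 - 4 * p
  ways_seq <- ways_seq * ways_obs
}

# Closed form: (4p)^W * (4 - 4p)^L
W <- sum(the_sample == "W")
L <- sum(the_sample == "L")
ways_closed <- (4 * p)^W * (4 - 4 * p)^L

print(rbind(ways_seq, ways_closed))
```

Because multiplication is commutative, the order of the tosses does not matter for the final counts; only W and L do.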

  47. Probability
    Probability: Non-negative values that sum to one
    Suppose W=20, L=10. Then p=0.5 has
    2^W × 2^L = 1,073,741,824
    ways to produce sample. Better to convert to probability.
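
A short sketch of why converting to probability is convenient: raw counts grow quickly with sample size, while the normalized values stay between zero and one and do not depend on how many sides were used for counting. The sample size W=20, L=10 is from the slide; the grid of five proportions and the variable names are assumptions made for illustration.

```r
W <- 20
L <- 10
p <- c(0, 0.25, 0.5, 0.75, 1)

# Raw counts on a four-sided globe; p = 0.5 gives 2^20 * 2^10
ways <- (4 * p)^W * (4 - 4 * p)^L
print(format(ways[p == 0.5], big.mark = ","))

# Normalize: non-negative values that sum to one
prob <- ways / sum(ways)

# Dropping the factor of 4 rescales every count by the same constant,
# so the normalized values are unchanged
prob_unscaled <- (p^W * (1 - p)^L) / sum(p^W * (1 - p)^L)
print(cbind(p, prob, prob_unscaled))
```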

  48. Probability
    [Book page excerpt, cropped on the slide]
    … standardize the counts. This just means to divide each count by their total …
    new values sum to one. We can safely do this, because dividing the counts …
    their total …
    doesn't discard any information. It can be reversed. It just …
    easier to compare.
    … standardizing the counts is where probability comes in. For the original sample
    WLWWWLWLW, standardizing gives us these probabilities:
    Possible proportion   Ways to produce sample   Probability of proportion
    0                     0                        0
    0.25                  27                       0.021
    0.5                   512                      0.404
    0.75                  729                      0.575
    1                     0                        0
    … code to calculate the ways and the probabilities from the sample …
    [Plot: probability (0.0 to 0.5) against proportion water (0, 0.25, 0.5, 0.75, 1)]

  49. Probability
    Posterior distribution
    [Plot: probability (0.0 to 0.5) against proportion water (0, 0.25, 0.5, 0.75, 1)]

  50. Probability
    [Book page excerpt, cropped on the slide]
    R code
    sample <- c("W","L","W","W","W","L","W","L","W")
    W <- sum(sample=="W") # number of W observed
    L <- sum(sample=="L") # number of L observed
    p <- c(0,0.25,0.5,0.75,1) # proportions W
    ways <- sapply( p , function(q) (q*4)^W * ((1-q)*4)^L )
    prob <- ways/sum(ways)
    cbind( p , ways , prob )
            p ways       prob
    [1,] 0.00    0 0.00000000
    [2,] 0.25   27 0.02129338
    [3,] 0.50  512 0.40378549
    [4,] 0.75  729 0.57492114
    [5,] 1.00    0 0.00000000
    These probabilities are relative plausibilities for the different proportions of water. They are
    computed after updating for all the observations, and the set of these probabilities is usually …
    [Plot: probability (0.0 to 0.5) against proportion water (0, 0.25, 0.5, 0.75, 1)]

  51. Workflow
    (1) Define generative model of the sample
    (2) Define a specific estimand
    (3) Design a statistical way to produce estimate
    (4) Test (3) using (1)
    (5) Analyze sample, summarize

  52. Test Before You Est(imate)
    (1) Code a generative simulation
    (2) Code an estimator
    (3) Test (2) with (1)
    Extremely powerful, fun

  53. Generative simulation
    [Book page excerpt, cropped on the slide]
    Our approach in this book will be to write code that shadows each step: estimand →
    estimator → estimate. And we can test each step as we go. In fact, as the statistical models
    get more complex, we'll have a ladder of tests to make the construction of the model easier
    and safer.
    Synthetic sample. The first thing to do is to simulate a sample from a generative
    model, which is used to define the estimand. This produces one or many synthetic samples.
    Then we can feed the synthetic samples into the statistical procedure and see that it
    behaves as we hope.
    For the globe tossing problem, we want to simulate sampling from the globe. I am going
    to write a function that simulates sampling from the globe. If you are not familiar with
    functions, you can think of them as names for pieces of code that you want to reuse. In
    addition to making it easier to repeat the code, a function can also make testing easier. Here's
    a very simple function that simulates sampling nine times from a globe with a water …
    R code
    # function to toss a globe covered p by water N times
    sim_globe <- function( p=0.7 , N=9 ) {
        sample(c("W","L"),size=N,prob=c(p,1-p),replace=TRUE)
    }
    Nothing happens until we call the function by its name …
    [Second book page excerpt, cropped on the slide]
    … But changing p, by for example erasing a random continent on the …
    and L, at least on average. So we need two more arrows …
    [Diagram: nodes p, N, W, L]
    … diagram of the globe tossing experiment. There are some general and …
    these diagrams. But we don't need them right now. So instead …
    makes a lot of strong assumptions, because of variables and arrows …
    example, the sample size N is independent of p, and the results W and …
    W,L = f(p, N)

  54. [Book page excerpt, cropped on the slide]
    R code
    # function to toss a globe covered p by water N times
    sim_globe <- function( p=0.7 , N=9 ) {
        sample(c("W","L"),size=N,prob=c(p,1-p),replace=TRUE)
    }
    Nothing happens until we call the function by its name:
    R code
    sim_globe()
    [1] "L" "W" "W" "W" "L" "L" "L" "W" "L"
    Repeat calling the function to see that it simulates a different sample each time. And by
    naming the proportion of water p and number of tosses N in the function definition, we can
    easily change these values when we call the function.
    [Slide annotations on the arguments of sample()]
    c("W","L"): possible observations
    size=N: number of tosses
    prob=c(p,1-p): probability of each possible observation

  56. R code
    replicate(sim_globe(p=0.5,N=9),n=10)
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    [1,] "W"  "L"  "L"  "W"  "W"  "L"  "L"  "W"  "W"  "L"
    [2,] "W"  "L"  "W"  "L"  "W"  "L"  "L"  "W"  "L"  "L"
    [3,] "W"  "L"  "L"  "L"  "L"  "W"  "L"  "W"  "W"  "W"
    [4,] "W"  "W"  "L"  "W"  "L"  "W"  "W"  "W"  "W"  "W"
    [5,] "L"  "W"  "W"  "W"  "W"  "W"  "L"  "W"  "L"  "L"
    [6,] "L"  "W"  "L"  "L"  "W"  "L"  "W"  "W"  "W"  "W"
    [7,] "W"  "W"  "W"  "L"  "W"  "W"  "W"  "L"  "L"  "L"
    [8,] "L"  "W"  "L"  "L"  "L"  "W"  "L"  "W"  "W"  "W"
    [9,] "W"  "L"  "L"  "W"  "L"  "W"  "W"  "W"  "L"  "L"

  57. Test the simulation on extreme settings
    [Book page excerpt, cropped on the slide]
    R code
    sim_globe( p=1 , N=11 )
    [1] "W" "W" "W" "W" "W" "W" "W" "W" "W" "W" "W"
    Now we have a sample of 11 tosses from a globe covered entirely in water. They should all be
    W. This is a test of our simulation. The first thing to do, each time you write a synthetic data
    simulation, is to test it for inputs for which you already know how it should behave. These
    inputs will usually be extreme values. So go ahead and try p=0 too. You should only get L.
    Let's try a trickier test. As the sample size increases, the proportion of W in the sample
    should get close to p. So let's try:
    R code
    sum( sim_globe( p=0.5 , N=1e4 ) == "W" ) / 1e4
    [1] 0.505
    Try some other values for p to make sure the simulation is functioning correctly. Notice that
    you are almost never going to get exactly p back. This is normal. Samples are finite and …
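
The checks described in the excerpt can be collected into a small script. This is only a sketch: `sim_globe` is copied from the slides, while the seed, the sample sizes, and the tolerance of 0.02 are arbitrary choices made here, not values from the lecture.

```r
# function to toss a globe covered p by water N times (from the slides)
sim_globe <- function(p = 0.7, N = 9) {
  sample(c("W", "L"), size = N, prob = c(p, 1 - p), replace = TRUE)
}

set.seed(2023)  # arbitrary seed, only to make the run repeatable

# Extreme settings: the answer is known in advance
stopifnot(all(sim_globe(p = 1, N = 11) == "W"))
stopifnot(all(sim_globe(p = 0, N = 11) == "L"))

# Large sample: the proportion of W should be close to p
for (p_true in c(0.2, 0.5, 0.7)) {
  prop_W <- mean(sim_globe(p = p_true, N = 1e5) == "W")
  # 0.02 is a loose, arbitrary tolerance for N = 1e5
  stopifnot(abs(prop_W - p_true) < 0.02)
  cat("p =", p_true, " simulated proportion of W =", prop_W, "\n")
}
```

If any check fails, `stopifnot` halts with an error, so a silent run means the simulation behaved as expected on these inputs.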

  58. IF YOU TEST
    NOTHING
    YOU MISS
    EVERYTHING

  59. Code the estimator
    Ways for p to produce W,L = (4p)^W × (4 - 4p)^L
    [Book page excerpt, cropped on the slide; heading: QUALITY ASSURANCE]
    # function to compute posterior distribution
    compute_posterior <- function( the_sample , poss=c(0,0.25,0.5,0.75,1) ) {
        W <- sum(the_sample=="W") # number of W observed
        L <- sum(the_sample=="L") # number of L observed
        ways <- sapply( poss , function(q) (q*4)^W * ((1-q)*4)^L )
        post <- ways/sum(ways)
        bars <- sapply( post, function(q) make_bar(q) )
        data.frame( poss , ways , post=round(post,3) , bars )
    }
    To use this function, you need to give it a sample. And we can just embed the previous
    simulation function inside it:
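
`compute_posterior` calls `make_bar`, which is not defined anywhere in this transcript. The sketch below supplies a hypothetical `make_bar` (one `#` per 0.05 of probability, a scaling assumed here because it is consistent with the bars printed in the deck's example output) so that the estimator runs on its own, and then tests it on samples where the answer is known.

```r
# Hypothetical helper, NOT from the slides: a text bar with one "#" per 0.05
make_bar <- function(x) paste(rep("#", round(x * 20)), collapse = "")

# function to compute posterior distribution (as on the slide)
compute_posterior <- function(the_sample, poss = c(0, 0.25, 0.5, 0.75, 1)) {
  W <- sum(the_sample == "W")  # number of W observed
  L <- sum(the_sample == "L")  # number of L observed
  ways <- sapply(poss, function(q) (q * 4)^W * ((1 - q) * 4)^L)
  post <- ways / sum(ways)
  bars <- sapply(post, function(q) make_bar(q))
  data.frame(poss, ways, post = round(post, 3), bars)
}

# Known answer 1: the nine-toss sample from earlier in the deck
res <- compute_posterior(c("W","L","W","W","W","L","W","L","W"))
print(res)
stopifnot(all(res$ways == c(0, 27, 512, 729, 0)))

# Known answer 2: only W observed, so p = 1 must be the most plausible
all_W <- compute_posterior(rep("W", times = 9))
stopifnot(all_W$poss[which.max(all_W$post)] == 1)
```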

  64. (1) Test the estimator where the answer is known
    (2) Explore different sampling designs
    (3) Develop intuition for sampling and estimation
    [Book page excerpt, cropped on the slide]
    R code
    compute_posterior( sim_globe() )
      poss ways  post bars
    1 0.00    0 0.000
    2 0.25  243 0.291 ######
    3 0.50  512 0.612 ############
    4 0.75   81 0.097 ##
    5 1.00    0 0.000
    Repeat this function call a few times, to show that as the sample varies, so too does the
    posterior distribution.
    How do we test our estimator? Again, the first thing to try are some extreme samples
    with known properties. Then we can try increasing the sample size and ensuring that the
    posterior distribution behaves correctly. The first extreme test is a sample with only W:
    R code
    compute_posterior( rep("W",times=9) )
      poss ways  post bars
    1 0.00    0 0.000
    2 0.25    1 0.000
    …

  65. PAUSE

  66. More possibilities
    4-sided globe
    [0 0.25 0.5 0.75 1]

  67. More possibilities
    4-sided globe
    [0 0.25 0.5 0.75 1]
    10-sided globe
    [0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1]

  68. More possibilities
    4-sided globe
    [0 0.25 0.5 0.75 1]
    10-sided globe
    [0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1]
    20-sided globe
    [0 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65
    0.70 0.75 0.80 0.85 0.90 0.95 1]
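
The same counting works for any number of sides. The sketch below assumes a globe with `n_sides` sides has possible proportions 0, 1/n_sides, ..., 1 and reuses the nine-toss sample from earlier in the deck. `posterior_for_sides` is a hypothetical name, not a function from the lecture.

```r
the_sample <- c("W","L","W","W","W","L","W","L","W")

# Hypothetical helper: posterior over the proportions possible on an n-sided globe
posterior_for_sides <- function(the_sample, n_sides) {
  poss <- seq(0, 1, length.out = n_sides + 1)  # e.g. 4 sides -> 5 possibilities
  W <- sum(the_sample == "W")
  L <- sum(the_sample == "L")
  # ways = (number of water sides)^W * (number of land sides)^L
  ways <- (poss * n_sides)^W * ((1 - poss) * n_sides)^L
  data.frame(poss, post = ways / sum(ways))
}

for (n in c(4, 10, 20)) {
  post <- posterior_for_sides(the_sample, n)
  best <- post$poss[which.max(post$post)]
  cat(n, "sides:", nrow(post), "possibilities, most plausible proportion =", best, "\n")
}
```

With more sides the grid of candidate proportions gets finer, and the individual probabilities shrink because the same total of one is spread over more possibilities.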

  69. More possibilities
    5 possibilities
    [Plot: probability (0.0 to 0.5) against proportion water (0, 0.25, 0.5, 0.75, 1)]
    Figure 2.5. The posterior probability distribution for the sample WLWWWLWLW, for
    the proportions 0, 0.25, 0.5, 0.75, and 1.

  70. More possibilities
    [Plot, 5 possibilities: probability (0.0 to 0.5) against proportion water (0, 0.25, 0.5, 0.75, 1)]
    [Plot, 11 possibilities: posterior probability (0.00 to 0.30) against proportion water (0 to 1)]
    [Plot, 21 possibilities: posterior probability (0.00 to 0.30) against proportion water (0 to 1)]
    Figure 2.6. The posterior distribution for the globe sample, computed with
    increasing numbers of possible proportions of water. Left: 11 possibilities.
    Right: 21 possibilities.

  72. Infinite possibilities
    The globe is a polyhedron with an infinite number of sides
    The posterior probability of any “side” p is proportional to:
        $p^W (1-p)^L$
    Only trick is normalizing to probability. After a little calculus:
        $Z = \int_0^1 p^W (1-p)^L \, dp = \frac{W!\,L!}{(W+L+1)!}$
    [Book excerpt, "Bayesian updating", cropped on the slide:] A continuous solution. In this example, it's not hard to derive the posterior [...] an exact continuous distribution function. It turns out that the posterior [...] possible proportion of water p between zero and one is proportional to $p^W (1-p)^L$. W is the number of water observed and L is the number of land observed. [...] this is the exact expression that we already used to calculate the relative number [...] value p could produce a sample with W water and L land. It is a logical [...] garden of forking paths. [...] the sum of all numerators for every possible p. For a finite number of [...] sum $Z = \sum_p p^W (1-p)^L$, where the $\sum_p$ notation means to evaluate [...] then add them. For an infinite number of possible p values from zero [...] partner of an integral.

    View Slide

  73. Infinite possibilities
    The globe is a polyhedron with an infinite number of sides
    The posterior probability of any “side” p is proportional to:
        $p^W (1-p)^L$
    Only trick is normalizing to probability. After a little calculus:
    Posterior probability of p =
        $\Pr(p \mid W, L) = \mathrm{Beta}(W+1, L+1) = \frac{(W+L+1)!}{W!\,L!} \, p^W (1-p)^L$

    View Slide

  74. Infinite possibilities
    Posterior probability of p =
        $\Pr(p \mid W, L) = \mathrm{Beta}(W+1, L+1) = \frac{(W+L+1)!}{W!\,L!} \, p^W (1-p)^L$
    Normalizing constant: the factor $\frac{(W+L+1)!}{W!\,L!} = 1/Z$
    Relative number of ways to observe sample: the factor $p^W (1-p)^L$
    The “Beta” distribution
    [Book excerpt, cropped on the slide:] [...] the shape comes entirely from the $p^W (1-p)^L$ term, and the rest is [...] the area under the curve sums to 1, so that it is a proper probability distribution. [...] term is just an implication of the garden of forking paths. [...] Beta distribution is given by the function dbeta() in R.
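
    The "little calculus" can be checked numerically. The sketch below assumes the globe sample used throughout (W = 6, L = 3) and compares the factorial formula for Z with numerical integration, then compares the hand-written posterior with dbeta(). The variable names are chosen here for illustration.

```r
W <- 6 ; L <- 3 # counts from the sample W L W W W L W L W

# normalizing constant: numerical integral versus the factorial formula
Z_num <- integrate( function(p) p^W * (1-p)^L , lower=0 , upper=1 )$value
Z_fac <- factorial(W) * factorial(L) / factorial(W+L+1)

# posterior density written out by hand versus the built-in Beta density
p <- seq( 0 , 1 , by=0.1 )
by_hand <- p^W * (1-p)^L / Z_fac
built_in <- dbeta( p , W+1 , L+1 )

print( c( Z_num=Z_num , Z_fac=Z_fac ) )
print( all.equal( by_hand , built_in ) )
```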

    View Slide

  75. Ten tosses of the globe

    View Slide


  76. posterior probability
    [Figure: six panels of posterior probability against proportion water (0 to 1), one after each toss: W; W L; W L W; W L W W; W L W W W; W L W W W L]

    View Slide

  77. posterior probability
    [Figure: six panels of posterior probability against proportion water (p), one after each of tosses four to nine: W L W W; W L W W W; W L W W W L; W L W W W L W; W L W W W L W L; W L W W W L W L W]
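
    The toss-by-toss panels can be reproduced from the Beta result: after each toss the posterior is Beta(W+1, L+1) for the counts seen so far. This is a sketch under that assumption; the names tosses, p_grid, and modes are introduced here and do not appear on the slides.

```r
tosses <- c("W","L","W","W","W","L","W","L","W")
p_grid <- seq( 0 , 1 , length.out=101 ) # grid of proportions used to locate each peak
modes <- numeric( length(tosses) )      # posterior mode after each toss

for ( i in seq_along(tosses) ) {
    W <- sum( tosses[1:i]=="W" )        # water count so far
    L <- i - W                          # land count so far
    post <- dbeta( p_grid , W+1 , L+1 ) # posterior after i tosses
    modes[i] <- p_grid[ which.max(post) ]
    cat( paste( tosses[1:i] , collapse=" " ) , ": peak at p =" , modes[i] , "\n" )
}
```

    Replacing the cat() call with plot( p_grid , post , type="l" ) would draw one panel per toss.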

    View Slide

  78. (1) No minimum sample size

    View Slide

  79. (2) Shape embodies sample size

    View Slide

  80. (3) No point estimate
    mean
    mode
    The distribution is the estimate
    Always use the entire distribution

    View Slide

  81. (4) No one true interval
    Intervals communicate shape of posterior
    [Figure: posterior density against proportion water]

    View Slide

  82. (4) No one true interval
    Intervals communicate shape of posterior
    [Figure: posterior density against proportion water, with the 50% interval labeled]

    View Slide

  83. (4) No one true interval
    Intervals communicate shape of posterior
    [Figure: posterior density against proportion water, with the 89% interval labeled]

    View Slide

  84. (4) No one true interval
    Intervals communicate shape of posterior
    95% is obvious superstition. Nothing magical happens at the boundary.
    [Figure: posterior density against proportion water, with the 99% interval labeled]
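
    Intervals of several widths can be read off the same posterior. The sketch below assumes the Beta(6+1, 3+1) posterior from the globe sample and uses central (equal-tailed) intervals from qbeta(); the slides do not say which kind of interval is drawn, so that choice is an assumption.

```r
W <- 6 ; L <- 3
widths <- c( 0.50 , 0.89 , 0.99 ) # interval masses shown on the slides

# central interval: cut (1-w)/2 of the probability from each tail
intervals <- t( sapply( widths , function(w)
    qbeta( c( (1-w)/2 , 1-(1-w)/2 ) , W+1 , L+1 ) ) )
dimnames(intervals) <- list( paste0( 100*widths , "%" ) , c("lower","upper") )
print( round( intervals , 2 ) )
```

    No width is special: each row describes the same distribution at a different level of coverage.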

    View Slide

  85. Letters From My Reviewers
    “The author uses these cute 89% intervals, but we need to see the 95% intervals so we can tell whether any of the effects are robust.”
    That an arbitrary interval contains an arbitrary value is not meaningful. Use the whole distribution.

    View Slide

  86. Workflow
    (1) Define generative model of the sample
    (2) Define a specific estimand
    (3) Design a statistical way to produce estimate
    (4) Test (3) using (1)
    (5) Analyze sample, summarize

    View Slide

  87. From Posterior to Prediction
    Implications of model depend upon entire posterior
    Must average any inference over entire posterior
    This usually requires integral calculus
    OR we can just take samples from the posterior

    View Slide

  88. Sampling the posterior
    [Book excerpt, cropped on the slide:] [...] we will use statistical procedures that estimate the posterior distribution with samples. There will be no other representation of it. So if you get used to working with posterior samples now, you won't have to relearn anything later.
    In this case, we can draw samples from the posterior with:
    R code
    post_samples <- rbeta( 1e3 , 6+1 , 3+1 ) # 1000 draws from Beta(W+1, L+1) with W=6, L=3
    Now post_samples contains 1000 proportions of water.
    Just show the posterior. The best summary of the posterior distribution is the posterior distribution. Just draw it. [...]
    The red curve is an estimate of the distribution, based on the samples from it. The dashed curve is the analytical, exact posterior distribution. The shape of the red curve depends upon how you estimate it from the samples; it is a statistical estimate itself. So don't start peering at the wiggles and trying to make sense of them. They are just sampling variation. And if we change how the curve is estimated, we'll get more or fewer wiggles. With R's density estimator, making adj smaller produces more local estimation.
    R code
    # dens() comes from the rethinking package
    dens( post_samples , lwd=4 , col=2 , xlab="proportion water" , adj=0.1 )
    curve( dbeta(x,6+1,3+1) , add=TRUE , lty=2 , lwd=3 )
    [Figure: density against proportion water. Solid curve: samples. Dashed curve: beta distribution.]

    View Slide

  89. Uncertainty → Causal model → Implications

    View Slide

  90. plot( table(pred_64) , xlim=c(0,10) , xlab="number of W" , ylab="count" ,
    lwd=10 , col=1 )
    # now simulate posterior predictive distribution
    post_samples <- rbeta(1e4,6+1,3+1)
    pred_post <- sapply( post_samples , function(p) sum(sim_globe(p,10)=="W") )
    # fix the levels at 0:10 so tab_post[i+1] is the count for i water even if some count never occurs
    tab_post <- table( factor( pred_post , levels=0:10 ) )
    for ( i in 0:10 ) lines(c(i,i),c(0,tab_post[i+1]),lwd=4,col=4)
    [Figure, "Summarizing posterior distributions": count against number of W (0 to 10). The black histogram shows the predictive distribution for p = 0.64, the posterior mean. Labels: p = 0.64; entire posterior.]
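
    The slide code relies on sim_globe() and pred_64, neither of which is defined in this part of the transcript. The sketch below fills both in as assumptions: sim_globe() is written here as sim_globe2() from the bonus slides without the error step, and pred_64 is taken to be simulations at the fixed value p = 0.64.

```r
# assumed definition: globe tossing without measurement error
sim_globe <- function( p=0.7 , N=9 ) {
    sample( c("W","L") , size=N , prob=c(p,1-p) , replace=TRUE )
}

set.seed(1)
# predictions at a single value of p (assumed meaning of pred_64)
pred_64 <- replicate( 1e4 , sum( sim_globe(0.64,10)=="W" ) )
# predictions averaged over the entire posterior
post_samples <- rbeta( 1e4 , 6+1 , 3+1 )
pred_post <- sapply( post_samples , function(p) sum( sim_globe(p,10)=="W" ) )

# tabulate both on the same 0:10 scale
tab_64 <- table( factor( pred_64 , levels=0:10 ) )
tab_post <- table( factor( pred_post , levels=0:10 ) )
print( rbind( p_0.64=tab_64 , entire_posterior=tab_post ) )
```

    The posterior predictive counts are more spread out than the counts at p = 0.64, because they carry the uncertainty about p as well as the sampling variation.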

    View Slide

  91. Sampling is Fun & Easy
    Sample from posterior, compute desired
    quantity for each sample, profit
    Much easier than doing integrals
    Turn a calculus problem into
    a data summary problem
    MCMC produces only samples anyway
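
    As one illustration of turning a calculus problem into a data summary problem, the sketch below estimates the posterior probability that more than half the globe is water, once by counting samples and once by the exact integral. The threshold 0.5 is an example chosen here, not a quantity from the lecture.

```r
set.seed(3)
post_samples <- rbeta( 1e4 , 6+1 , 3+1 )  # samples from the globe posterior

by_sampling <- mean( post_samples > 0.5 ) # data summary: fraction of samples above 0.5
by_calculus <- 1 - pbeta( 0.5 , 6+1 , 3+1 ) # exact: integral of the posterior above 0.5

print( c( by_sampling=by_sampling , by_calculus=by_calculus ) )
```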

    View Slide

  92. Sampling is Handsome & Handy
    Things we’ll compute with sampling:
    Model-based forecasts
    Causal effects
    Counterfactuals
    Prior predictions

    View Slide

  93. Bayesian data analysis
    For each possible explanation of the data,
    Count all the ways data can happen.
    Explanations with more ways to produce
    the data are more plausible.

    View Slide

  94. Bayesian modesty
    No guarantees except logical
    Probability theory is a method of
    logically deducing implications
    of data under assumptions that
    you must choose
    Any framework selling you more
    is hiding assumptions

    View Slide

  95. Course Schedule
    Week 1 Bayesian inference Chapters 1, 2, 3
    Week 2 Linear models & Causal Inference Chapter 4
    Week 3 Causes, Confounds & Colliders Chapters 5 & 6
    Week 4 Overfitting / Interactions Chapters 7 & 8
    Week 5 MCMC & Generalized Linear Models Chapters 9, 10, 11
    Week 6 Integers & Other Monsters Chapters 11 & 12
    Week 7 Multilevel models I Chapter 13
    Week 8 Multilevel models II Chapter 14
    Week 9 Measurement & Missingness Chapter 15
    Week 10 Generalized Linear Madness Chapter 16
    https://github.com/rmcelreath/stat_rethinking_2023

    View Slide

  96. View Slide

  97. BONUS
    ROUND

    View Slide

  98. Misclassification
    [Book excerpt, "Measurement and misclassification", cropped on the slide:] [...] has focused on a simple descriptive estimand, the proportion [...] it in the context of a simple causal diagram.
    [Diagram: causal diagram with nodes p, N, W, L]

    View Slide

  99. Misclassification
    [Book excerpt, cropped on the slide:] [...] is a bit redundant, because if we know N and W, we can just [...] let's redraw the diagram with that in mind. [...] W and N are observed: we know their values. [...] it is our estimand. One convention for showing which [...] which have not is to draw circles around unobserved variables.
    [Diagram: nodes p, N, W. Label: unobserved]

    View Slide

  100. Misclassification
    [Diagram: nodes p, N, W. Labels: population size; unobserved]

    View Slide

  101. Misclassification
    [Book excerpt, cropped on the slide:] [...] misclassification error. Again think about globe tossing. But [...] counting W and L makes mistakes. [...] of the time, they [...] switching W for L and L for W. This is part of how the [...] sample. And we should be able and ready to include it in the [...] we do not observe the true count W. Instead we observe [...] is caused by two variables, the true count W and the [...]
    [Diagram: nodes p, N, and two W nodes. Labels: true samples; unobserved]

    View Slide

  102. Misclassification
    [Diagram: nodes p, N, two W nodes, and M. Label: misclassified samples]

    View Slide

  103. Misclassification
    [Diagram: nodes p, N, two W nodes, and M. Label: measurement process]

    View Slide

  104. Misclassification simulation
    Obey the workflow! Code a generative model:
    [Book excerpt, "Measurement and misclassification":]
    R code
    sim_globe2 <- function( p=0.7 , N=9 , x=0.1 ) {
        # p: proportion of water, N: number of tosses, x: misclassification rate
        true_sample <- sample(c("W","L"),size=N,prob=c(p,1-p),replace=TRUE)
        # with probability x, record the opposite of the true outcome
        obs_sample <- ifelse( runif(N) < x ,
            ifelse( true_sample=="W" , "L" , "W" ) , # error
            true_sample ) # no error
        return(obs_sample)
    }
    To understand the problem misclassification causes for our previous estimator, consider an extreme case like p = 0. Now without error, we'd never observe water. But with error, we'll observe water a proportion x of the time. Similarly on the other extreme end, p = 1. But now we should never observe land, but we observe it instead a proportion x of the time. Go ahead and test the simulation code above to make sure it works as expected.
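
    The suggested test at the extremes might look like the sketch below. It repeats sim_globe2() from the slide so that it runs on its own; the sample size of 1e4 and the object names are choices made here for illustration.

```r
# sim_globe2 as defined on the slide
sim_globe2 <- function( p=0.7 , N=9 , x=0.1 ) {
    true_sample <- sample(c("W","L"),size=N,prob=c(p,1-p),replace=TRUE)
    obs_sample <- ifelse( runif(N) < x ,
        ifelse( true_sample=="W" , "L" , "W" ) , # error
        true_sample ) # no error
    return(obs_sample)
}

set.seed(2)
obs_p0 <- sim_globe2( p=0 , N=1e4 , x=0.1 )  # all land, 10% misclassified
obs_p1 <- sim_globe2( p=1 , N=1e4 , x=0.1 )  # all water, 10% misclassified
obs_noerr <- sim_globe2( p=0 , N=1e4 , x=0 ) # all land, no misclassification

print( c( water_when_p0  = mean( obs_p0=="W" ) ,
          land_when_p1   = mean( obs_p1=="L" ) ,
          water_no_error = mean( obs_noerr=="W" ) ) )
```

    The first two proportions should land near x = 0.1 and the third should be exactly zero.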

    View Slide

  105. Misclassification simulation
    Obey the workflow! Code a generative model:

    View Slide

  106. Misclassification simulation
    Obey the workflow! Code a generative model:

    View Slide

  107. Misclassification simulation
    Obey the workflow! Code a generative model:

    View Slide

  108. Misclassification estimator
    Use the intuition from the generative model to draw out the
    Garden of Forking Data, build a Bayesian estimator.
    Two stages: (1) true samples, (2) misclassification

    View Slide

  109. true samples

    View Slide

  110. true samples
    observed samples
    1-in-3 misclassified

    View Slide

  111. Observe — How many ways can this happen?

    View Slide

  112. 6 ways to observe water, when true sample is water

    View Slide

  113. 1 way to observe water, when true sample is land

    View Slide

  114. 3×2 + 1×1 = 7 ways to observe water
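
    The count can be checked against the probability rule Pr(water|p, x) = p(1 − x) + (1 − p)x. The sketch below assumes one reading of the slides: a globe with four sides, three of them water, where each true sample has three possible recordings and one of the three is wrong. The function name pr_water is introduced here.

```r
# probability of observing water on one toss, given p and misclassification rate x
pr_water <- function( p , x ) p*(1-x) + (1-p)*x

ways_water <- 3*2 + 1*1 # 3 water sides x 2 correct recordings + 1 land side x 1 wrong recording
total_paths <- 4*3      # 4 sides x 3 recordings each (assumed reading of the slides)

print( c( by_counting = ways_water/total_paths ,
          by_formula  = pr_water( p=3/4 , x=1/3 ) ) )
```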

    View Slide

  115. Misclassification estimator
    Posterior distribution for p given W,L,x:
    [Book excerpt, cropped on the slide:] [...] we can write a probability expression [...] of observing W on any given toss of the globe. It is
        $\Pr(\text{water} \mid p, x) = p(1-x) + (1-p)x$
    [...] proportion of water on the globe and x the chance of misclassification. This [...] the same structure as the total ways expression 3×2 + 1×1 = 7. [...] Similarly for the probability of [...]
        $\Pr(\text{land} \mid p, x) = (1-p)(1-x) + px$
    [...] derived here in an informal way is the rule of probability theory that [...] not happen together (alternatives) are added, where events that happen [...] The true state can be W or L. It cannot be both at the same time. So we end up adding the ways to see [...] W to the ways to see W when it is actually L. [...]
    So what is our Bayesian estimator now? For our original sample, with W = 6 and L = 3, [...] misclassification at a rate of x, the new posterior distribution looks like this:
        $\Pr(p \mid W, L, x) = \frac{[p(1-x) + (1-p)x]^W \times [(1-p)(1-x) + px]^L}{Z}$
    [...] as always the denominator Z is just the sum of every numerator for every value of p. It just normalizes the counts so they sum to one and are proper probabilities.
    [Figure: posterior against proportion of water. One curve accounts for misclassification. The black curve is our previous posterior, which ignores misclassification.]

    View Slide

  116. Pr(p|W,L,x)
    [Book excerpt, cropped on the slide:] So what is our Bayesian estimator now? For our original sample, with W = 6 and L = 3, [...] misclassification at a rate of x, the new posterior distribution looks like this:
        $\Pr(p \mid W, L, x) = \frac{[p(1-x) + (1-p)x]^W \times [(1-p)(1-x) + px]^L}{Z}$
    [...] as always the denominator Z is just the sum of every numerator for every value of p. It just normalizes the counts so they sum to one and are proper probabilities. But the [...] heart is the numerator. And it is just counting all the ways to see a sample with W water and L land, assuming misclassification probability x. The normalizing constant Z is [...] just a nuisance, but if you are curious, see the Overthinking box further down. Let's plot our new posterior distribution and compare it to the previous one.
    code for the normalizing constant
    eta <- function(x,a,b) exp( pbeta(x,a,b,log.p=TRUE) + lbeta(a,b) )
    Annotations on the formula: the first bracket is the probability of each water; the second bracket is the probability of each land.

    View Slide

  117. Pr(p|W,L,x)
        $\Pr(p \mid W, L, x) = \frac{[p(1-x) + (1-p)x]^W \times [(1-p)(1-x) + px]^L}{Z}$
    code for the normalizing constant
    eta <- function(x,a,b) exp( pbeta(x,a,b,log.p=TRUE) + lbeta(a,b) )
    Annotations on the formula: the first bracket is the probability of each water; the second bracket is the probability of each land; Z is some unpleasant normalizing constant.

    View Slide

  118. Misclassification posterior
    [Figure: the posterior [...] globe tossing experiment [...]. x-axis: proportion of water; y-axis: posterior probability. The red curve is the misclassification posterior, which accounts for misclassification. The black curve is our previous posterior, which ignores misclassification.]
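
    The comparison in the figure can be approximated on a grid, which avoids the unpleasant normalizing constant. This is a sketch, not the code behind the slide: it assumes W = 6, L = 3, and a misclassification rate x = 0.1 (the default in sim_globe2), and the grid size and variable names are choices made here.

```r
W <- 6 ; L <- 3 ; x <- 0.1 # x = 0.1 is an assumed rate, matching sim_globe2's default
p_grid <- seq( 0 , 1 , length.out=1001 )
dp <- p_grid[2] - p_grid[1] # grid spacing, used to normalize to a density

# numerators: previous estimator and misclassification estimator
num_prev <- p_grid^W * (1-p_grid)^L
num_mis <- ( p_grid*(1-x) + (1-p_grid)*x )^W * ( (1-p_grid)*(1-x) + p_grid*x )^L

# normalize each so the area under the curve is 1
post_prev <- num_prev / sum( num_prev*dp )
post_mis <- num_mis / sum( num_mis*dp )

plot( p_grid , post_prev , type="l" , lwd=3 ,
    xlab="proportion of water" , ylab="posterior probability" )
lines( p_grid , post_mis , lwd=3 , col=2 ) # red: misclassification posterior
```

    Under these assumptions the misclassification posterior is flatter and peaks at a somewhat higher proportion of water than the previous posterior.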

    View Slide

  119. Measurement matters
    When there is measurement error, better to model it than to
    ignore it
    Same goes for: missing data, compliance, inclusion, etc
    Good news: Samples do not need to be representative of
    population in order to provide good estimates of population
    What matters is why the sample differs

    View Slide

  120. View Slide