
Bayesian Inference is Just Counting

Conceptual introduction to Bayesian inference and data analysis, with a little causal inference at the end

Richard McElreath

February 05, 2020

Transcript

  1. BAYESIAN INFERENCE
    IS JUST COUNTING
    Richard McElreath
    MPI-EVA
    p(x|y)p(y)/p(x)


  2. (image-only slide)

  3. (image-only slide)

  4. (image-only slide)

  5. (image-only slide)

  6. The Golem of Prague
    go•lem |gōlǝm|
    noun
    • (in Jewish legend) a clay figure
    brought to life by magic. 

    • an automaton or robot.
    ORIGIN late 19th cent.: from Yiddish
    goylem, from Hebrew gōlem
    ‘shapeless mass.’


  7. The Golem of Prague
    “Even the most perfect of Golem, risen to
    life to protect us, can easily change into a
    destructive force. Therefore let us treat
    carefully that which is strong, just as we
    bow kindly and patiently to that which is
    weak.”
    Rabbi Judah Loew ben
    Bezalel (1512–1609)
    From Breath of Bones: A Tale of the Golem


  8. The Golems of Science
    Golem
    • Made of clay
    • Animated by “truth”
    • Powerful
    • Blind to creator’s intent
    • Easy to misuse
    • Fictional
    Model
    • Made of...silicon?
    • Animated by “truth”
    • Hopefully powerful
    • Blind to creator’s intent
    • Easy to misuse
    • Not even false


  9. Bayesian data analysis
    • Use probability to describe uncertainty
    • Extends ordinary logic (true/false) to
    continuous plausibility
    • Computationally difficult
    • Markov chain Monte Carlo (MCMC) to
    the rescue
    • Used to be controversial
    • Ronald Fisher: Bayesian analysis “must be
    wholly rejected.”
    Pierre-Simon Laplace (1749–1827)
    Sir Harold Jeffreys (1891–1989)
    with Bertha Swirles, aka Lady
    Jeffreys (1903–1999)


  10. Bayesian data analysis
    Count all the ways data can happen,
    according to assumptions.
    Assumptions with more ways that are
    consistent with data are more
    plausible.


  11. Bayesian data analysis
    • Contrast with frequentist view
    • Probability is just limiting
    frequency
    • Uncertainty arises from sampling
    variation
    • Bayesian probability much
    more general
    • Probability is in the golem, not in
    the world
    • Coins are not random, but our
    ignorance makes them so
    Saturn as Galileo saw it


  12. Garden of Forking Data
    • The future:
    • Full of branching paths
    • Each choice closes some
    • The data:
    • Many possible events
    • Each observation eliminates some


  13. Garden of Forking Data
    [Diagram: a bag contains 4 marbles of unknown composition. Possible contents (1) through (5) range from zero to four blue marbles. Observe: a draw from the bag.]


  14. Conjecture: Data: (garden diagram)


  15. Conjecture: Data: (garden diagram)


  16. Conjecture: Data: (garden diagram)


  17. Conjecture: Data: (garden diagram)
    3 paths consistent with data


  18. Garden of Forking Data
    Possible contents (1) through (5), with ways to produce the data: ?, 3, ?, ?, ?


  19. Garden of Forking Data
    Possible contents (1) through (5), with ways to produce the data: 0, 3, ?, ?, 0


  20. 3 ways
    9 ways
    8 ways


  21. (animation frame of the previous slide)

  22. (animation frame of the previous slide)

  23. Garden of Forking Data
    Counting the paths that survive, for each conjecture. We've considered five different conjectures about the contents of the bag, from zero blue marbles to four blue marbles. For each conjecture, we count the path sequences through the garden of forking data that could potentially produce the observed draws (blue, white, blue):

    Conjecture       Ways to produce
    [W W W W]        0 × 4 × 0 = 0
    [B W W W]        1 × 3 × 1 = 3
    [B B W W]        2 × 2 × 2 = 8
    [B B B W]        3 × 1 × 3 = 9
    [B B B B]        4 × 0 × 4 = 0

    The number of ways to produce the data for each conjecture is computed by counting the number of paths in each "ring" of the garden and then multiplying. This is just a computational device. It tells us the same thing as drawing the garden: the fact that numbers are multiplied doesn't change the fact that this is still just counting of logically possible paths.
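    The counts in this table take only a few lines of R. A minimal sketch, assuming (as in the slides) four marbles per bag and the observed draws blue, white, blue:

    blue <- 0:4                       # blue marbles under each conjecture
    ways <- blue * (4 - blue) * blue  # ways to draw blue, then white, then blue
    ways
    # [1] 0 3 8 9 0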


  24. Updating
    Another draw from the bag: blue. You could draw a new garden to count the paths compatible with the full data sequence. Or you could take the previous counts over conjectures and just update them in light of the new observation. The two methods are mathematically identical, as long as the new observation is logically independent of the previous observations.

    Here's how to do it. First count the number of ways each conjecture could produce the new observation. Then multiply each of these new counts by the previous number of ways for each conjecture. In table form:

    Conjecture       Ways × Previous counts = New count
    [W W W W]        0 × 0 = 0
    [B W W W]        1 × 3 = 3
    [B B W W]        2 × 8 = 16
    [B B B W]        3 × 9 = 27
    [B B B B]        4 × 0 = 0

    The new counts in the right-hand column summarize all the evidence for each conjecture. As new data arrive, provided they are independent of previous observations, the number of logically possible ways for a conjecture to produce all the data can be computed just by multiplying the new count by the old count.
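    In R, the update is a single multiplication. A sketch, with the new draw assumed blue:

    prior    <- c(0, 3, 8, 9, 0)  # ways to produce blue, white, blue
    ways_new <- 0:4               # ways each conjecture yields one more blue
    prior * ways_new              # new counts
    # [1]  0  3 16 27  0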


  25. Using other information
    Factory says: blue marbles are rare, but every bag contains at least one.

    In this example the prior data and new data are of the same type: marbles drawn from the bag. But in general, prior data and new data can be of different types. Suppose someone from the marble factory tells you that blue marbles are rare. For every bag containing [B B B W], they made two bags containing [B B W W] and three bags containing [B W W W]. They also ensured that every bag contained at least one blue and one white marble. We can update our counts again:

    Conjecture       Prior ways × Factory count = New count
    [W W W W]        0 × 0 = 0
    [B W W W]        3 × 3 = 9
    [B B W W]        16 × 2 = 32
    [B B B W]        27 × 1 = 27
    [B B B B]        0 × 0 = 0

    Conjecture [B B W W] is now most plausible, but barely better than [B B B W]. Is there a difference in these counts at which we can safely decide that one of the conjectures is the correct one? You'll spend the next chapter exploring that question. And original ignorance: which assumption should we use when there is no previous information?
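    The factory information enters the same way, as one more multiplication. A sketch continuing the counts above:

    counts  <- c(0, 3, 16, 27, 0)  # ways to produce all draws so far
    factory <- c(0, 3, 2, 1, 0)    # relative number of bags manufactured
    counts * factory               # new counts
    # [1]  0  9 32 27  0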


  26. Using other information
    Factory says: blue marbles are rare.

    Conjecture       Prior ways × Factory count = New count
    [W W W W]        0 × 0 = 0
    [B W W W]        3 × 3 = 9
    [B B W W]        16 × 2 = 32
    [B B B W]        27 × 1 = 27
    [B B B B]        0 × 0 = 0


  27. Counts to plausibility
    Unglamorous basis of applied probability:
    Things that can happen more ways are more plausible.
    To standardize, add up all of the products, one for each value p can take, and then divide each product by the sum of products:

    plausibility of p after D_new = (ways p can produce D_new × prior plausibility of p) / (sum of products)

    There's nothing special, really, about standardizing to one. Any value will do. But using one ends up making the mathematics more convenient. Consider again the table from before, now updated using our definitions of p and "plausibility":

    Possible composition      p      Ways to produce data   Plausibility
    [W W W W]                 0      0                      0
    [B W W W]                 0.25   3                      0.15
    [B B W W]                 0.5    8                      0.40
    [B B B W]                 0.75   9                      0.45
    [B B B B]                 1      0                      0

    You can quickly compute these plausibilities in R:

    ways <- c( 3 , 8 , 9 )
    ways/sum(ways)
    [1] 0.15 0.40 0.45


  28. Counts to plausibility
    plausibility of p after D_new = (ways p can produce D_new × prior plausibility of p) / (sum of products)

    Possible composition      p      Ways to produce data   Plausibility
    [W W W W]                 0      0                      0
    [B W W W]                 0.25   3                      0.15
    [B B W W]                 0.5    8                      0.40
    [B B B W]                 0.75   9                      0.45
    [B B B B]                 1      0                      0

    R code:
    ways <- c( 3 , 8 , 9 )
    ways/sum(ways)
    [1] 0.15 0.40 0.45

    These plausibilities are also probabilities: non-negative real numbers that sum to one. And all of the mathematical things you can do with probabilities, you can also do with these values. Specifically, each piece of the calculation has a direct partner in applied probability theory. These partners have stereotyped names, so it's worth learning them.

  29. Counts to plausibility
    Plausibility is probability: Set of non-negative real
    numbers that sum to one.
    Probability theory is just a set of shortcuts for
    counting possibilities.


  30. Building a model
    • How to use probability to do typical statistical
    modeling?
    1. Design the model (data story)
    2. Condition on the data (update)
    3. Evaluate the model (critique)


  31. Nine tosses of the globe:
    W L W W W L W L W


  32. Design > Condition > Evaluate
    • Data story motivates the model
    • How do the data arise?
    • For W L W W W L W L W:
    • Some true proportion of water, p
    • Toss globe, probability p of observing W, 1–p of L
    • Each toss therefore independent of other tosses
    • Translate data story into probability statements


  33. Design > Condition > Evaluate
    • Bayesian updating defines optimal learning in small
    world, converts prior into posterior
    • Give your golem an information state, before the data:
    Here, an initial confidence in each possible value of p
    between zero and one
    • Condition on data to update information state: New
    confidence in each value of p, conditional on data


  34. [Figure: Bayesian updating for the toss sequence W L W W W L W L W; plausibility over the proportion of water p (0 to 1) after the first few tosses, with the prior shown for comparison.]

  35. [Same figure, now with the posterior labeled alongside the prior.]

  36. [Figure: the full updating sequence for W L W W W L W L W, from the flat prior (n = 0) through n = 9; proportion of water vs. plausibility in each panel.]

  37. [Same figure, animation frame.]

  38. [Same figure, animation frame.]

  39. [Same figure, animation frame.]

  40. [Same figure, animation frame.]

  41. Design > Condition > Evaluate
    • Data order irrelevant, because
    golem assumes order irrelevant
    • All-at-once, one-at-a-time,
    shuffled order all give same
    posterior
    • Every posterior is a prior for
    next observation
    • Every prior is posterior of
    some other inference
    • Sample size automatically
    embodied in posterior
    4."-- 803-%4 "/% -"3(& 803-%4
    probability of water
    0 0.5 1
    n = 1
    W L W W W L W L W
    confidence
    probability of water
    0 0.5 1
    n = 2
    W L W W W L W L W
    probability of water
    0 0.5 1
    n = 3
    W L W W W L W L W
    probability of water
    0 0.5 1
    n = 4
    W L W W W L W L W
    confidence
    probability of water
    0 0.5 1
    n = 5
    W L W W W L W L W
    probability of water
    0 0.5 1
    n = 6
    W L W W W L W L W
    probability of water
    0 0.5 1
    n = 7
    W L W W W L W L W
    confidence
    probability of water
    0 0.5 1
    n = 8
    W L W W W L W L W
    probability of water
    0 0.5 1
    n = 9
    W L W W W L W L W
    proportion water
    0 0.5 1
    plausibility
    n = 0
    W L W W W L W L W
    proportion water
    0 0.5 1
    plausibility
    n = 0
    W L W W W L W L W
    proportion water
    0 0.5 1
    plausibility
    n = 0
    W L W W W L W L W
    proportion water
    0 0.5 1
    plausibility
    n = 0
    W L W W W L W L W
    proportion water
    0 0.5 1
    plausibility
    n = 0
    W L W W W L W L W
    proportion water
    0 0.5 1
    plausibility
    n = 0
    W L W W W L W L W
    'ĶĴłĿIJ Ɗƍ )PX B #BZFTJBO NPEFM MFBSOT &BDI UPTT PG UIF HMPCF QSPEVDFT
    BO PCTFSWBUJPO PG XBUFS 8
    PS MBOE -
    ćF NPEFMT FTUJNBUF PG UIF QSP
    QPSUJPO PG XBUFS PO UIF HMPCF JT B QMBVTJCJMJUZ GPS FWFSZ QPTTJCMF WBMVF ćF
    MJOFT BOE DVSWFT JO UIJT ĕHVSF BSF UIFTF DPMMFDUJPOT PG QMBVTJCJMJUJFT *O FBDI
    QMPU B QSFWJPVT QMBVTJCJMJUJFT EBTIFE DVSWF
    BSF VQEBUFE JO MJHIU PG UIF MBUFTU

    View Slide

  42. Design > Condition > Evaluate
    • Bayesian inference: Logical answer to
    a question in the form of a model


    “How plausible is each proportion of
    water, given these data?”

    • Golem must be supervised
    • Did the golem malfunction?
    • Does the golem’s answer make sense?
    • Does the question make sense?
    • Check sensitivity of answer to changes in
    assumptions


  43. Construction perspective
    • Build joint model:
    (1) List variables
    (2) Define generative relations
    (3) ???
    (4) Profit
    • Input: Joint prior
    • Deduce: Joint posterior


  44. The Joint Model
    [Diagram: parameter graph of a joint model (α, β, σ, ρ)]
    W ∼ Binomial(N, p)
    p ∼ Uniform(0, 1)
    • Bayesian models are generative
    • Can be run forward to generate predictions or simulate data
    • Can be run in reverse to infer process from data


  45. The Joint Model
    [Diagram: parameter graph of a joint model (α, β, σ, ρ)]
    W ∼ Binomial(N, p)
    p ∼ Uniform(0, 1)
    • Run forward:


  46. Run in Reverse: 

    Computing the posterior
    1. Analytical approach (often impossible)
    2. Grid approximation (very intensive)
    3. Quadratic approximation (limited)
    4. Markov chain Monte Carlo (intensive)
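    Of these, grid approximation is the simplest to show. A sketch for the nine globe tosses (6 W, 3 L), assuming a flat prior:

    p_grid     <- seq(0, 1, length.out = 1000)   # candidate values of p
    prior      <- rep(1, 1000)                   # flat prior
    likelihood <- dbinom(6, size = 9, prob = p_grid)
    posterior  <- likelihood * prior
    posterior  <- posterior / sum(posterior)     # standardize to sum to one
    plot(p_grid, posterior, type = "l")          # posterior plausibilities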


  47. Predictive checks
    • Something like a significance test, but not
    • No universally best way to evaluate
    adequacy of model-based predictions
    • No way to justify always using a
    threshold like 5%
    • Good predictive checks always depend
    upon purpose and imagination
    “It would be very nice to
    have a formal apparatus that
    gives us some ‘optimal’ way
    of recognizing unusual
    phenomena and inventing
    new classes of hypotheses
    [...]; but this remains an art
    for the creative human
    mind.” 

    —E.T. Jaynes (1922–1998)
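    One simple predictive check: draw values of p from the posterior, simulate replicated datasets, and compare them to the observed sample. A sketch for the globe example (the grid posterior is recomputed so the code stands alone):

    p_grid    <- seq(0, 1, length.out = 1000)
    posterior <- dbinom(6, size = 9, prob = p_grid)
    posterior <- posterior / sum(posterior)
    samples   <- sample(p_grid, 1e4, replace = TRUE, prob = posterior)
    W_rep     <- rbinom(1e4, size = 9, prob = samples)  # replicated counts of W
    hist(W_rep)   # compare the spread to the observed 6 W in 9 tosses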


  48. (image-only slide)

  49. Triumph of Geocentrism
    • Claudius Ptolemy (90–168)
    • Egyptian mathematician
    • Accurate model of planetary
    motion
    • Epicycles: orbits on orbits
    • Fourier series
    -*/&"
    Earth
    equant
    planet
    epicycle
    deferent


  50. Geocentrism
    • Descriptively accurate
    • Mechanistically wrong
    • General method of
    approximation
    • Known to be wrong
    Regression
    • Descriptively accurate
    • Mechanistically wrong
    • General method of
    approximation
    • Taken too seriously


  51. Linear regression
    • Simple statistical golems
    • Model of mean and variance of normally (Gaussian)
    distributed measure
    • Mean as additive combination of weighted variables
    • Constant variance


  52. 1809 Bayesian argument for normal error and least-squares estimation


  53. Why normal?
    • Why are normal (Gaussian)
    distributions so common in
    statistics?
    1. Easy to calculate with
    2. Common in nature
    3. Very conservative assumption
    [Figure: Gaussian density with the x-axis in units of σ; about 95% of the probability mass lies within ±2σ of the mean.]


  54. (image-only slide)

  55. (image-only slide)

  56. (image-only slide)

  57. (image-only slide)

  58. (image-only slide)

  59. [Figure 4.2: Random walks on the soccer field converge to a normal distribution; the more steps taken, the closer the match between the observed distribution of positions and the normal curve. Panels: positions after 4, 8, and 16 steps.]
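    The figure is easy to regenerate by simulation. A sketch in the spirit of the soccer-field example:

    pos <- replicate(1e4, sum(runif(16, -1, 1)))  # 16 random steps per walker
    plot(density(pos))                            # approximately Gaussian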


  60. [Figure 4.2, animation frame.]

  61. [Figure 4.2, animation frame.]

  62. Why normal?
    • Processes that produce
    normal distributions
    • Addition
    • Products of small deviations
    • Logarithms of products
    Francis Galton’s 1894 “bean machine”
    for simulating normal distributions
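    Each of these processes can be checked by simulation. A sketch:

    # addition: sums of fluctuations converge to normal
    plot(density(replicate(1e4, sum(runif(16, -1, 1)))))
    # products of small deviations: approximately normal
    plot(density(replicate(1e4, prod(1 + runif(12, 0, 0.1)))))
    # logarithms of products: normal even for large deviations
    plot(density(replicate(1e4, log(prod(1 + runif(12, 0, 0.5))))))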


  63. (image-only slide)

  64. Why normal?
    • Ontological perspective
    • Processes which add fluctuations result
    in dampening
    • Damped fluctuations end up Gaussian
    • No information left, except mean and
    variance
    • Can’t infer process from distribution!
    • Epistemological perspective
    • Know only mean and variance
    • Then least surprising and most
    conservative (maximum entropy)
    distribution is Gaussian
    • Nature likes maximum entropy
    distributions



  66. Linear models
    • Models of normally distributed data
    common
    • “General Linear Model”: t-test, single
    regression, multiple regression,
    ANOVA, ANCOVA, MANOVA,
    MANCOVA, yadda yadda yadda
    • All the same thing
    • Learn strategy, not procedure
    Willard Boepple


  67. [Figure: height vs. weight scatterplots at sample sizes N = 10, 20, 50, 100, 200, 350.]

  68. (image-only slide)

  69. Regression as a
    wicked oracle
    • Regression automatically focuses
    on the most informative cases
    • Cases that don’t help are
    automatically ignored
    • But not kind — ask carefully


  70. Why not just add everything?
    • Could just add all available
    predictors to model
    • “We controlled for...”
    • Almost always a bad idea
    • Adding variables creates confounds
    • Residual confounding
    • Overfitting


  71. [DAG: HEIGHT ← AGE → MATH, with a questioned edge HEIGHT →? MATH]
    MATH independent of HEIGHT, conditional on AGE
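    The claim is easy to verify by simulation. A sketch with made-up coefficients (the numbers are illustrative, not from the deck):

    N <- 1e4
    A <- runif(N, 5, 10)              # age
    H <- rnorm(N, 6 * A + 20, 2)      # height, driven by age
    M <- rnorm(N, 5 * A + 85, 2)      # math score, driven by age
    cor(H, M)                         # strong association overall
    band <- A > 7 & A < 7.2           # hold age (nearly) constant
    cor(H[band], M[band])             # near zero within the band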


  72. [Figure: pairwise scatterplots of M (math score), A (age), and H (height).]

  73. MATH independent of HEIGHT, conditional on AGE
    [Figure: H vs. M scatterplots within age groups A = 7, 8, 9, 10; no association within any group.]


  74. [DAG: POWER → LIGHT ← SWITCH]
    SWITCH dependent on POWER, conditional on LIGHT
    SWITCH independent of POWER


  75. LIGHT POWER SWITCH
    ON ON ?
    OFF ON ?
    SWITCH dependent on POWER, conditional on LIGHT
    This effect known as “collider bias”
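    Collider bias is also easy to simulate: SWITCH and POWER are independent, but conditioning on LIGHT creates a dependence. A sketch (variable names are illustrative):

    N <- 1e4
    power_on  <- rbinom(N, 1, 0.9)
    switch_on <- rbinom(N, 1, 0.5)       # independent of power
    light_on  <- power_on * switch_on    # light requires both
    cor(power_on, switch_on)             # ~ 0: independent
    off <- light_on == 0                 # condition on the collider
    cor(power_on[off], switch_on[off])   # negative: now dependent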


  76. [DAG: AGE → MARRIED ← HAPPY]
    HAPPY dependent on AGE, conditional on MARRIED
    HAPPY independent of AGE


  77. Why not just add everything?
    • Matters for experiments as well
    • Conditioning on post-treatment
    variables can be very bad
    • Conditioning on pre-treatment can
    also be bad (colliders)
    • Good news!
    • Causal inference possible in
    observational settings
    • But requires good theory


  78. Texts in Statistical Science
    Richard McElreath
    Statistical Rethinking
    A Bayesian Course
    with Examples in R and Stan
    SECOND EDITION
    JUST COUNTING IMPLICATIONS OF ASSUMPTIONS
