BeginnerSession1_70th_TokyoR

kilometer
June 09, 2018

Tutorial material on Bayesian statistics.

Transcript

  1. 70th Tokyo.R @kilometer BeginneR Session 1 -- Bayesian Modeling --

    2018.06.09 at Microsoft Co.
  2. Who!?

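  3. Who!? Name: Mimura (@kilometer). Occupation: postdoc (Doctor of Engineering). Specialty: behavioral neuroscience (primates), brain imaging,

    medical systems engineering. R experience: about 10 years. Current craze: gajumaru (banyan tree).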
  4. BeginneR Session

  5. BeginneR

  6. BeginneR

  7. Before After BeginneR Session BeginneR BeginneR

  8. BeginneR Advanced Hoxo_m If I have seen further it is

    by standing on the sholders of Giants. -- Sir Isaac Newton, 1676
  9. BeginneR Session 1 -- Bayesian Modeling --

  10. Agenda: What is modeling? / Welcome to Bayesian statistics

  11. What is modeling?

  12. What is modeling? [diagram: Truth, f(X), Knowledge]

  13. What is modeling? [diagram: Truth, f(X), Knowledge;

    narrow sense vs. broad sense]
  14. What is modeling? Hypothesis Driven: “strong” hypothesis and data.

    Data Driven: “weak” hypothesis and data.
  15. What is modeling? [diagram: f(X) in the

    Hypothesis Driven and Data Driven settings]
  16. What is modeling? A/B test (though I have never actually run one!): Hypothesis Driven.

    A or B? HA: A is better. HB: B is better. H0: we have to choose the better one of the two. Strong hypothesis, simple data. A vs. B: *
  17. What is modeling? Meta analysis (what does everyone call this?): Data Driven.

    H0: there is a best/better way. Weak hypothesis, complex data.
  18. What is modeling? Data Driven Analysis (weak hypothesis, complex data) vs. Hypothesis Driven Analysis (strong hypothesis, simple data).

    How to do? What to do? Decision making.
  19. What is modeling? Data Driven (weak hypothesis, complex data) vs. Hypothesis Driven (strong hypothesis, simple data).

    How to do? What to do? Decision making. Simple model, complex model.
  20. What is modeling? Data Driven (weak hypothesis, complex data) vs. Hypothesis Driven (strong hypothesis, simple data).

    How to do? What to do? Decision making. Simple model, complex model. Narrow sense, broad sense.
  21. What is modeling? [diagram: Truth, f(X), Knowledge;

    narrow sense vs. broad sense]
  22. A or B? HA: A is better. HB: B is better.

    H0: we have to choose the better one of the two. A vs. B: * → A is better.
  23. There is only one difference between a madman and me.

    The madman thinks he is sane. I know I am mad. -- Salvador Dalí (“Dalí is a dilly”, The American Magazine, 1956, 162(1), 28–9, 107–9)
  24. A or B? HA: A is better. HB: B is better.

    H0: we have to choose the better one of the two. H1: there is a difference θ between A and B. A > B → A is better.
  25. Welcome to Bayesian statistics

  26. Dice with α faces (regular polyhedron). [diagram: Truth, Knowledge, Hypothesis?]

    Observation: $x = 5$
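  27. Dice with α faces, observation $x = 5$. Likelihood:

    $P(x = 5 \mid \alpha = 4) = 0$; $P(x = 5 \mid \alpha = 6) = 1/6$ (maximum likelihood); $P(x = 5 \mid \alpha = 8) = 1/8$; $P(x = 5 \mid \alpha = 12) = 1/12$; $P(x = 5 \mid \alpha = 20) = 1/20$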
  28. Dice with α faces, you: $X = \{5, 4, 3, 4, 2, 1, 2, 3, 1, 4\}$. Likelihood:

    $P(X \mid \alpha = 4) = 0$; $P(X \mid \alpha = 6) = 1/6^{10}$ (maximum likelihood); $P(X \mid \alpha = 8) = 1/8^{10}$; $P(X \mid \alpha = 12) = 1/12^{10}$; $P(X \mid \alpha = 20) = 1/20^{10}$
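
The likelihood comparison on this slide can be sketched in base R. This is a minimal sketch assuming every die is fair, so that each face has probability 1/α; the function name `dice_likelihood` is illustrative and not from the slides.

```r
# Observed rolls and candidate dice (regular polyhedra) from the slide
x <- c(5, 4, 3, 4, 2, 1, 2, 3, 1, 4)
alpha <- c(4, 6, 8, 12, 20)

# Likelihood of the whole data set under a fair die with `a` faces:
# zero if any roll exceeds the number of faces, otherwise (1/a)^n
dice_likelihood <- function(a, x) {
  if (max(x) > a) 0 else (1 / a)^length(x)
}

lik <- sapply(alpha, dice_likelihood, x = x)
names(lik) <- alpha
lik

# Maximum likelihood estimate of the number of faces
alpha_mle <- alpha[which.max(lik)]
alpha_mle
```

The four-sided die gets likelihood zero because a 5 was observed, and among the remaining dice the smallest one wins, since (1/α)^10 shrinks as α grows.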
  29. friend: Could you find α? you: Yes. α is estimated at 6!! friend: Why do you think so?

    you: Because $\arg\max_{\alpha} \{P(X \mid \alpha)\} = 6$!! friend: Hmmmm…, well.., how many $P(\alpha = 6)$? you: Oh, it is $1/6^{10}$!! friend: ….nnNNNO!!! you: WHAT!!????
  30. Dice with α faces, $X = \{5, 4, 3, 4, 2, 1, 2, 3, 1, 4\}$. friend: Hmmmm… Well.., how many $P(\alpha = 6)$?

    you (before): $P(X \mid \alpha = 6) = 1/6^{10}$ (maximum likelihood). you (after): $P(\alpha = 6 \mid X)$!!??
  31. Sample space (can NEVER get): $\Omega = \{\omega_1, \ldots, \omega_\infty\}, \forall \omega \in \mathbb{N}$. Realization: $X = \{x_1, \ldots, x_n\}$, x <- sample(Ω, 1); n: number of trials.

    Stochastic variable and probability distribution: X <- c(1, 1, 1, 1, 1, 2, 2, 3, 4, 5, 5); P(X) = hist(X, freq = FALSE, labels = TRUE), e.g. $P(X = 2)$
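
As a hedged illustration of going from realizations to an estimated probability distribution, the base-R sketch below uses the eleven values shown on the slide. The explicit `breaks` and the name `p_hat` are assumptions added here so that each histogram bar covers exactly one face.

```r
# Realizations shown on the slide
X <- c(1, 1, 1, 1, 1, 2, 2, 3, 4, 5, 5)

# Relative frequency of each observed value (p_hat is an illustrative name)
p_hat <- table(X) / length(X)
p_hat["2"]  # estimated P(X = 2)

# Same numbers from hist(): with unit-width bins centred on the integers,
# the density of each bar equals the relative frequency of that face
h <- hist(X, breaks = seq(0.5, 5.5, by = 1), plot = FALSE)
h$density
```

With only eleven rolls the estimate is rough; it approaches the underlying distribution only as the number of trials grows.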
  32. Statistical modeling: $g : \alpha \to \Omega$, the outcome function of an α-faced dice. Sample space (can NEVER get): $\Omega = \{\omega_1, \ldots, \omega_\infty\}, \forall \omega \in \mathbb{N}$.

    Realization: $X = \{x_1, \ldots, x_n\}$; probability distribution $P(X)$. g <- function(α = 6) { map(1:∞, ~sample(1:α, size = 10, replace = TRUE)) }; Ω <- g(); X <- density(Ω); $x \sim P(x \mid \alpha)$
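
The slide's `g()` is pseudocode, because `1:∞` cannot be evaluated. A finite, runnable stand-in might look like the sketch below; `n_sets`, `n_rolls`, and the use of base `lapply()` in place of `purrr::map()` are assumptions made here to keep the example dependency-free.

```r
# Finite stand-in for the slide's pseudocode: map(1:∞, ...) is replaced by
# lapply over n_sets simulated experiments (n_sets, n_rolls: illustrative names)
g <- function(alpha = 6, n_sets = 1000, n_rolls = 10) {
  lapply(seq_len(n_sets), function(i) {
    sample(seq_len(alpha), size = n_rolls, replace = TRUE)
  })
}

set.seed(70)                             # arbitrary seed, for reproducibility only
omega <- g()                             # 1000 simulated data sets of 10 rolls each
x <- omega[[sample(length(omega), 1)]]   # one realization drawn from them

# Empirical distribution of faces over all simulated rolls
table(unlist(omega)) / length(unlist(omega))
```

However large `n_sets` is made, the result remains a finite sample, which is the slide's point that the full sample space can never be obtained.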
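  33. Statistical modeling: probability distribution $x \sim P(x \mid \alpha)$, $g : \alpha \to \Omega$, parameter α. Sample space: $\Omega = \{\omega_1, \ldots, \omega_\infty\}, \forall \omega \in \mathbb{N}$.

    Realization: $X = \{x_1, \ldots, x_n\}$, $x \in \Omega$. X <- map(1:∞, ~g(α)); x <- sample(X, 1)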
  34. $P(X \mid \alpha = 6) = P(\alpha = 6 \mid X)$ !!??

    [plots: α = 6, α = 12, α = 20]
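  35. Statistical modeling in two directions: $x \sim P(x \mid \alpha)$ : $\alpha \to x$ and $\alpha \sim P(\alpha \mid X)$ : $X \to \alpha$.

    The full spaces $\{\ldots_1, \ldots, \ldots_\infty\}$ cannot be obtained; instead $\alpha \in \{4, 6, 8, 12, 20\}$ and $\forall \omega \in \mathbb{N}, \forall \omega \le \alpha$.
  36. $x \sim P(x \mid \alpha)$ and $\alpha \sim P(\alpha \mid X)$,

    with $\alpha \in \{4, 6, 8, 12, 20\}$ and $\forall \omega \in \mathbb{N}, \forall \omega \le \alpha$.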
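  37. Conditional probability: $P(A)$, $P(B)$, $A \cap B$; $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$

  38. $P(A \mid B) \cdot P(B) = P(A \cap B) = P(B \mid A) \cdot P(A)$

    Bayes’ theorem: $P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$, where $P(B) \neq 0$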
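  39. $P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$, where $P(B) \neq 0$;

    $P(\alpha \mid X) = \frac{P(X \mid \alpha) \cdot P(\alpha)}{P(X)}$, with $x \sim P(x \mid \alpha)$ : $\alpha \to x$ and $\alpha \sim P(\alpha \mid X)$ : $X \to \alpha$
  40. $P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$, where $P(B) \neq 0$;

    $P(\alpha \mid X) = \frac{P(X \mid \alpha) \cdot P(\alpha)}{P(X)}$, where $P(X \mid \alpha)$ is the likelihood
  41. $P(\alpha \mid X) = \frac{P(X \mid \alpha) \cdot P(\alpha)}{P(X)}$, likelihood $P(X \mid \alpha)$,

    with $\alpha \in \{4, 6, 8, 12, 20\}$
  42. Marginalization: $P(X) = \sum_i \{P(X \mid \alpha_i) \cdot P(\alpha_i \mid \Omega)\}\ \forall i$, $\alpha_i \in \{4,

    6, 8, 12, 20\}$ ($\Omega$: sample space); $P(\alpha \mid X) = \frac{P(X \mid \alpha) \cdot P(\alpha)}{P(X)}$, likelihood $P(X \mid \alpha)$
  43. $P(\alpha \mid X) = \frac{P(X \mid \alpha) \cdot P(\alpha)}{P(X)} = \frac{P(X \mid \alpha) \cdot P(\alpha \mid \Omega)}{\sum_i \{P(X \mid \alpha_i) \cdot P(\alpha_i \mid \Omega)\}}$

    ($\Omega$: sample space; $P(X \mid \alpha)$: likelihood)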
  44. Dice with α faces, you: $X = \{5, 4, 3, 4, 2, 1, 2, 3, 1, 4\}$. Likelihood:

    $P(X \mid \alpha = 4) = 0$; $P(X \mid \alpha = 6) = 1/6^{10}$ (maximum likelihood); $P(X \mid \alpha = 8) = 1/8^{10}$; $P(X \mid \alpha = 12) = 1/12^{10}$; $P(X \mid \alpha = 20) = 1/20^{10}$
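  45. $P(\alpha \mid X) = \frac{P(X \mid \alpha) \cdot P(\alpha \mid \Omega)}{\sum_i \{P(X \mid \alpha_i) \cdot P(\alpha_i \mid \Omega)\}}$, likelihood $P(X \mid \alpha)$

    The prior $P(\alpha \mid \Omega)$ depends on the sample space $\Omega = \{\omega_1, \ldots, \omega_\infty\}, \forall \omega \in \mathbb{N}$ (can NEVER get).
  46. $P(\alpha \mid X) = \frac{P(X \mid \alpha) \cdot P(\alpha \mid \Omega)}{\sum_i \{P(X \mid \alpha_i) \cdot P(\alpha_i \mid \Omega)\}}$, likelihood $P(X \mid \alpha)$

    The sample space $\Omega = \{\omega_1, \ldots, \omega_\infty\}, \forall \omega \in \mathbb{N}$ CAN NEVER BE OBTAINED, so neither can $P(\alpha \mid \Omega)$.
  47. Approximate the prior by prior knowledge $X'$: $\forall P(\alpha \mid \Omega) \cong \forall P(\alpha \mid X') = 1/5$, $\alpha \in \{4,

    6, 8, 12, 20\}$; $P(\alpha \mid X) = \frac{P(X \mid \alpha) \cdot P(\alpha \mid \Omega)}{\sum_i \{P(X \mid \alpha_i) \cdot P(\alpha_i \mid \Omega)\}}$, likelihood $P(X \mid \alpha)$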
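  48. $P(\alpha \mid X) \cong \frac{P(X \mid \alpha) \cdot P(\alpha \mid X')}{\sum_i \{P(X \mid \alpha_i) \cdot P(\alpha_i \mid X')\}}$, where $\forall P(\alpha \mid X') = 1/5$ (likelihood $P(X \mid \alpha)$)

    $= \frac{P(X \mid \alpha)}{\sum_i \{P(X \mid \alpha_i)\}} = \frac{P(X \mid \alpha)}{P(X \mid 4) + P(X \mid 6) + P(X \mid 8) + P(X \mid 12) + P(X \mid 20)} \approx \frac{P(X \mid \alpha)}{1.7485 \times 10^{-8}}$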
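
The posterior under the uniform prior can be checked numerically. The following is a minimal base-R sketch assuming fair dice and the prior of 1/5 per die described on the slide; the variable names are illustrative.

```r
x <- c(5, 4, 3, 4, 2, 1, 2, 3, 1, 4)   # observed rolls
alpha <- c(4, 6, 8, 12, 20)            # candidate numbers of faces
prior <- rep(1 / 5, length(alpha))     # uniform prior over the five dice

# Likelihood of the data for each die (zero when a roll exceeds alpha)
lik <- ifelse(alpha >= max(x), (1 / alpha)^length(x), 0)

# Bayes' theorem: posterior = likelihood * prior / evidence
evidence <- sum(lik * prior)
posterior <- lik * prior / evidence
names(posterior) <- alpha

sum(lik)                      # the denominator once the uniform prior cancels
round(100 * posterior, 4)     # posterior probabilities in percent
alpha[which.max(posterior)]   # MAP estimate
```

Because the prior is uniform, the factor 1/5 cancels and the posterior is simply the normalized likelihood.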
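  49. Dice with α faces, $X = \{5, 4, 3, 4, 2, 1, 2, 3, 1, 4\}$. friend: Hmmmm… Well.., how many $P(\alpha = 6)$?

    you: $P(\alpha = 6 \mid X) \cong \frac{P(X \mid \alpha = 6)}{1.7485 \times 10^{-8}} \approx 94.58\%$, with $P(X \mid \alpha = 6) = 1/6^{10}$ (maximum likelihood)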
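  50. Prior probability: $P(\alpha = 4 \mid X') = P(\alpha = 6 \mid X') = P(\alpha = 8 \mid X') = P(\alpha = 12 \mid X') = P(\alpha = 20 \mid X') = 1/5$

    Posterior probability: $P(\alpha = 4 \mid X) = 0\%$; $P(\alpha = 6 \mid X) \approx 94.58\%$; $P(\alpha = 8 \mid X) \approx 5.32\%$; $P(\alpha = 12 \mid X) \approx 0.09\%$; $P(\alpha = 20 \mid X) \approx 0.0005\%$. MAP (maximum a posteriori) estimation: $\arg\max_{\alpha} \{P(\alpha \mid X)\} = 6$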
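  51. Dice with α faces, $X = \{5, 4, 3, 4, 2, 1, 2, 3, 1, 4\}$. Predictive probability of $x_{11} \le 6$:

    $P(x_{11} \le 6 \mid \alpha = 6) \cdot P(\alpha = 6 \mid X) \approx 94.58\%$; $P(x_{11} \le 6 \mid \alpha = 4) \cdot P(\alpha = 4 \mid X) = 0\%$; $P(x_{11} \le 6 \mid \alpha = 8) \cdot P(\alpha = 8 \mid X) \approx 3.99\%$; $P(x_{11} \le 6 \mid \alpha = 12) \cdot P(\alpha = 12 \mid X) \approx 0.046\%$; $P(x_{11} \le 6 \mid \alpha = 20) \cdot P(\alpha = 20 \mid X) \approx 0.0001\%$; in total $P(x_{11} \le 6 \mid X) \approx 98.62\%$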
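
The predictive probability is the posterior-weighted average of the per-die probabilities. A minimal base-R sketch, assuming fair dice and the uniform prior used earlier in the deck (variable names are illustrative):

```r
x <- c(5, 4, 3, 4, 2, 1, 2, 3, 1, 4)
alpha <- c(4, 6, 8, 12, 20)

# Posterior under a uniform prior (the prior cancels in the normalization)
lik <- ifelse(alpha >= max(x), (1 / alpha)^length(x), 0)
posterior <- lik / sum(lik)

# P(next roll <= 6 | alpha): all faces for alpha <= 6, otherwise 6 / alpha
p_le6_given_alpha <- pmin(6 / alpha, 1)

# Contribution of each die, and the predictive probability as their sum
contrib <- p_le6_given_alpha * posterior
names(contrib) <- alpha
round(100 * contrib, 4)
pred_le6 <- sum(contrib)
pred_le6
```

The predictive probability is larger than the posterior probability of α = 6 because dice with more faces can also produce a roll of six or less.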
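  52. Dice with α faces, $X = \{5, 4, 3, 4, 2, 1, 2, 3, 1, 4\}$. you: OK, let’s try $x_{11}$!!

    friend: $P(x_{11} \le 6 \mid X) \approx 98.62\%$. And, $P(\alpha = 6 \mid X) \approx 94.58\%$.
  53. Dice with α faces, $X = \{5, 4, 3, 4, 2, 1, 2, 3, 1, 4\}$. you: OK, let’s try $x_{11}$!!

    friend: $P(x_{11} \le 6 \mid X) \approx 98.62\%$. And, $P(\alpha = 6 \mid X) \approx 94.58\%$. Result: $x_{11} = 8$
  54. Dice with α faces, $X = \{5, 4, 3, 4, 2, 1, 2, 3, 1, 4\}$. you: OK, let’s try $x_{11}$!!

    friend: $P(x_{11} \le 6 \mid X) \approx 98.62\%$. And, $P(\alpha = 6 \mid X) \approx 94.58\%$. Result: $x_{11} = 8$, so $P(\alpha = 6 \mid \{X, x_{11}\}) = 0\%$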
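  55. Dice with α faces: $X = \{5, 4, 3, 4, 2, 1, 2, 3, 1,

    4\}$, $\acute{X} = \{X, 8\}$. posterior ≅ likelihood ∗ prior: $P(\alpha \mid X) \cong \frac{P(X \mid \alpha) \cdot P(\alpha \mid X')}{P(X)}$. For the new roll $x_{11} = 8$: $P(\alpha \mid \acute{X}) \cong \frac{P(x_{11} \mid \alpha) \cdot P(\alpha \mid X'')}{P(x_{11} \mid X'')}$, with a new prior $P(\alpha \mid X'')$.
  56. Dice with α faces: $X = \{5, 4, 3, 4, 2, 1, 2, 3, 1,

    4\}$, $\acute{X} = \{X, 8\}$. The previous posterior becomes the new prior: $P(\alpha \mid \acute{X}) \cong \frac{P(x_{11} \mid \alpha) \cdot P(\alpha \mid X)}{P(x_{11} \mid X)}$. Only the likelihood of the new roll enters, because the first ten rolls are already contained in $P(\alpha \mid X)$.
  57. $\alpha = \{4, 6, 8, 12, 20\}$, $X = \{5, 4, 3, 4, 2, 1, 2, 3, 1, 4\}$, $\acute{X} = \{X, 8\}$. Prior $P(\alpha \mid X')$ = 20%, 20%, 20%, 20%, 20%.

    Posterior $P(\alpha \mid X)$ ≈ 0%, 94.58%, 5.32%, 0.09%, 0.0005%, which serves as the prior for $\acute{X}$. Posterior $P(\alpha \mid \acute{X})$ ≈ 0%, 0%, 98.85%, 1.14%, 0.004%.
  58. Dice with α faces, $\acute{X} = \{5, 4, 3, 4, 2, 1, 2, 3, 1, 4, 8\}$. you: OK!!! Let’s $x_{12}$!! COME OOON

    friend: $P(x_{12} \le 8 \mid \acute{X}) \approx 99.62\%$. And, $P(\alpha = 8 \mid \acute{X}) \approx 98.85\%$.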
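
Sequential updating can be sketched as a loop in which each new roll multiplies the current distribution over α by the likelihood of that single roll. This is a minimal base-R sketch assuming fair dice; `update_posterior` is an illustrative helper name, not something from the slides. Reusing the likelihood of the full data set together with the previous posterior would count the earlier rolls twice, so only the new roll's likelihood appears in each step.

```r
alpha <- c(4, 6, 8, 12, 20)
prior <- rep(1 / 5, length(alpha))
x <- c(5, 4, 3, 4, 2, 1, 2, 3, 1, 4)

# One Bayesian update: multiply by the likelihood of a single new roll
update_posterior <- function(p, alpha, x_new) {
  lik_new <- ifelse(alpha >= x_new, 1 / alpha, 0)
  unnorm <- lik_new * p
  unnorm / sum(unnorm)
}

# Feed the ten original rolls and then the eleventh roll (8), one at a time
post_seq <- Reduce(function(p, xi) update_posterior(p, alpha, xi), c(x, 8), prior)

# Batch computation on all eleven rolls at once, for comparison
x_all <- c(x, 8)
lik_all <- ifelse(alpha >= max(x_all), (1 / alpha)^length(x_all), 0)
post_batch <- lik_all / sum(lik_all)

round(100 * post_seq, 4)
all.equal(post_seq, post_batch)

# Predictive probability that the twelfth roll is at most 8
pred_le8 <- sum(pmin(8 / alpha, 1) * post_seq)
pred_le8
```

The sequential and batch results agree, which is why the posterior after one step can be used as the prior of the next.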
  59. After that, nobody knew their whereabouts...

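  60. posterior ≅ likelihood ∗ prior: $P(\alpha \mid X) \cong \frac{P(X \mid \alpha) \cdot P(\alpha)}{P(X)}$

    data $X$; prior distribution $P(\alpha)$; likelihood $P(x \mid \alpha)$; posterior distribution $P(\alpha \mid X)$; predictive distribution $p(x \mid X) = \int P(x \mid \alpha) \cdot P(\alpha \mid X)\, d\alpha$
  61. Information Criterion in Bayesian modeling: Truth $q(x)$ vs. predictive distribution $p(x \mid X)$

    posterior ≅ likelihood ∗ prior: $P(\alpha \mid X) \cong \frac{P(X \mid \alpha) \cdot P(\alpha)}{P(X)}$; data $X$; prior distribution $P(\alpha)$; posterior distribution $P(\alpha \mid X)$; predictive distribution $p(x \mid X) = \int P(x \mid \alpha) \cdot P(\alpha \mid X)\, d\alpha$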
  62. Information Criterion in Bayesian modeling: Kullback–Leibler divergence between the Truth $q(x)$ and the predictive distribution $p(x \mid X)$

    Self-information: $I(x) = -\log p(x)$. $D_{KL}(q \| p) = E_q\left[(-\log p(x \mid X)) - (-\log q(x))\right] = E_q\left[\log \frac{q(x)}{p(x \mid X)}\right]$, where $E_q$ is the expectation under the Truth.
  63. Information Criterion in Bayesian modeling: $D_{KL}(q \| p) = \sum q(x) \cdot \log \frac{q(x)}{p(x \mid X)} = \sum q(x) \cdot \log q(x) - \sum q(x) \cdot \log p(x \mid X) = -H(x) - \sum q(x) \cdot \log p(x \mid X)$

    Entropy: $H(x) = -\sum q(x) \cdot \log q(x)$. Generalization error: $G := -\sum q(x) \cdot \log p(x \mid X) = -E_q[\log p(x \mid X)]$. $\min_p D_{KL}(q \| p) \Leftrightarrow \min_p G$ (WAIC).
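
The decomposition of the Kullback–Leibler divergence into entropy and generalization error can be verified numerically on the dice example. In practice the Truth is unknown; the sketch below assumes, purely for illustration, that the Truth is a fair six-sided die, and it uses the posterior from the uniform prior. All variable names are illustrative.

```r
alpha <- c(4, 6, 8, 12, 20)
x_obs <- c(5, 4, 3, 4, 2, 1, 2, 3, 1, 4)
lik <- ifelse(alpha >= max(x_obs), (1 / alpha)^length(x_obs), 0)
posterior <- lik / sum(lik)              # uniform prior cancels

faces <- 1:20
# Predictive distribution p(x | X): posterior-weighted mixture of fair dice
p <- sapply(faces, function(x) sum(ifelse(x <= alpha, 1 / alpha, 0) * posterior))
# ASSUMPTION for illustration only: the Truth q(x) is a fair six-sided die
q <- ifelse(faces <= 6, 1 / 6, 0)

s <- q > 0                               # terms with q(x) = 0 contribute nothing
entropy   <- -sum(q[s] * log(q[s]))      # H(x)
gen_error <- -sum(q[s] * log(p[s]))      # G
kl        <-  sum(q[s] * log(q[s] / p[s]))

c(kl = kl, G_minus_H = gen_error - entropy)
```

Since the entropy depends only on the Truth, minimizing the divergence over predictive distributions is the same as minimizing the generalization error.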
  64. Information Criterion in Bayesian modeling: Truth $q(x)$, predictive distribution $p(x \mid X)$; posterior ≅ likelihood ∗ prior

    $D_{KL}(q \| p) = -H(x) + G$, generalization error $G \approx$ WAIC
  65. Information Criterion in Bayesian modeling: Truth $q(x)$, predictive distribution $p(x \mid X)$, evidence $P(X)$; posterior ≅ likelihood ∗ prior

    $F := -\log P(X)$ (self-information of the evidence); $D_{KL}(q \| p) = -H(x) + G$, generalization error $G \approx$ WAIC
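  66. Information Criterion in Bayesian modeling: evidence. $F := I(X) = -\log P(X) = \log \frac{q(X)}{P(X)} - \log q(X)$, since $-\log P(X) = \log\left(\frac{q(X)}{P(X)} \cdot \frac{1}{q(X)}\right)$

    Taking the expectation under $q(X)$: $E[F] = E\left[\log \frac{q(X)}{P(X)}\right] - E[\log q(X)] = D_{KL}(q \| P) - \sum_n q(X_n) \cdot \log q(X_n) = D_{KL}(q \| P) + H(X)$, so $D_{KL}(q \| P) = E[F] - H(X)$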
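
For the dice example, the evidence and its negative logarithm can be computed directly. This is a minimal base-R sketch assuming fair dice and the uniform prior of 1/5 per die; the name `free_energy` is illustrative.

```r
x <- c(5, 4, 3, 4, 2, 1, 2, 3, 1, 4)
alpha <- c(4, 6, 8, 12, 20)
prior <- rep(1 / 5, length(alpha))

# Likelihood of the data under each die
lik <- ifelse(alpha >= max(x), (1 / alpha)^length(x), 0)

# Evidence: the likelihood marginalized over the prior
evidence <- sum(lik * prior)

# Free energy: self-information of the evidence, F = -log P(X)
free_energy <- -log(evidence)
c(evidence = evidence, free_energy = free_energy)
```

A model (prior plus likelihood) that assigns higher probability to the observed data has a larger evidence and therefore a smaller free energy.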
  67. Information Criterion in Bayesian modeling: posterior ≅ likelihood ∗ prior; data $X$, prior distribution, posterior distribution, predictive distribution $p(x \mid X)$, Truth $q(x)$, evidence $P(X)$

    Generalization error $G$: $D_{KL}(q(x) \| p(x \mid X)) = -H(x) + G$, $G \approx$ WAIC. Free energy $F := -\log P(X)$ (self-information): $D_{KL}(q(X) \| P(X)) = E[F] - H(X)$.
  68. Summary

  69. Dice with α faces (regular polyhedron). [diagram: Truth, Knowledge, Hypothesis?]

    Observation: $X = \{5, 4, 3, 4, 2, 1, 2, 3, 1, 4\}$
  70. Statistical modeling: $g : \alpha \to \Omega$, the outcome function of an α-faced dice. Sample space (can NEVER get): $\Omega = \{\omega_1, \ldots, \omega_\infty\}, \forall \omega \in \mathbb{N}$.

    Realization: $X = \{x_1, \ldots, x_n\}$; probability distribution $P(X)$. g <- function(α = 6) { map(1:∞, ~sample(1:α, size = 10, replace = TRUE)) }; Ω <- g(); X <- density(Ω); $x \sim P(x \mid \alpha)$
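  71. $x \sim P(x \mid \alpha)$ : $\alpha \to x$, with $\alpha \in \{4, 6, 8, 12, 20\}$

    and $\forall \omega \in \mathbb{N}, \forall \omega \le \alpha$. [plots: $P(x \mid \alpha)$ for α = 6, α = 12, α = 20]
  72. $x \sim P(x \mid \alpha)$ : $\alpha \to x$, with $\alpha \in \{4, 6, 8, 12, 20\}$ and $\forall \omega \in \mathbb{N}, \forall \omega \le \alpha$.

    [plot: likelihood on a log scale for α = 6, 8, 12, 20; maximum likelihood at α = 6]
  73. $x \sim P(x \mid \alpha)$ : $\alpha \to x$ and $\alpha \sim P(\alpha \mid X)$ : $X \to \alpha$.

    [plot: likelihood on a log scale for α = 6, 8, 12, 20]
  74. $x \sim P(x \mid \alpha)$ : $\alpha \to x$ and $\alpha \sim P(\alpha \mid X)$ : $X \to \alpha$. [plot: likelihood on a log scale for α = 6, 8, 12, 20]

    Bayes' theorem (posterior, likelihood, prior): $P(\alpha \mid X) = \frac{P(X \mid \alpha) \cdot P(\alpha \mid \Omega)}{P(X \mid \Omega)} \cong \frac{P(X \mid \alpha) \cdot P(\alpha \mid X')}{\sum_i \{P(X \mid \alpha_i) \cdot P(\alpha_i \mid X')\}} = \frac{P(X \mid \alpha) \cdot P(\alpha)}{P(X)}$
  75. $x \sim P(x \mid \alpha)$ : $\alpha \to x$ and $\alpha \sim P(\alpha \mid X)$ : $X \to \alpha$. [plot: likelihood on a log scale for α = 6, 8, 12, 20; maximum at α = 6]

    $P(\alpha \mid X) = \frac{P(X \mid \alpha) \cdot P(\alpha \mid \Omega)}{P(X \mid \Omega)} \cong \frac{P(X \mid \alpha) \cdot P(\alpha \mid X')}{\sum_i \{P(X \mid \alpha_i) \cdot P(\alpha_i \mid X')\}}$ (posterior, likelihood, prior); $P(\alpha = 6 \mid X) \cong 94.58\%$, where $\forall P(\alpha \mid X') = 1/5$
  76. $x \sim P(x \mid \alpha)$ : $\alpha \to x$ and $\alpha \sim P(\alpha \mid X)$ : $X \to \alpha$. [plot: likelihood on a log scale for α = 6, 8, 12, 20; maximum at α = 6]

    $P(\alpha \mid X) = \frac{P(X \mid \alpha) \cdot P(\alpha \mid \Omega)}{P(X \mid \Omega)} \cong \frac{P(X \mid \alpha) \cdot P(\alpha \mid X')}{\sum_i \{P(X \mid \alpha_i) \cdot P(\alpha_i \mid X')\}}$ (posterior, likelihood, prior); $P(\alpha = 6 \mid X) \cong 94.58\%$, where $\forall P(\alpha \mid X') = 1/5$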
  77. Information Criterion in Bayesian modeling: posterior ≅ likelihood ∗ prior; data $X$, prior distribution, posterior distribution, predictive distribution $p(x \mid X)$, Truth $q(x)$, evidence $P(X)$

    Generalization error $G$: $D_{KL}(q(x) \| p(x \mid X)) = -H(x) + G$, $G \approx$ WAIC. Free energy $F := -\log P(X)$ (self-information): $D_{KL}(q(X) \| P(X)) = E[F] - H(X)$.
  78. Oh, by the way…

  79. A or B? HA: A is better. HB: B is better.

    H0: we have to choose the better one of the two. H1: there is a difference θ between A and B. A > B → A is better.
  80. x or y? HA: A is better. HB: B is better.

    H0: we have to choose the better one of the two. H1: there is a difference between x and y, $\theta = x - y$, where $x \sim P(x \mid \alpha)$ and $y \sim P(y \mid \beta)$. A > B → A is better.
  81. x or y? HA: A is better. HB: B is better.

    H0: we have to choose the better one of the two. H1: there is a difference between x and y, $\theta = x - y$, where $x \leftarrow P(x \mid \alpha) \leftarrow \alpha \leftarrow P(\alpha \mid X)$ and $y \leftarrow P(y \mid \beta) \leftarrow \beta \leftarrow P(\beta \mid Y)$. A > B → A is better.
  82. x or y? HA: A is better. HB: B is better.

    H0: we have to choose the better one of the two. H1: there is a difference between x and y, $\theta \leftarrow [x, y]$, where $x \leftarrow P(x \mid \alpha) \leftarrow \alpha \leftarrow P(\alpha \mid X)$ and $y \leftarrow P(y \mid \beta) \leftarrow \beta \leftarrow P(\beta \mid Y)$. A > B → A is better.
  83. x or y? HA: A is better. HB: B is better.

    H0: we have to choose the better one of the two. H1: there is a difference between x and y, $\theta \leftarrow [x, y]$, where x and y are each generated through two probability distributions $P$. A > B → A is better.
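
One way to make the idea of a distribution over the difference θ concrete is a small simulation. The sketch below is not taken from the slides: it assumes a Beta-Binomial model with Beta(1, 1) priors, and the success counts are hypothetical numbers invented for illustration. Here θ is taken to be the difference between the two success rates.

```r
set.seed(70)  # arbitrary seed, for reproducibility only

# HYPOTHETICAL data (not from the slides): successes and trials for A and B
a_success <- 45; a_trials <- 100
b_success <- 30; b_trials <- 100

# ASSUMPTION: Beta(1, 1) priors, so each posterior is again a Beta distribution
n_draws <- 10000
rate_a <- rbeta(n_draws, 1 + a_success, 1 + a_trials - a_success)
rate_b <- rbeta(n_draws, 1 + b_success, 1 + b_trials - b_success)

# Posterior draws of the difference theta and the probability that A is better
theta <- rate_a - rate_b
prob_a_better <- mean(theta > 0)

quantile(theta, c(0.025, 0.5, 0.975))
prob_a_better
```

Instead of a single accept-or-reject decision, this yields a full distribution for θ, from which statements such as the probability that A is better can be read off.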
  84. ×

  85. Summary, again…

  86. What is modeling? [diagram: Truth, f(X), Knowledge;

    narrow sense vs. broad sense]
  87. What is modeling? [diagram: f(X) in the

    Hypothesis Driven and Data Driven settings]
  88. Statistical modeling: $g : \alpha \to \Omega$, the outcome function with parameter α. Sample space (can NEVER get): $\Omega = \{\omega_1, \ldots, \omega_\infty\}, \forall \omega \in \mathbb{N}$.

    Realization: $X = \{x_1, \ldots, x_n\}$; probability distribution $P(X)$. g <- function(α = 6) { map(1:∞, ~sample(1:α, size = 10, replace = TRUE)) }; Ω <- g(); X <- density(Ω); $x \sim P(x \mid \alpha)$
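  89. Bayesian Modeling: $x \sim P(x \mid \alpha)$ and $\alpha \sim P(\alpha \mid X)$, i.e. $P(x, \alpha)$.

    The full spaces $\{\ldots_1, \ldots, \ldots_\infty\}$ for the parameter and the outcome cannot be obtained.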
  90. Frequentist v.s. Bayesian (and me): “MUST be wholly REJECTED!!!” “p-value **cking!!!”

    Old Stereotype
  91. [diagram: f(X) in the

    Hypothesis Driven and Data Driven settings]
  92. “Life shrinks or expands to one’s courage.” -- Anaïs Nin,

    2000 http://theamericanreader.com
  93. Before After BeginneR Session BeginneR BeginneR ?

  94. Enjoy!! KMT©

  95. Bar DraDra KMT©