
BeginnerSession1_70th_TokyoR

kilometer
June 09, 2018


Tutorial material on Bayesian statistics.


Transcript

  1. BeginneR Advanced Hoxo_m
     "If I have seen further it is by standing on the shoulders of Giants." -- Sir Isaac Newton, 1676
  2. What is modeling? y = f(X): f embodies a hypothesis, or f is learned from the data.
     Hypothesis Driven vs. Data Driven
  3. What is modeling? A/B test: Hypothesis Driven. (Not that I've ever actually run one!)
     A or B? HA: A is better. HB: B is better. H0: We have to choose the better of the two.
     Strong hypothesis. Simple data.
  4. What is modeling? Meta Analysis. H0: There is a best/better way.
     Weak hypothesis. Complex data. (What does everyone call this?) Data Driven.
  5. What is modeling?
     Data Driven Analysis: How to do? / Weak Hypothesis / Complex Data
     Hypothesis Driven Analysis: What to do? (Decision Making) / Strong Hypothesis / Simple Data
  6. What is modeling?
     Data Driven: How to do? / Weak Hypothesis / Complex Data / Complex Model
     Hypothesis Driven: What to do? (Decision Making) / Strong Hypothesis / Simple Data / Simple Model
  7. What is modeling?
     Data Driven: How to do? / Weak Hypothesis / Complex Data / Complex Model — broad sense
     Hypothesis Driven: What to do? (Decision Making) / Strong Hypothesis / Simple Data / Simple Model — narrow sense
  8. A or B? HA: A is better. HB: B is better. H0: We have to choose the better of the two.
     → A is better.
  9. "There is only one difference between a madman and me. The madman thinks he is sane. I know I am mad."
     -- Salvador Dalí, "Dalí is a dilly", The American Magazine, 162(1), 1956, 28-9, 107-9.
  10. A or B? HA: A is better. HB: B is better. H0: We have to choose the better of the two.
      H1: There is a difference d between A and B; A > B, so A is better.
  11. Dice with α faces.
      P(x = 5 | α = 4) = 0
      P(x = 5 | α = 6) = 1/6
      P(x = 5 | α = 8) = 1/8
      P(x = 5 | α = 12) = 1/12
      P(x = 5 | α = 20) = 1/20
      likelihood → maximum likelihood
  12. likelihood → maximum likelihood. Dice with α faces. you: D = {5, 4, 3, 4, 2, 1, 2, 3, 1, 4}.
      P(X = D | α = 4) = 0
      P(X = D | α = 6) = 1/6^10
      P(X = D | α = 8) = 1/8^10
      P(X = D | α = 12) = 1/12^10
      P(X = D | α = 20) = 1/20^10
  13. you: Could you find α? friend: Yes. α is estimated at 6!! you: Why do you think so?
      friend: Hmmmm…, well.., how large is P(X = D | α = 6)? you: Oh, it is 1/6^10!! ….nnNNNO!!! WHAT!!????
      friend: Because arg max_α {P(X = D | α)} = 6!!
  14. Dice with α faces. P(X = D | α = 6) = 1/6^10: maximum likelihood.
      friend: Hmmmm… Well.., how large is P(X = D | α = 6)?
      you(before) → you(after): P(α = 6 | X)!!??
      D = {5, 4, 3, 4, 2, 1, 2, 3, 1, 4}
  15. Ω = {x_1, …, x_∞}, ∀i ∈ ℕ: sample space (we can NEVER get it).
      X = {x_1, …, x_n}: realization; n: number of trials. x: stochastic variable, x <- sample(X, 1).
      probability distribution P(x = k) := n_k / |X|.
      X <- c(1, 1, 1, 1, 1, 2, 2, 3, 4, 5, 5); hist(X, freq = FALSE, label = TRUE); P(x = 2) = 2/11.
      x ~ P(x) ⇔ x is drawn from P.
  16. g: α → X. Ω = {x_1, …, x_∞}, ∀i ∈ ℕ: sample space (we can NEVER get it).
      X = {x_1, …, x_n}: realization. P(x = k) := n_k / |X|: probability distribution.
      g <- function(α = 6) { map(1:∞, ~ sample(1:α, size = 10, replace = TRUE)) }
      X <- g(α); density(X). x ~ P(x | α) ⇔ statistical modeling: the outcome is a function of the α-faced die.
  17. statistical modeling. probability distribution over the sample space: X | θ ~ P(X | θ), g: Θ → X.
      parameter θ = {θ_1, …, θ_p} ∈ Θ. realization: X <- map(1:∞, ~ g(θ)); x <- sample(X, 1).
      Ω = {x_1, …, x_∞}, ∀i ∈ ℕ.
  18. statistical modeling, twice over:
      x ~ P(x | θ), g: Θ → X — and for the dice, x ~ P(x | α), α ∈ {4, 6, 8, 12, 20}.
      X | θ ← Ω = {x_1, …, x_∞}; x_t ← X, ∀t ∈ ℕ, ∀t ≤ n.
  19. ∀t ≤ n: X | α ← Ω = {x_1, …, x_∞}, α ∈ {4, 6, 8, 12, 20};
      x_t ← X, ∀t ∈ ℕ, ∀t ≤ n; x ~ P(x | θ), and for the dice x ~ P(x | α).
  20. Bayes' theorem: P(A | B) = P(B | A) · P(A) / P(B), where P(B) ≠ 0,
      since P(A ∩ B) = P(B ∩ A).
  21. Applying it to the model: P(α | X) = P(X | α) · P(α) / P(X), where P(X) ≠ 0.
      x ~ P(x | α): g: A → X; α ~ P(α).
  22. P(α | X) = P(X | α) · P(α) / P(X), where P(X) ≠ 0. P(X | α) is the likelihood.
      x ~ P(x | α); α ~ P(α).
  23. P(α | X) = P(X | α) · P(α) / P(X). likelihood: P(X | α). x ~ P(x | α); α ~ P(α).
      X | α ← Ω = {x_1, …, x_∞}; α ← A = {4, 6, 8, 12, 20}.
  24. marginalization: P(X) = Σ_i { P(X | α_i) · P(α_i) }, α ∈ {4, 6, 8, 12, 20}.
      So P(α | X) = P(X | α) · P(α) / P(X). likelihood: P(X | α). x ~ P(x | α); α ~ P(α).
  25. P(α | X) = P(X | α) · P(α) / Σ_i { P(X | α_i) · P(α_i) }.
      likelihood: P(X | α). x ~ P(x | α); α ~ P(α).
  26. likelihood → maximum likelihood. Dice with α faces. you: D = {5, 4, 3, 4, 2, 1, 2, 3, 1, 4}.
      P(X = D | α = 4) = 0
      P(X = D | α = 6) = 1/6^10
      P(X = D | α = 8) = 1/8^10
      P(X = D | α = 12) = 1/12^10
      P(X = D | α = 20) = 1/20^10
  27. P(α | X) = P(X | α) · P(α) / Σ_i { P(X | α_i) · P(α_i) }.
      But P(α) is defined over the sample space Ω = {x_1, …, x_∞}, ∀i ∈ ℕ, which we can NEVER get.
  28. P(α | X) = P(X | α) · P(α) / Σ_i { P(X | α_i) · P(α_i) }.
      P(α): the sample space CAN NEVER BE OBTAINED.
  29. So assume a prior: ∀α, P(α) ≅ P'(α) = 1/5, α ∈ {4, 6, 8, 12, 20}.
      Then P(α | X) ≅ P(X | α) · P'(α) / Σ_i { P(X | α_i) · P'(α_i) }.
  30. With the uniform prior, ∀α, P'(α) = 1/5, the 1/5 cancels:
      P(α | X) ≅ P(X | α) · P'(α) / Σ_i { P(X | α_i) · P'(α_i) } = P(X | α) / Σ_i { P(X | α_i) },
      where Σ_i P(X | α_i) = P(X | 4) + P(X | 6) + P(X | 8) + P(X | 12) + P(X | 20) ≈ 1.7485e-08.
  31. P(X = D | α = 6) = 1/6^10: maximum likelihood. Dice with α faces.
      friend: Hmmmm… Well.., how large is P(X = D | α = 6)?
      you: P(α = 6 | X) ≅ P(X | α = 6) / 1.7485e-08 ≈ 94.58%, so α = 6.
      D = {5, 4, 3, 4, 2, 1, 2, 3, 1, 4}
  32. prior probability: P'(4) = P'(6) = P'(8) = P'(12) = P'(20) = 1/5.
      posterior probability: P(4 | X) = 0%, P(6 | X) ≈ 94.58%, P(8 | X) ≈ 5.32%, P(12 | X) ≈ 0.09%, P(20 | X) ≈ 0.0005%.
      MAP (Maximum a posteriori) estimation: arg max_α { P(α | X) } = 6.
  33. D = {5, 4, 3, 4, 2, 1, 2, 3, 1, 4}. Dice with α faces. predictive probability:
      P(x_11 ≤ 6 | 6) · P(6 | X) ≈ 94.58%
      P(x_11 ≤ 6 | 4) · P(4 | X) = 0%
      P(x_11 ≤ 6 | 8) · P(8 | X) ≈ 3.99%
      P(x_11 ≤ 6 | 12) · P(12 | X) ≈ 0.046%
      P(x_11 ≤ 6 | 20) · P(20 | X) ≈ 0.0001%
      P(x_11 ≤ 6 | X) ≈ 98.62%
  34. you: OK, let's try the 11th!! friend: P(x_11 ≤ 6 | X) ≈ 98.62%. And P(α = 6 | X) ≈ 94.58%.
      D = {5, 4, 3, 4, 2, 1, 2, 3, 1, 4}. Dice with α faces.
  35. you: OK, let's try the 11th!! friend: P(x_11 ≤ 6 | X) ≈ 98.62%. And P(α = 6 | X) ≈ 94.58%.
      But x_11 = 8. D = {5, 4, 3, 4, 2, 1, 2, 3, 1, 4}. Dice with α faces.
  36. you: OK, let's try the 11th!! friend: P(x_11 ≤ 6 | X) ≈ 98.62%. And P(α = 6 | X) ≈ 94.58%.
      But x_11 = 8, so P(α = 6 | {X, x_11}) = 0%.
      D = {5, 4, 3, 4, 2, 1, 2, 3, 1, 4}. Dice with α faces.
  37. D = {5, 4, 3, 4, 2, 1, 2, 3, 1, 4}, D́ = {D, 8}. Dice with α faces.
      posterior: P(α | D) ≅ P(D | α) · P'(α) / P(D) — likelihood × prior.
      For the new data: P(α | D́) ≅ P(D́ | α) · P''(α) / P(D́), with a new prior P''.
  38. D = {5, 4, 3, 4, 2, 1, 2, 3, 1, 4}, D́ = {D, 8}. Dice with α faces.
      The posterior from D becomes the new prior:
      P(α | D́) ≅ P(D́ | α) · P(α | D) / P(D́) — likelihood × prior → posterior.
  39. For α = {4, 6, 8, 12, 20}:
      prior: P'(α) = {20%, 20%, 20%, 20%, 20%}
      posterior after D = {5, 4, 3, 4, 2, 1, 2, 3, 1, 4}: ≈ {0%, 94.58%, 5.32%, 0.09%, 0.0005%}
      posterior after D́ = {D, 8} (previous posterior as prior): ≈ {0%, 0%, 99.98%, 0.020%, 0.0000004%}
  40. you: OK!!! Let's try the 12th!! COME OOON.
      friend: P(x_12 ≤ 8 | D́) ≈ 99.98%. And P(α = 8 | D́) ≈ 99.98%.
      Dice with α faces. D́ = {5, 4, 3, 4, 2, 1, 2, 3, 1, 4, 8}
  41. posterior ≅ likelihood × prior: P(θ | X) ≅ P(X | θ) · P(θ) / P(X).
      prior distribution → (data) → posterior distribution → predictive distribution:
      P(x | X) = ∫ P(x | θ) · P(θ | X) dθ.
  42. posterior ≅ likelihood × prior; prior distribution → posterior distribution → predictive distribution.
      Truth q(x) vs. predictive distribution p(x | X): Information Criterion in Bayesian modeling.
  43. Information Criterion in Bayesian modeling. Truth q(x); predictive p(x | X).
      Kullback–Leibler divergence: D_KL(q ‖ p) = E_q[log (q(x) / p(x | X))] = E_q[log q(x) − log p(x | X)].
      self-information: −log p; expectation taken over q.
      posterior ≅ likelihood × prior; prior distribution → posterior distribution → predictive distribution.
  44. Kullback–Leibler divergence:
      D_KL(q ‖ p) = Σ_x q(x) · log (q(x) / p(x | X))
                  = Σ_x q(x) · log q(x) − Σ_x q(x) · log p(x | X)
                  = −H(q) − Σ_x q(x) · log p(x | X)
      Entropy: H(q). Generalization error: G ≔ −Σ_x q(x) · log p(x | X).
      min_p D_KL(q ‖ p) ⇔ min_p G. WAIC estimates G.
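The identity D_KL(q ‖ p) = −H(q) + G can be checked on the dice example itself; here I take the truth q to be a fair 6-sided die (my assumption for illustration) and p to be the Bayesian predictive from the deck's data:

```r
# Check D_KL(q || p) = -H(q) + G on the dice example
D     <- c(5, 4, 3, 4, 2, 1, 2, 3, 1, 4)
alpha <- c(4, 6, 8, 12, 20)
lik   <- sapply(alpha, function(a) ifelse(max(D) > a, 0, (1 / a)^length(D)))
post  <- lik / sum(lik)

q <- rep(1 / 6, 6)   # truth q(x): fair 6-sided die (illustrative assumption)
# predictive p(x | X) for x = 1..6: mixture of 1/alpha over the posterior
p <- sapply(1:6, function(x) sum(post * ifelse(x <= alpha, 1 / alpha, 0)))

H  <- -sum(q * log(q))       # entropy of the truth
G  <- -sum(q * log(p))       # generalization error
KL <-  sum(q * log(q / p))   # Kullback-Leibler divergence

all.equal(KL, -H + G)        # TRUE: minimizing G minimizes D_KL
```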
  45. Information Criterion in Bayesian modeling. Truth q(x); predictive p(x | X).
      D_KL(q ‖ p) = −H(q) + G, with G ≈ Generalization error.
      posterior ≅ likelihood × prior; prior distribution → posterior distribution → predictive distribution.
  46. Information Criterion in Bayesian modeling. Truth q(x); predictive p(x | X).
      evidence: P(X); its self-information: F ≔ −log P(X).
      D_KL(q ‖ p) = −H(q) + G, with G ≈ Generalization error.
      posterior ≅ likelihood × prior; prior distribution → posterior distribution → predictive distribution.
  47. F ≔ −log P(X) = log (q(X) / P(X)) − log q(X). Taking the expectation over q(X):
      F = Σ_X q(X) · log (q(X) / P(X)) − Σ_X q(X) · log q(X) = D_KL(q(X) ‖ P(X)) + H(q(X)),
      so D_KL(q(X) ‖ P(X)) = −H(X) + F. evidence: P(X). Information Criterion in Bayesian modeling.
  48. Information Criterion in Bayesian modeling. Truth q(x); predictive p(x | X);
      evidence P(X); self-information F ≔ −log P(X).
      D_KL(q(x) ‖ p(x | X)) = −H(x) + G, G ≈ Generalization error.
      D_KL(q(X) ‖ P(X)) = −H(X) + F, F ≈ Free energy.
      posterior ≅ likelihood × prior; prior distribution → posterior distribution → predictive distribution.
  49. Dice with α faces (regular polyhedron).
      Truth — Knowledge — Hypothesis — Observation: D = {5, 4, 3, 4, 2, 1, 2, 3, 1, 4}
  50. g: α → X. Ω = {x_1, …, x_∞}, ∀i ∈ ℕ: sample space (we can NEVER get it).
      X = {x_1, …, x_n}: realization. P(x = k) := n_k / |X|: probability distribution.
      g <- function(α = 6) { map(1:∞, ~ sample(1:α, size = 10, replace = TRUE)) }
      X <- g(α); density(X). x ~ P(x | α) ⇔ statistical modeling: the outcome is a function of the α-faced die.
  51. x ~ P(X | α): g: A → X, α ∈ {4, 6, 8, 12, 20}.
      [Plot: likelihood P(X | α) for α = 6, 12, 20.]
      x_t ← X = {x_1, …, x_∞}, ∀t ∈ ℕ, ∀t ≤ n.
  52. x ~ P(X | α): g: A → X, α ∈ {4, 6, 8, 12, 20}.
      [Plot: log P(X | α) for α = 6, 8, 12, 20; the likelihood peaks at α = 6.]
      x_t ← X = {x_1, …, x_∞}, ∀t ∈ ℕ, ∀t ≤ n.
  53. [Plot: log P(X | α) for α = 6, 8, 12, 20 — the likelihood.]
      x ~ P(x | α); α ~ P(α): g: A → X.
  54. [Plot: log P(X | α) for α = 6, 8, 12, 20 — the likelihood.]
      x ~ P(x | α); α ~ P(α). Bayes' theorem:
      P(α | X) = P(X | α) · P(α) / P(X) ≅ P(X | α) · P'(α) / Σ_i { P(X | α_i) · P'(α_i) }.
      likelihood, prior, posterior.
  55. [Plot: log P(X | α) for α = 6, 8, 12, 20 — the likelihood.]
      With ∀α, P'(α) = 1/5: P(α = 6 | X) ≅ 94.58%, so α = 6.
      P(α | X) ≅ P(X | α) · P'(α) / Σ_i { P(X | α_i) · P'(α_i) }. likelihood, prior, posterior.
  56. x ~ P(x | α); α ~ P(α).
      P(α | X) ≅ P(X | α) · P'(α) / Σ_i { P(X | α_i) · P'(α_i) }. likelihood, prior, posterior.
      [Plot: log P(X | α) for α = 6, 8, 12, 20.]
      With ∀α, P'(α) = 1/5: P(α = 6 | X) ≅ 94.58%, so α = 6.
  57. Information Criterion in Bayesian modeling. Truth q(x); predictive p(x | X);
      evidence P(X); self-information F ≔ −log P(X).
      D_KL(q(x) ‖ p(x | X)) = −H(x) + G, G ≈ Generalization error.
      D_KL(q(X) ‖ P(X)) = −H(X) + F, F ≈ Free energy.
      posterior ≅ likelihood × prior; prior distribution → posterior distribution → predictive distribution.
  58. A or B? HA: A is better. HB: B is better. H0: We have to choose the better of the two.
      H1: There is a difference θ between A and B; A > B, so A is better.
  59. x or y? HA: A is better. HB: B is better. H0: We have to choose the better of the two.
      H1: There is a difference θ between x and y; A > B, so A is better.
      δ = θ_x − θ_y, with x_t ~ P(x | θ_x) and y_t ~ P(y | θ_y).
  60. x or y? HA: A is better. HB: B is better. H0: We have to choose the better of the two.
      H1: There is a difference θ between x and y; A > B, so A is better.
      δ = θ_x − θ_y; θ̄_x ← θ_x | x ← P(θ_x | x), P(x | θ_x); θ̄_y ← θ_y | y ← P(θ_y | y), P(y | θ_y).
  61. x or y? HA: A is better. HB: B is better. H0: We have to choose the better of the two.
      H1: There is a difference θ between x and y; A > B, so A is better.
      δ ← [θ̄_x, θ̄_y]; θ̄_x ← θ_x | x ← P(θ_x | x), P(x | θ_x); θ̄_y ← θ_y | y ← P(θ_y | y), P(y | θ_y).
  62. x or y? HA: A is better. HB: B is better. H0: We have to choose the better of the two.
      H1: There is a difference θ between x and y; A > B, so A is better.
      δ ← [θ̄_x, θ̄_y]; θ̄_x ← θ_x | x; θ̄_y ← θ_y | y.
  63. ×

  64. What is modeling? y = f(X): f embodies a hypothesis, or f is learned from the data.
      Hypothesis Driven vs. Data Driven
  65. g: θ → X. Ω = {x_1, …, x_∞}, ∀i ∈ ℕ: sample space (we can NEVER get it).
      X = {x_1, …, x_n}: realization. P(x = k) := n_k / |X|: probability distribution.
      g <- function(θ = 6) { map(1:∞, ~ sample(1:θ, size = 10, replace = TRUE)) }
      X <- g(θ); density(X). x ~ P(x | θ) ⇔ statistical modeling: the outcome is a function with parameter θ.
  66. Bayesian Modeling: X | θ ← Ω = {x_1, …, x_∞}; x_t ← X = {x_1, …, x_∞};
      x ~ P(x | θ), θ ~ P(θ | X): the joint (x, θ).
  67. What is modeling? y = f(X): f embodies a hypothesis, or f is learned from the data.
      Hypothesis Driven vs. Data Driven. f: X → y; g: Θ → X.