Reading Gibbs Sampling by Gelfand & Smith

Xi'an
November 04, 2013

Slides by Guillaume Reveillon presenting the historical paper on Gibbs sampling.


Transcript

  1. Sampling-Based Approaches to Calculating Marginal Densities. Alan E. Gelfand and Adrian F. M. Smith, Journal of the American Statistical Association, Vol. 85, No. 410 (Jun. 1990), pp. 398-409. (Presented November 4, 2013.)
  2. Outline: I. Introduction, II. Sampling Approaches, III. Examples and Numerical Illustrations, IV. Conclusion
  4. I. Introduction 1. Purposes 2. Context

  6. I. 1. Purposes
     - Exploitation of structural information to obtain numerical estimates of marginal densities that are not available analytically
     - Use of sampling methods instead of sophisticated numerical analytic ones
     - More attractive because of their simplicity and ease of implementation
  9. I. Introduction 1. Purposes 2. Context

  10. I. 2. Context: two cases
     - A) For i = 1, ..., k the conditional distributions [U_i | U_j, j ≠ i] are available. We can also consider the reduced forms [U_i | U_j, j ∈ S_i ⊂ {1, ..., k}]
     - B) The functional form of the joint density of U_1, U_2, ..., U_k is known and at least one [U_i | U_j, j ≠ i] is available
  13. Outline: I. Introduction, II. Sampling Approaches, III. Examples and Numerical Illustrations, IV. Conclusion
  14. II. Sampling Approaches: Assumptions
     - We deal with real random variables having a joint distribution whose density function is strictly positive over the sample space
     - The full set of conditional specifications uniquely defines the full joint density
     - Densities exist with respect to either Lebesgue or counting measure for all marginal and conditional distributions
  17. II. Sampling Approaches: 1. Substitution Algorithm, 2. Substitution Sampling, 3. Gibbs Sampling, 4. Rubin Importance-Sampling Algorithm
  19. II. 1. Substitution Algorithm: Introduction
     - A standard mathematical tool used in finding fixed-point solutions to certain classes of integral equations
     - Its utility in statistical problems was developed by Tanner and Wong (1987), who called it the data-augmentation algorithm
  21. II. 1. Substitution Algorithm: two-variable case (Tanner and Wong's development)
     (1) [X] = ∫ [X | Y]·[Y] dY
     (2) [Y] = ∫ [Y | X]·[X] dX
     (3) [X] = ∫ [X | Y] (∫ [Y | X′]·[X′] dX′) dY = ∫ h(X, X′)·[X′] dX′
     where h(X, X′) = ∫ [X | Y]·[Y | X′] dY, with X′ a dummy argument and [X′] ≡ [X]
  23. II. 1. Substitution Algorithm: two-variable case (algorithm)
     - [X′] is replaced by [X]_i
     - [X]_{i+1} = ∫ h(X, X′)·[X]_i dX′ = I_h [X]_i
     - I_h is the integral operator associated with h
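The fixed-point iteration can be made concrete on a toy discrete example (my own illustration, not from the paper or the slides): for finite X and Y the integral operator I_h is just a matrix, and iterating [X]_{i+1} = I_h [X]_i converges to the true marginal. All numbers below are made up.

```python
# A made-up joint distribution over X in {0, 1} and Y in {0, 1}.
joint = {(0, 0): 0.30, (0, 1): 0.10,
         (1, 0): 0.20, (1, 1): 0.40}

px = {x: joint[(x, 0)] + joint[(x, 1)] for x in (0, 1)}          # true [X]
py = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}          # true [Y]
p_x_given_y = {(x, y): joint[(x, y)] / py[y] for (x, y) in joint}
p_y_given_x = {(x, y): joint[(x, y)] / px[x] for (x, y) in joint}

def h(x, xp):
    # h(x, x') = sum over y of [x | y] * [y | x']
    return sum(p_x_given_y[(x, y)] * p_y_given_x[(xp, y)] for y in (0, 1))

# Iterate [X]_{i+1} = I_h [X]_i from an arbitrary (here degenerate) start.
pi = {0: 1.0, 1: 0.0}
for _ in range(50):
    pi = {x: sum(h(x, xp) * pi[xp] for xp in (0, 1)) for x in (0, 1)}

print(pi[0], px[0])   # the iterates settle on the true marginal [X]
```

Because the convergence is geometric (TW3), 50 iterations are far more than enough here.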
  25. II. 1. Substitution Algorithm: main properties. Theorem TW1 (uniqueness): the true marginal density [X] is the unique solution to (3)
  26. II. 1. Substitution Algorithm: main properties. Theorem TW2 (convergence): for almost any [X]_0, the sequence [X]_1, [X]_2, ... defined by [X]_{i+1} = I_h [X]_i converges monotonically in L1 to [X]
  27. II. 1. Substitution Algorithm: main properties. Theorem TW3 (rate): ∫ |[X]_i - [X]| → 0 geometrically in i
  28. II. 1. Substitution Algorithm: extension. With X, Y and Z three random variables:
     (4) [X] = ∫ [X, Z | Y]·[Y]
     (5) [Y] = ∫ [Y, X | Z]·[Z]
     (6) [Z] = ∫ [Z, Y | X]·[X]
     By substitution:
     - [X] = ∫ h(X, X′)·[X′]
     - where h(X, X′) = ∫ [X, Z | Y]·[Y, X | Z]·[Z, Y | X′]
     - with X′ a dummy argument and [X′] ≡ [X]
     Extension to k variables is straightforward.
  31. II. Sampling Approaches: 1. Substitution Algorithm, 2. Substitution Sampling, 3. Gibbs Sampling, 4. Rubin Importance-Sampling Algorithm
  32. II. 2. Substitution Sampling: two-variable case (assumptions)
     - [X | Y] and [Y | X] are available
     - [X]_0 is an arbitrary (possibly degenerate) initial density
  34. II. 2. Substitution Sampling: two-variable case (algorithm). Cycle:
     (i) Draw a single X^(0) from [X]_0
     (ii) Draw Y^(1) ~ [Y | X^(0)]
     (iii) Complete the cycle by drawing X^(1) ~ [X | Y^(1)]
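As a hedged illustration of one such cycle (my own toy example, not the slides'): for a standard bivariate normal with correlation rho, both conditionals are available, [X | Y] = N(rho·Y, 1 - rho²) and [Y | X] = N(rho·X, 1 - rho²), so each cycle is just two Gaussian draws.

```python
import random

random.seed(1)
rho = 0.7
sd = (1 - rho ** 2) ** 0.5       # conditional standard deviation

x = 5.0                          # X^(0) from an arbitrary (degenerate) [X]_0
draws = []
for i in range(20000):
    y = random.gauss(rho * x, sd)   # Y^(i+1) ~ [Y | X^(i)]
    x = random.gauss(rho * y, sd)   # X^(i+1) ~ [X | Y^(i+1)]
    if i >= 1000:                   # discard early, non-stationary cycles
        draws.append(x)

mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
# mean and var approach 0 and 1, the moments of the true marginal N(0, 1)
```

This runs one long chain for simplicity; the paper's scheme instead restarts m independent replications of the cycle.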
  35. II. 2. Substitution Sampling: two-variable case (convergence). Repetition of this cycle produces, after i iterations, the pair (X^(i), Y^(i)) such that X^(i) →d X ~ [X] and Y^(i) →d Y ~ [Y]
  37. II. 2. Substitution Sampling: two-variable case (estimator). If at each iteration i we generate m iid pairs (X_j^(i), Y_j^(i)), j ∈ {1, ..., m}, we can estimate [X] by the Monte Carlo integration [X̂]_i = (1/m) Σ_{j=1}^m [X | Y_j^(i)]
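A sketch of this mixture estimator on a toy bivariate-normal target (my example; the target, m and the number of cycles are arbitrary choices): run m independent replications of the cycle, then average the conditional densities [X | Y_j^(i)] at a point. The true marginal is N(0, 1).

```python
import math
import random

random.seed(2)
rho, m, n_cycles = 0.7, 500, 20
sd = (1 - rho ** 2) ** 0.5

def normal_pdf(x, mu, s):
    return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))

ys = []
for _ in range(m):                   # m independent replications of the cycle
    x = random.gauss(0, 3)           # arbitrary spread-out [X]_0
    y = 0.0
    for _ in range(n_cycles):
        y = random.gauss(rho * x, sd)
        x = random.gauss(rho * y, sd)
    ys.append(y)                     # keep Y_j^(i)

# density estimate at x0: (1/m) * sum_j [X = x0 | Y_j^(i)]
x0 = 0.0
est = sum(normal_pdf(x0, rho * yj, sd) for yj in ys) / m
true_val = normal_pdf(x0, 0.0, 1.0)  # true marginal density at x0
```

Averaging conditional densities rather than binning the X draws is exactly what makes the estimator smooth.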
  39. II. 2. Substitution Sampling: two-variable case (L1 convergence). [X̂]_i converges in L1 to [X] since ∫ |[X̂]_i - [X]| ≤ ∫ |[X̂]_i - [X]_i| + ∫ |[X]_i - [X]|, and
     - [X̂]_i →P [X]_i as m → ∞ (Glick 1974)
     - [X]_i →L1 [X] as i → ∞ (TW2)
  40. II. 2. Substitution Sampling: extension
     - Extension to more than two variables is straightforward
     - In the three-variable case, with an arbitrary starting marginal density [X]_0 for X:
       X^(0) ~ [X]_0
       (Z^(0)′, Y^(0)′) ~ [Z, Y | X^(0)]
       (Y^(1), X^(0)′) ~ [Y, X | Z^(0)′]
       (X^(1), Z^(1)) ~ [X, Z | Y^(1)]
     - Six generated variables are required per cycle
  41. II. 2. Substitution Sampling: extension
     - Repeating this cycle i times produces (X^(i), Y^(i), Z^(i)) such that X^(i) →d X ~ [X], Y^(i) →d Y ~ [Y] and Z^(i) →d Z ~ [Z]
     - At the i-th iteration, with m generations of iid samples (X_j^(i), Y_j^(i), Z_j^(i)): [X̂]_i = (1/m) Σ_{j=1}^m [X | Y_j^(i), Z_j^(i)]
     - The L1 convergence still follows
  42. II. 2. Substitution Sampling: extension. For k variables U_1, ..., U_k:
     - k(k-1) random variate generations to complete one cycle
     - m·i·k(k-1) random generations for m sequences and i iterations
     - [Û_s]_i = (1/m) Σ_{j=1}^m [U_s | U_t = U_tj^(i), t ≠ s]
     - The L1 convergence still follows
  43. II. Sampling Approaches: 1. Substitution Algorithm, 2. Substitution Sampling, 3. Gibbs Sampling, 4. Rubin Importance-Sampling Algorithm
  44. II. 3. Gibbs Sampling: Introduction
     - The Gibbs sampler was introduced by Geman and Geman (1984) to simulate marginal densities without using all conditional distributions, just the full ones
     - i.e., [X | Y, Z], [Y | X, Z] and [Z | X, Y] in the three-variable case
  46. II. 3. Gibbs Sampling: the Gibbs scheme
     - A Markovian updating scheme
     - With an arbitrary starting set of values U_1^(0), ..., U_k^(0):
       U_1^(1) ~ [U_1 | U_2^(0), ..., U_k^(0)]
       U_2^(1) ~ [U_2 | U_1^(1), U_3^(0), ..., U_k^(0)]
       U_3^(1) ~ [U_3 | U_1^(1), U_2^(1), U_4^(0), ..., U_k^(0)]
       ...
       U_k^(1) ~ [U_k | U_1^(1), ..., U_{k-1}^(1)]
     - k random variate generations are required in a cycle
     - After i iterations ⇒ (U_1^(i), ..., U_k^(i))
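The scheme above can be sketched on a small example of my own (not from the paper): for a trivariate normal with unit variances and common correlation r, each full conditional is univariate normal, U_s | rest ~ N(r/(1+r) · (sum of the other two), (1-r)(1+2r)/(1+r)), so one Gibbs cycle is three draws.

```python
import random

random.seed(3)
r = 0.5
coef = r / (1 + r)                            # conditional-mean coefficient
csd = ((1 - r) * (1 + 2 * r) / (1 + r)) ** 0.5  # conditional std deviation

u = [10.0, -10.0, 10.0]                       # arbitrary starting values U^(0)
draws = []
for i in range(30000):
    for s in range(3):                        # update U_1, U_2, U_3 in turn,
        others = sum(u) - u[s]                # conditioning on current values
        u[s] = random.gauss(coef * others, csd)
    if i >= 500:                              # drop early cycles
        draws.append(u[0])

mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
# mean and var approach 0 and 1, the true marginal moments of U_1
```

The long-run averages illustrate both GG1 (marginal convergence) and GG3 (ergodic averages) on this toy target.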
  49. II. 3. Gibbs Sampling: main properties. Theorem GG1 (convergence): (U_1^(i), ..., U_k^(i)) →d [U_1, ..., U_k], and hence for each s, U_s^(i) →d U_s ~ [U_s] as i → ∞
  50. II. 3. Gibbs Sampling: main properties. Theorem GG2 (rate): using the sup norm, rather than the L1 norm, the joint density of (U_1^(i), ..., U_k^(i)) converges to the true joint density at a geometric rate in i
  51. II. 3. Gibbs Sampling: main properties. Theorem GG3 (ergodic theorem): for any measurable function T of U_1, ..., U_k whose expectation exists, (1/i) Σ_{l=1}^i T(U_1^(l), ..., U_k^(l)) → E[T(U_1, ..., U_k)] almost surely as i → ∞
  52. II. 3. Gibbs Sampling: estimator. The density estimate for [U_s] is [Û_s]_i = (1/m) Σ_{j=1}^m [U_s | U_t = U_tj^(i), t ≠ s]
  53. II. 3. Gibbs Sampling: substitution versus Gibbs. Differences between the two samplers:
                                      Substitution    Gibbs
     Conditional distributions        all             full ones
     Variables generated per cycle    k(k-1)          k
  54. II. 3. Gibbs Sampling: substitution versus Gibbs
     - The substitution and Gibbs samplers are equivalent when only the set of full conditionals is available
     - If reduced conditional distributions are available, substitution sampling offers the possibility of acceleration relative to Gibbs sampling
  55. II. 3. Gibbs Sampling: substitution versus Gibbs (example)
     a) Y^(0)′ ~ [Y | X^(0), Z^(0)]
     b) Z^(0)′ ~ [Z | Y^(0)′, X^(0)]
     c) X^(0)′ ~ [X | Z^(0)′, Y^(0)′]
     d) Y^(1) ~ [Y | X^(0)′, Z^(0)′]
     e) Z^(1) ~ [Z | Y^(1), X^(0)′]
     f) X^(1) ~ [X | Z^(1), Y^(1)]
     If [Z | Y] is available, e) becomes Z^(1) ~ [Z | Y^(1)]
  57. II. Sampling Approaches: 1. Substitution Algorithm, 2. Substitution Sampling, 3. Gibbs Sampling, 4. Rubin Importance-Sampling Algorithm
  58. II. 4. Rubin Importance-Sampling Algorithm: Introduction. A noniterative Monte Carlo method for generating marginal distributions using importance-sampling ideas
  59. II. 4. Rubin Importance-Sampling Algorithm: Monte Carlo integration
     Problem: J_h = E_f[h(X)] = ∫_H h(x) f(x) dx
     MC solution: h̄_m = (1/m) Σ_{i=1}^m h(x_i) with (x_1, ..., x_m) ~ f
  60. II. 4. Rubin Importance-Sampling Algorithm: the importance idea
     Problem: J_h = E_g[h(X) f(X)/g(X)] = ∫_H [h(x) f(x)/g(x)] g(x) dx
     MC solution: h̄_m = (1/m) Σ_{i=1}^m h(x_i) f(x_i)/g(x_i) with (x_1, ..., x_m) ~ g
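The importance idea can be sketched as follows (all choices below are mine, made up for illustration): estimate J_h = E_f[h(X)] for h(x) = x², f = N(0, 1), while sampling from the wider g = N(0, 2²). The true value is Var_f(X) = 1.

```python
import math
import random

random.seed(4)

def normal_pdf(x, s):
    # density of N(0, s^2)
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2 * math.pi))

m = 100000
total = 0.0
for _ in range(m):
    x = random.gauss(0, 2)                       # x_i ~ g
    total += (x ** 2) * normal_pdf(x, 1) / normal_pdf(x, 2)  # h(x_i) f(x_i)/g(x_i)
est = total / m
# est approaches 1, the value of E_f[X^2]
```

The reweighting by f/g is what corrects for sampling from the wrong distribution.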
  61. II. 4. Rubin Importance-Sampling Algorithm: two-variable case (assumptions)
     - The functional form (modulo the normalizing constant) of the joint density [X, Y] is known
     - The conditional distribution [X | Y] is available
  63. II. 4. Rubin Importance-Sampling Algorithm: two-variable case (Rubin's idea)
     - Choose an importance-sampling distribution [Y]_s for Y
     - Use [X | Y]·[Y]_s as an importance-sampling distribution for (X, Y)
     - (X_l, Y_l) is created by drawing Y_l ~ [Y]_s and X_l ~ [X | Y_l], l = 1, ..., N
     - Calculate r_l = [X_l, Y_l] / ([X_l | Y_l]·[Y_l]_s)
  64. II. 4. Rubin Importance-Sampling Algorithm: two-variable case. The estimator of the marginal density [X] is [X̂] = Σ_{l=1}^N [X | Y_l]·r_l / Σ_{l=1}^N r_l
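A hedged sketch of Rubin's algorithm on a made-up discrete toy problem: the joint q below is known only up to a normalizing constant, [X | Y] is computed exactly, and [Y]_s is a deliberately crude uniform choice.

```python
import random

random.seed(5)

# Unnormalized joint over X in {0, 1}, Y in {0, 1} (made-up numbers).
q = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 6.0}
qy = {y: q[(0, y)] + q[(1, y)] for y in (0, 1)}
p_x_given_y = {(x, y): q[(x, y)] / qy[y] for (x, y) in q}   # exact [X | Y]
ys_density = 0.5                                            # [Y]_s uniform on {0, 1}

N = 50000
num = {0: 0.0, 1: 0.0}
den = 0.0
for _ in range(N):
    yl = random.randrange(2)                                 # Y_l ~ [Y]_s
    xl = 0 if random.random() < p_x_given_y[(0, yl)] else 1  # X_l ~ [X | Y_l]
    rl = q[(xl, yl)] / (p_x_given_y[(xl, yl)] * ys_density)  # r_l
    den += rl
    for x in (0, 1):
        num[x] += p_x_given_y[(x, yl)] * rl                  # [x | Y_l] * r_l

est = {x: num[x] / den for x in (0, 1)}                      # Rubin's [X-hat]
total = sum(q.values())
true_marginal = {x: (q[(x, 0)] + q[(x, 1)]) / total for x in (0, 1)}
```

Note the unknown normalizing constant of q cancels in the ratio, which is the point of self-normalizing the weights.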
  65. II. 4. Rubin Importance-Sampling Algorithm: main property. Dividing the numerator and the denominator by N and using the law of large numbers, we obtain Theorem R1 (convergence): [X̂] → [X] with probability 1 as N → ∞, for almost every X
  66. II. 4. Rubin Importance-Sampling Algorithm: extension (assumptions). In the three-variable case, we need:
     - The functional form of [X, Y, Z]
     - The availability of [X | Y, Z]
     - An importance-sampling distribution [Y, Z]_s
  67. II. 4. Rubin Importance-Sampling Algorithm: extension (estimator). Then:
     - r_l = [X_l, Y_l, Z_l] / ([X_l | Y_l, Z_l]·[Y_l, Z_l]_s)
     - [X̂] = Σ_{l=1}^N [X | Y_l, Z_l]·r_l / Σ_{l=1}^N r_l
  68. II. 4. Rubin Importance-Sampling Algorithm: extension. In the k-variable case:
     - N·k variables are generated
     - The extension is again straightforward
  69. Outline: I. Introduction, II. Sampling Approaches, III. Examples and Numerical Illustrations, IV. Conclusion
  70. III. Examples and Numerical Illustrations: 1. A Multinomial Model, 2. A Conjugate Hierarchical Model
  72. III. 1. A Multinomial Model: motivations
     - A two-parameter version of a one-parameter genetic-linkage example
     - Some observations are not assigned to individual cells but to aggregates of cells ⇒ multinomial sampling
  74. III. 1. A Multinomial Model: data and prior information
     - Y = (Y_1, ..., Y_5) ~ mult(n; a_1·θ + b_1, a_2·θ + b_2, a_3·η + b_3, a_4·η + b_4, c(1 - θ - η))
     - where a_i, b_i ≥ 0 are known
     - 0 < c = 1 - Σ b_i = a_1 + a_2 = a_3 + a_4 < 1
     - θ, η ≥ 0, θ + η ≤ 1
     - (θ, η) ~ Dirichlet(α_1, α_2, α_3)
  76. III. 1. A Multinomial Model: unobservable data and posterior information
     - X = (X_1, ..., X_9) ~ mult(n; a_1·θ, b_1, a_2·θ, b_2, a_3·η, b_3, a_4·η, b_4, c(1 - θ - η))
     - (θ, η | X) ~ Dirichlet(X_1 + X_3 + α_1, X_5 + X_7 + α_2, X_9 + α_3)
     - [θ | X, η] and [η | X, θ] are available as scaled beta distributions on [0, 1 - η] and [0, 1 - θ]
  77. III. 1. A Multinomial Model: the authors' trick. If we let
     Y_1 = X_1 + X_2, Y_2 = X_3 + X_4, Y_3 = X_5 + X_6, Y_4 = X_7 + X_8, Y_5 = X_9 and Z = (X_1, X_3, X_5, X_7),
     then studying X is equivalent to studying (Y, Z)
  79. III. 1. A Multinomial Model: studied case
     - We thus have a three-variable case (θ, η, Z), with interest in the marginal distributions [θ | Y], [η | Y] and [Z | Y]
     - Note that [Z | Y, θ, η] is the product of four independent binomials in X_1, X_3, X_5 and X_7 ⇒ [X_i | Y, θ, η] = binomial(Y_i, a_i·θ/(a_i·θ + b_i))
  80. III. 1. A Multinomial Model: data used
     Y = (14, 1, 1, 1, 5) ~ mult(22; θ/4 + 1/8, θ/4, η/4, η/4 + 3/8, (1 - θ - η)/2)
     X = (X_1, ..., X_7) ~ mult(22; θ/4, 1/8, θ/4, η/4, η/4, 3/8, (1 - θ - η)/2)
     Z = (X_1, X_5)
  81. III. 1. A Multinomial Model: substitution and Gibbs sampling. To compare the two forms of iterative sampling, the authors:
     - obtained numerical estimates of [θ | Y] and [η | Y]
     - repeated the following scheme 5,000 times: initialize θ ~ U(0, 1), η ~ U(0, 1) with 0 ≤ θ + η ≤ 1, then run 4 cycles of the two samplers with m = 10
     - compared the average cumulative posterior probabilities
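A Gibbs-style sketch of this example, with two loud assumptions: the slides do not state the Dirichlet hyperparameters, so I take α_1 = α_2 = α_3 = 1, and I use the cell probabilities (θ/4 + 1/8, θ/4, η/4, η/4 + 3/8, (1 - θ - η)/2) with Y = (14, 1, 1, 1, 5).

```python
import random

random.seed(6)
Y = (14, 1, 1, 1, 5)
a1 = a2 = a3 = 1.0              # assumed Dirichlet hyperparameters (not in the slides)

def binomial(n, p):
    # binomial(n, p) draw from standard-library uniforms
    return sum(random.random() < p for _ in range(n))

theta, eta = 0.3, 0.3           # arbitrary start with theta + eta <= 1
th_draws, et_draws = [], []
for i in range(2000):
    # [Z | Y, theta, eta]: split the two aggregated cells
    x1 = binomial(Y[0], (theta / 4) / (theta / 4 + 1 / 8))   # latent part of Y_1
    x5 = binomial(Y[3], (eta / 4) / (eta / 4 + 3 / 8))       # latent part of Y_4
    # [theta | eta, Z, Y]: scaled beta on [0, 1 - eta]
    theta = (1 - eta) * random.betavariate(x1 + Y[1] + a1, Y[4] + a3)
    # [eta | theta, Z, Y]: scaled beta on [0, 1 - theta]
    eta = (1 - theta) * random.betavariate(x5 + Y[2] + a2, Y[4] + a3)
    if i >= 200:
        th_draws.append(theta)
        et_draws.append(eta)

mean_theta = sum(th_draws) / len(th_draws)
mean_eta = sum(et_draws) / len(et_draws)
```

This runs one long chain rather than the authors' 5,000 restarted replications, but it uses the same full conditionals.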
  82. III. 1. A Multinomial Model Substitution and Gibbs Sampling

  83. III. 1. A Multinomial Model: substitution and Gibbs sampling. We note that:
     - The substitution sampler adapts more quickly than the Gibbs sampler
     - In the long run the two samplers have the same performance
     - Few random variate generations are required to obtain convergence (m = 10)
  84. III. 1. A Multinomial Model: Rubin importance-sampling. The Rubin importance-sampling algorithm requires:
     - [Z | Y]_s to draw Z_l
     - [η | Y, Z] to draw η_l
     - [θ | η, Z, Y] to draw θ_l
     The ratio r_l is given by r_l = [Y, Z_l | θ_l, η_l]·[θ_l, η_l] / ([θ_l | η_l, Z_l, Y]·[η_l | Y, Z_l]·[Z_l | Y]_s)
  85. III. 1. A Multinomial Model: Rubin importance-sampling. The authors obtained the following average cumulative posterior probabilities with:
     - 2,500 simulations
     - [Z | Y]_s the product of X_1 ~ binomial(Y_1, 1/2) and X_5 ~ binomial(Y_4, 1/2)
  86. III. 1. A Multinomial Model: Rubin importance-sampling. We note that:
     - The estimation is rather poor
     - The result may be sensitive to the choice of the importance distribution
  87. III. Examples and Numerical Illustrations: 1. A Multinomial Model, 2. A Poisson Model
  88. III. 2. A Poisson Model: introduction. We consider an exchangeable Poisson model for pump failures, with:
     - s_i the number of failures
     - t_i the length of time (in thousands of hours)
     - ρ_i = s_i/t_i the failure rate
  89. III. 2. A Poisson Model: prior information. We assume that:
     - Y = (s_1, ..., s_p)
     - [s_i | λ_i] = Poisson(λ_i t_i)
     - the λ_i are iid from Gamma(α, β), with α known and β ~ IG(γ, δ)
  90. III. 2. A Poisson Model: full conditional distributions. We have:
     - [λ_j | Y, β, λ_i (i ≠ j)] = Gamma(α + s_j, (t_j + 1/β)^(-1)), j = 1, ..., p
     - [β | Y, λ_1, ..., λ_p] ~ IG(γ + pα, Σ λ_i + δ)
  91. III. 2. A Poisson Model: Gibbs cycle. Given (λ_1^(0), ..., λ_p^(0), β^(0)), we draw:
     - λ_j^(1) ~ Gamma(α + s_j, (t_j + 1/β^(0))^(-1))
     - β^(1) ~ IG(γ + pα, Σ λ_i^(1) + δ)
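This cycle can be sketched in code, with loud assumptions: the failure counts and times below are made up (the actual pump-failure data is not listed on the slides), and α is fixed arbitrarily rather than by a moment estimate, with γ = 1 and δ = 0.1.

```python
import random

random.seed(7)
s = [3, 1, 6, 2, 9]                     # hypothetical failure counts
t = [20.0, 10.0, 35.0, 15.0, 40.0]      # hypothetical times, thousands of hours
p = len(s)
alpha, gam, delta = 1.8, 1.0, 0.1       # alpha assumed; gamma, delta hyperparameters

def inverse_gamma(shape, scale):
    # beta ~ IG(shape, scale)  <=>  1/beta ~ Gamma(shape, 1/scale)
    return 1.0 / random.gammavariate(shape, 1.0 / scale)

lam = [1.0] * p                         # arbitrary lambda^(0)
beta = 1.0                              # arbitrary beta^(0)
lam0_draws = []
for i in range(5000):
    # [lambda_j | Y, beta] = Gamma(alpha + s_j, (t_j + 1/beta)^(-1))
    lam = [random.gammavariate(alpha + s[j], 1.0 / (t[j] + 1.0 / beta))
           for j in range(p)]
    # [beta | Y, lambda] = IG(gamma + p*alpha, sum(lambda) + delta)
    beta = inverse_gamma(gam + p * alpha, sum(lam) + delta)
    if i >= 500:
        lam0_draws.append(lam[0])

mean_lam0 = sum(lam0_draws) / len(lam0_draws)   # posterior mean of lambda_1
```

Here `random.gammavariate(shape, scale)` matches the slide's Gamma parameterization, whose second argument is a scale.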
  92. III. 2. A Poisson Model: marginal densities estimated by Gibbs sampling. After i iterations with m repetitions (λ_1l^(i), ..., λ_pl^(i), β_l^(i)), l = 1, ..., m:
     - [λ̂_j | Y] = (1/m) Σ_{l=1}^m Gamma(α + s_j, (t_j + 1/β_l^(i))^(-1))
     - [β̂ | Y] = (1/m) Σ_{l=1}^m IG(γ + pα, Σ_j λ_jl^(i) + δ)
  93. III. 2. A Poisson Model: data
     - Pump-failure data analyzed by Gaver and O'Muircheartaigh (1987)
     - p = 10, γ = 1, δ = 0.1
     - α = (ρ̄)² / (S² - p^(-1) ρ̄ Σ t_i^(-1))
  94. III. 2. A Poisson Model: results after 10 cycles of the algorithm:
  96. III. 2. A Poisson Model: conclusion
     - Great fit of the Gibbs estimator
     - Convergence from a small number of draws (m = 10, 100)
  97. Outline: I. Introduction, II. Sampling Approaches, III. Examples and Numerical Illustrations, IV. Conclusion
  98. IV. Conclusion
     - These sampling algorithms are straightforward to implement
     - Substitution and Gibbs sampling (iterative methods) provide better results in terms of convergence than Rubin importance-sampling (a noniterative method)
     - The performance of Rubin importance-sampling depends on the choice of the importance distribution
     - If some reduced conditional distributions are available, substitution sampling becomes more efficient than Gibbs sampling
  99. Thanks for your attention

  100. References
     - Geman, S., and Geman, D. (1984), "Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.
     - Glick, N. (1974), "Consistency Conditions for Probability Estimators and Integrals of Density Estimators," Utilitas Mathematica, 6, 61-74.
     - Rubin, D. B. (1987), Comment on "The Calculation of Posterior Distributions by Data Augmentation," by M. A. Tanner and W. H. Wong, Journal of the American Statistical Association, 82, 543-546.
     - Tanner, M., and Wong, W. (1987), "The Calculation of Posterior Distributions by Data Augmentation," Journal of the American Statistical Association, 82, 528-550.