
# Reading Gibbs Sampling by Gelfand & Smith

Slides by Guillaume Reveillon presenting the historical paper on Gibbs sampling.

#### Xi'an

November 04, 2013

## Transcript

1. ### Sampling-Based Approaches to Calculating Marginal Densities

Alan E. Gelfand and Adrian F. M. Smith, Journal of the American Statistical Association, Vol. 85, No. 410 (June 1990), pp. 398-409. November 4, 2013
2. ### Outline

I. Introduction. II. Sampling Approaches. III. Examples and Numerical Illustrations. IV. Conclusion

6. ### I. 1. Purposes

- Exploit structural information to obtain numerical estimates of marginal densities that are not available analytically
- Use sampling methods instead of implementing sophisticated numerical-analytic ones
- More attractive because of their simplicity and ease of implementation

10. ### I. 2. Context

Two cases:
- A) For $i = 1, \dots, k$ the conditional distributions $U_i \mid U_j$ ($j \ne i$) are available. We can also consider the reduced forms $U_i \mid U_j$ with $j \in S_i \subset \{1, \dots, k\}$
- B) The functional form of the joint density of $U_1, U_2, \dots, U_k$ is known and at least one $U_i \mid U_j$ ($j \ne i$) is available
13. ### Outline

I. Introduction. II. Sampling Approaches. III. Examples and Numerical Illustrations. IV. Conclusion
14. ### II. Sampling Approaches

Assumptions:
- We deal with real random variables having a joint distribution whose density function is strictly positive over the sample space
- The full set of conditional specifications uniquely defines the full joint density
- Densities exist, with respect to either Lebesgue or counting measure, for all marginal and conditional distributions
17. ### II. Sampling Approaches

1. Substitution Algorithm 2. Substitution Sampling 3. Gibbs Sampling 4. Rubin Importance-Sampling Algorithm
19. ### II. 1. Substitution Algorithm

Introduction:
- A standard mathematical tool for finding fixed-point solutions to certain classes of integral equations
- Its utility in statistical problems was developed by Tanner and Wong (1987), who called it a data-augmentation algorithm
21. ### II. 1. Substitution Algorithm

Two-variable case / Tanner and Wong's development:

$$(1)\quad [X] = \int [X \mid Y] \, [Y]$$
$$(2)\quad [Y] = \int [Y \mid X] \, [X]$$
$$(3)\quad [X] = \int [X \mid Y] \int [Y \mid X'] \, [X'] = \int h(X, X') \, [X']$$

where $h(X, X') = \int [X \mid Y] \, [Y \mid X']$, with $X'$ a dummy argument and $[X'] \equiv [X]$
23. ### II. 1. Substitution Algorithm

Two-variable case / Algorithm:
- $[X']$ is replaced by $[X]_i$
- $[X]_{i+1} = \int h(X, X') \, [X]_i = I_h [X]_i$
- $I_h$ is the integral operator associated with $h$
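To make the integral-operator iteration concrete, here is a minimal numerical sketch (mine, not from the slides) that discretizes $I_h$ on a grid for a toy standard bivariate normal with correlation $\rho$, where both conditionals are Gaussian:

```python
# Grid discretization of the substitution fixed-point iteration, assuming a
# toy standard bivariate normal with correlation rho, so that
# [X|Y] = N(rho*y, 1 - rho^2) and [Y|X'] = N(rho*x', 1 - rho^2).
import numpy as np
from scipy.stats import norm

rho = 0.8
grid = np.linspace(-4, 4, 201)
dx = grid[1] - grid[0]
sd = np.sqrt(1 - rho**2)

# Conditional density matrices on the grid: entry (i, j) is [grid_i | grid_j].
x_given_y = norm.pdf(grid[:, None], loc=rho * grid[None, :], scale=sd)
y_given_x = norm.pdf(grid[:, None], loc=rho * grid[None, :], scale=sd)

# h(x, x') = integral of [x|y][y|x'] dy, approximated by a Riemann sum.
h = x_given_y @ y_given_x * dx

# Iterate [X]_{i+1} = I_h [X]_i from a crude uniform starting density.
dens = np.full_like(grid, 1 / (grid[-1] - grid[0]))
for _ in range(50):
    dens = h @ dens * dx

print(np.max(np.abs(dens - norm.pdf(grid))))  # small: the fixed point is N(0, 1)
```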
25. ### II. 1. Substitution Algorithm

Main properties. Theorem TW1 (uniqueness): the true marginal density $[X]$ is the unique solution to (3)
26. ### II. 1. Substitution Algorithm

Main properties. Theorem TW2 (convergence): for almost any $[X]_0$, the sequence $[X]_1, [X]_2, \dots$ defined by $[X]_{i+1} = I_h [X]_i$ converges monotonically in $L_1$ to $[X]$
27. ### II. 1. Substitution Algorithm

Main properties. Theorem TW3 (rate): $\int |[X]_i - [X]| \to 0$ geometrically in $i$
28. ### II. 1. Substitution Algorithm

Extension case. For $X$, $Y$ and $Z$ three random variables:

$$(4)\quad [X] = \int [X, Z \mid Y] \, [Y]$$
$$(5)\quad [Y] = \int [Y, X \mid Z] \, [Z]$$
$$(6)\quad [Z] = \int [Z, Y \mid X] \, [X]$$

By substitution:
- $[X] = \int h(X, X') \, [X']$
- where $h(X, X') = \int [X, Z \mid Y] \, [Y, X \mid Z] \, [Z, Y \mid X']$
- with $X'$ a dummy argument and $[X'] \equiv [X]$

Extension to $k$ variables is straightforward.
31. ### II. Sampling Approaches

1. Substitution Algorithm 2. Substitution Sampling 3. Gibbs Sampling 4. Rubin Importance-Sampling Algorithm
32. ### II. 2. Substitution Sampling

Two-variable case / Assumptions:
- $[X \mid Y]$ and $[Y \mid X]$ are available
- $[X]_0$ is an arbitrary (possibly degenerate) initial density
34. ### II. 2. Substitution Sampling

Two-variable case / Algorithm. One cycle:
(i) draw a single $X^{(0)}$ from $[X]_0$;
(ii) draw $Y^{(1)} \sim [Y \mid X^{(0)}]$;
(iii) complete the cycle by drawing $X^{(1)} \sim [X \mid Y^{(1)}]$
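A minimal simulation sketch of this cycle (again my toy bivariate normal, not from the paper), with a degenerate $[X]_0$:

```python
# One substitution sampling cycle, assuming the toy bivariate normal model:
# [Y|X] = N(rho*x, 1 - rho^2) and [X|Y] = N(rho*y, 1 - rho^2); the initial
# density [X]_0 is taken degenerate at x0.
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8
sd = np.sqrt(1 - rho**2)

def substitution(n_cycles, x0=0.0):
    x = x0                               # X^(0) from a degenerate [X]_0
    y = None
    for _ in range(n_cycles):
        y = rng.normal(rho * x, sd)      # Y^(i) ~ [Y | X^(i-1)]
        x = rng.normal(rho * y, sd)      # X^(i) ~ [X | Y^(i)]
    return x, y

# Independent replications of i cycles give approximate draws from [X].
draws = np.array([substitution(20)[0] for _ in range(5000)])
print(draws.mean(), draws.var())         # close to the N(0, 1) marginal moments
```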
35. ### II. 2. Substitution Sampling

Two-variable case / Convergence. Repetition of this cycle produces, after $i$ iterations, a pair $(X^{(i)}, Y^{(i)})$ such that $X^{(i)} \xrightarrow{d} X \sim [X]$ and $Y^{(i)} \xrightarrow{d} Y \sim [Y]$
37. ### II. 2. Substitution Sampling

Two-variable case / Estimator. If at each iteration $i$ we generate $m$ iid pairs $(X_j^{(i)}, Y_j^{(i)})$, $j \in \{1, \dots, m\}$, we can estimate $[X]$ by the Monte Carlo integration
$$[\hat X]_i = \frac{1}{m} \sum_{j=1}^m \left[ X \mid Y_j^{(i)} \right]$$
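In the toy normal model this mixture-of-conditionals estimate is just an average of Gaussian densities; a minimal sketch under the same assumptions as above:

```python
# Density estimate [X]^_i as the average of the m conditional densities
# [X | Y_j^(i)], assuming the toy bivariate normal model of the sketches above.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
rho, m, n_cycles = 0.8, 100, 20
sd = np.sqrt(1 - rho**2)

ys = np.empty(m)
for j in range(m):                       # m independent sequences of i cycles
    x = 0.0
    for _ in range(n_cycles):
        y = rng.normal(rho * x, sd)
        x = rng.normal(rho * y, sd)
    ys[j] = y

grid = np.linspace(-4, 4, 201)
dens_hat = norm.pdf(grid[:, None], loc=rho * ys[None, :], scale=sd).mean(axis=1)
print(np.max(np.abs(dens_hat - norm.pdf(grid))))  # small: estimate close to N(0,1)
```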
39. ### II. 2. Substitution Sampling

Two-variable case / $L_1$ convergence. $[\hat X]_i$ converges in $L_1$ to $[X]$ since
$$\int \left| [\hat X]_i - [X] \right| \le \int \left| [\hat X]_i - [X]_i \right| + \int \left| [X]_i - [X] \right|$$
and
- $[\hat X]_i \xrightarrow{P} [X]_i$ as $m \to \infty$ (Glick 1974)
- $[X]_i \xrightarrow{L_1} [X]$ as $i \to \infty$ (TW2)
40. ### II. 2. Substitution Sampling

Extension case:
- Extension to more than two variables is straightforward
- In the three-variable case, with an arbitrary starting marginal density $[X]_0$ for $X$:
  - $X^{(0)} \sim [X]_0$
  - $(Z^{(0)\prime}, Y^{(0)\prime}) \sim [Z, Y \mid X^{(0)}]$
  - $(Y^{(1)}, X^{(0)\prime}) \sim [Y, X \mid Z^{(0)\prime}]$
  - $(X^{(1)}, Z^{(1)}) \sim [X, Z \mid Y^{(1)}]$
- Six generated variables are required
41. ### II. 2. Substitution Sampling

Extension case:
- Repeating this cycle $i$ times produces $(X^{(i)}, Y^{(i)}, Z^{(i)})$ such that $X^{(i)} \xrightarrow{d} X \sim [X]$, $Y^{(i)} \xrightarrow{d} Y \sim [Y]$ and $Z^{(i)} \xrightarrow{d} Z \sim [Z]$
- At the $i$th iteration, for $m$ generations of iid samples $(X_j^{(i)}, Y_j^{(i)}, Z_j^{(i)})$:
$$[\hat X]_i = \frac{1}{m} \sum_{j=1}^m \left[ X \mid Y_j^{(i)}, Z_j^{(i)} \right]$$
- The $L_1$ convergence still follows
42. ### II. 2. Substitution Sampling

Extension case. For $k$ variables $U_1, \dots, U_k$:
- $k(k-1)$ random variate generations to complete one cycle
- $m \cdot i \cdot k(k-1)$ random generations for $m$ sequences and $i$ iterations
$$[\hat U_s]_i = \frac{1}{m} \sum_{j=1}^m \left[ U_s \mid U_t = U_{tj}^{(i)},\ t \ne s \right]$$
- The $L_1$ convergence still follows
43. ### II. Sampling Approaches

1. Substitution Algorithm 2. Substitution Sampling 3. Gibbs Sampling 4. Rubin Importance-Sampling Algorithm
45. ### II. 3. Gibbs Sampling

Introduction:
- The Gibbs sampler was introduced by Geman and Geman (1984) to simulate marginal densities without using all conditional distributions, only the full ones:
- $[X \mid Y, Z]$, $[Y \mid X, Z]$ and $[Z \mid X, Y]$ in the three-variable case
46. ### II. 3. Gibbs Sampling

Gibbs scheme:
- A Markovian updating scheme
- With an arbitrary starting set of values $U_1^{(0)}, \dots, U_k^{(0)}$, draw
  - $U_1^{(1)} \sim [U_1 \mid U_2^{(0)}, \dots, U_k^{(0)}]$
  - $U_2^{(1)} \sim [U_2 \mid U_1^{(1)}, U_3^{(0)}, \dots, U_k^{(0)}]$
  - $U_3^{(1)} \sim [U_3 \mid U_1^{(1)}, U_2^{(1)}, U_4^{(0)}, \dots, U_k^{(0)}]$
  - $\dots$
  - $U_k^{(1)} \sim [U_k \mid U_1^{(1)}, \dots, U_{k-1}^{(1)}]$
- $k$ random variate generations are required in a cycle
- After $i$ iterations: $(U_1^{(i)}, \dots, U_k^{(i)})$
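To fix ideas, a minimal sketch of the scheme for $k = 3$ (my toy example, not from the paper): an equicorrelated trivariate normal target, whose Gaussian full conditionals follow from standard conditional-normal algebra:

```python
# Gibbs updating scheme for k = 3, assuming an equicorrelated trivariate
# normal target (unit variances, pairwise correlation rho). Each full
# conditional is U_s | rest ~ N(b * sum(rest), tau2).
import numpy as np

rng = np.random.default_rng(2)
rho = 0.5
b = rho / (1 + rho)                    # conditional mean coefficient
tau2 = 1 - 2 * rho**2 / (1 + rho)      # conditional variance

def gibbs(n_iter, u_init):
    u = np.array(u_init, dtype=float)  # arbitrary starting set U^(0)
    for _ in range(n_iter):            # one pass = k random variate generations
        for s in range(3):
            u[s] = rng.normal(b * (u.sum() - u[s]), np.sqrt(tau2))
    return u

draws = np.array([gibbs(50, [5.0, -5.0, 5.0]) for _ in range(2000)])
print(draws.mean(axis=0), draws.var(axis=0))   # each marginal close to N(0, 1)
```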
49. ### II. 3. Gibbs Sampling

Main properties. Theorem GG1 (convergence): $(U_1^{(i)}, \dots, U_k^{(i)}) \xrightarrow{d} [U_1, \dots, U_k]$ and hence, for each $s$, $U_s^{(i)} \xrightarrow{d} U_s \sim [U_s]$ as $i \to \infty$
50. ### II. 3. Gibbs Sampling

Main properties. Theorem GG2 (rate): using the sup norm, rather than the $L_1$ norm, the joint density of $(U_1^{(i)}, \dots, U_k^{(i)})$ converges to the true joint density at a geometric rate in $i$
51. ### II. 3. Gibbs Sampling

Main properties. Theorem GG3 (ergodic theorem): for any measurable function $T$ of $U_1, \dots, U_k$ whose expectation exists,
$$\frac{1}{i} \sum_{l=1}^{i} T\left(U_1^{(l)}, \dots, U_k^{(l)}\right) \xrightarrow{a.s.} E\left[T(U_1, \dots, U_k)\right] \quad \text{as } i \to \infty$$
52. ### II. 3. Gibbs Sampling

Estimator. The density estimate for $[U_s]$ is given by
$$[\hat U_s]_i = \frac{1}{m} \sum_{j=1}^m \left[ U_s \mid U_t = U_{tj}^{(i)},\ t \ne s \right]$$
53. ### II. 3. Gibbs Sampling

Substitution versus Gibbs. Differences between the two samplers:

|                               | Substitution | Gibbs          |
|-------------------------------|--------------|----------------|
| Conditional distributions     | all          | full ones only |
| Variables generated per cycle | $k(k-1)$     | $k$            |
54. ### II. 3. Gibbs Sampling

Substitution versus Gibbs:
- The substitution and Gibbs samplers are equivalent when only the set of full conditionals is available
- If reduced conditional distributions are available, substitution sampling offers the possibility of acceleration relative to Gibbs sampling
55. ### II. 3. Gibbs Sampling

Substitution versus Gibbs / Example:
- a) $Y^{(0)\prime} \sim [Y \mid X^{(0)}, Z^{(0)}]$
- b) $Z^{(0)\prime} \sim [Z \mid Y^{(0)\prime}, X^{(0)}]$
- c) $X^{(0)\prime} \sim [X \mid Z^{(0)\prime}, Y^{(0)\prime}]$
- d) $Y^{(1)} \sim [Y \mid X^{(0)\prime}, Z^{(0)\prime}]$
- e) $Z^{(1)} \sim [Z \mid Y^{(1)}, X^{(0)\prime}]$
- f) $X^{(1)} \sim [X \mid Z^{(1)}, Y^{(1)}]$

If $[Z \mid Y]$ is available, e) becomes $Z^{(1)} \sim [Z \mid Y^{(1)}]$
57. ### II. Sampling Approaches

1. Substitution Algorithm 2. Substitution Sampling 3. Gibbs Sampling 4. Rubin Importance-Sampling Algorithm
58. ### II. 4. Rubin Importance-Sampling Algorithm

Introduction: a noniterative Monte Carlo method for generating marginal distributions using importance-sampling ideas
59. ### II. 4. Rubin Importance-Sampling Algorithm

Introduction / Monte Carlo integration.
Problem: $J_h = E_f[h(X)] = \int_{\mathcal H} h(x) f(x) \, dx$
MC solution: $\bar h_m = \frac{1}{m} \sum_{i=1}^m h(x_i)$ with $(x_1, \dots, x_m) \sim f$
60. ### II. 4. Rubin Importance-Sampling Algorithm

Introduction / Importance idea.
Problem: $J_h = E_g\!\left[\dfrac{h(X) f(X)}{g(X)}\right] = \int_{\mathcal H} \dfrac{h(x) f(x)}{g(x)} \, g(x) \, dx$
MC solution: $\bar h_m = \frac{1}{m} \sum_{i=1}^m \dfrac{h(x_i) f(x_i)}{g(x_i)}$ with $(x_1, \dots, x_m) \sim g$
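As a reminder of the mechanics, a minimal sketch on a toy problem of my choosing: both estimators target $J_h = E_f[h(X)]$ with $f = N(0,1)$ and $h(x) = x^2$, so $J_h = 1$:

```python
# Plain Monte Carlo versus importance sampling, assuming a toy problem:
# J_h = E_f[h(X)] with f = N(0,1), h(x) = x^2, and a heavier-tailed
# Student-t proposal g.
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(3)
m = 100_000
h = lambda x: x**2

# Plain Monte Carlo: average h over draws from f.
x_f = rng.standard_normal(m)
print("plain MC:", h(x_f).mean())

# Importance sampling: draws from g, reweighted by f/g.
x_g = t.rvs(df=5, size=m, random_state=rng)
w = norm.pdf(x_g) / t.pdf(x_g, df=5)
print("importance sampling:", (h(x_g) * w).mean())   # both estimate J_h = 1
```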
61. ### II. 4. Rubin Importance-Sampling Algorithm

Two-variable case / Assumptions:
- The functional form (modulo the normalizing constant) of the joint density $[X, Y]$ is known
- The conditional distribution $[X \mid Y]$ is available
63. ### II. 4. Rubin Importance-Sampling Algorithm

Two-variable case / Rubin's idea:
- Choose an importance-sampling distribution $[Y]_s$ for $Y$
- Use $[X \mid Y] \, [Y]_s$ as an importance-sampling distribution for $(X, Y)$
- $(X_l, Y_l)$ is created by drawing $Y_l \sim [Y]_s$ and $X_l \sim [X \mid Y_l]$, $l = 1, \dots, N$
- Calculate $r_l = \dfrac{[X_l, Y_l]}{[X_l \mid Y_l] \, [Y_l]_s}$
64. ### II. 4. Rubin Importance-Sampling Algorithm

Two-variable case. The estimator of the marginal density $[X]$ is
$$[\hat X] = \frac{\sum_{l=1}^N [X \mid Y_l] \, r_l}{\sum_{l=1}^N r_l}$$
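A minimal sketch of the whole algorithm on the toy bivariate normal (my example, not the paper's): the joint is known up to a constant, $[X \mid Y]$ is Gaussian, and a deliberately crude $N(0, 2^2)$ stands in for $[Y]_s$:

```python
# Rubin's importance-sampling algorithm, assuming the toy standard bivariate
# normal with correlation rho: [X,Y] known modulo its constant, [X|Y] Gaussian,
# and [Y]_s = N(0, 2^2) as a crude importance density.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
rho, N = 0.8, 20_000
sd = np.sqrt(1 - rho**2)

def joint_unnorm(x, y):
    """[X, Y] modulo its normalizing constant."""
    return np.exp(-(x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2)))

y_l = rng.normal(0.0, 2.0, size=N)            # Y_l ~ [Y]_s
x_l = rng.normal(rho * y_l, sd)               # X_l ~ [X | Y_l]
r_l = joint_unnorm(x_l, y_l) / (norm.pdf(x_l, rho * y_l, sd) * norm.pdf(y_l, 0, 2))

grid = np.linspace(-4, 4, 201)
dens_hat = (norm.pdf(grid[:, None], rho * y_l[None, :], sd) * r_l).sum(axis=1) / r_l.sum()
print(np.max(np.abs(dens_hat - norm.pdf(grid))))   # close to the N(0,1) marginal
```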
65. ### II. 4. Rubin Importance-Sampling Algorithm

Main property. Dividing the numerator and the denominator by $N$ and using the law of large numbers, we obtain Theorem R1 (convergence): $[\hat X] \to [X]$ with probability 1 as $N \to \infty$, for almost every $X$
66. ### II. 4. Rubin Importance-Sampling Algorithm

Extension case / Assumptions. In the three-variable case, we need:
- The functional form of $[X, Y, Z]$
- The availability of $[X \mid Y, Z]$
- An importance-sampling distribution $[Y, Z]_s$
67. ### II. 4. Rubin Importance-Sampling Algorithm

Extension case / Estimator. Then we have
$$r_l = \frac{[X_l, Y_l, Z_l]}{[X_l \mid Y_l, Z_l] \, [Y_l, Z_l]_s}, \qquad [\hat X] = \frac{\sum_{l=1}^N [X \mid Y_l, Z_l] \, r_l}{\sum_{l=1}^N r_l}$$
68. ### II. 4. Rubin Importance-Sampling Algorithm

Extension case. In the $k$-variable case:
- $Nk$ variables are generated
- The extension is also straightforward
69. ### Outline

I. Introduction. II. Sampling Approaches. III. Examples and Numerical Illustrations. IV. Conclusion
70. ### III. Examples and Numerical Illustrations

1. A Multinomial Model 2. A Conjugate Hierarchical Model
72. ### III. 1. A Multinomial Model

Motivations:
- A two-parameter version of a one-parameter genetic-linkage example
- Some observations are not assigned to individual cells but to aggregates of cells $\Rightarrow$ we use multinomial sampling
74. ### III. 1. A Multinomial Model

Data and prior information:
- $Y = (Y_1, \dots, Y_5) \sim \text{Mult}\left(n;\ a_1\theta + b_1,\ a_2\theta + b_2,\ a_3\eta + b_3,\ a_4\eta + b_4,\ c(1 - \theta - \eta)\right)$
- where $a_i, b_i \ge 0$ are known
- $0 < c = 1 - \sum b_i = a_1 + a_2 = a_3 + a_4 < 1$
- $\theta, \eta \ge 0$, $\theta + \eta \le 1$
- $(\theta, \eta) \sim \text{Dirichlet}(\alpha_1, \alpha_2, \alpha_3)$
76. ### III. 1. A Multinomial Model

Unobservable data and posterior information:
- $X = (X_1, \dots, X_9) \sim \text{Mult}\left(n;\ a_1\theta,\ b_1,\ a_2\theta,\ b_2,\ a_3\eta,\ b_3,\ a_4\eta,\ b_4,\ c(1 - \theta - \eta)\right)$
- $(\theta, \eta \mid X) \sim \text{Dirichlet}(X_1 + X_3 + \alpha_1,\ X_5 + X_7 + \alpha_2,\ X_9 + \alpha_3)$
- $[\theta \mid X, \eta]$ and $[\eta \mid X, \theta]$ are available as scaled beta distributions on $[0, 1 - \eta]$ and $[0, 1 - \theta]$
77. ### III. 1. A Multinomial Model

Authors' trick. If we let
$$Y_1 = X_1 + X_2,\quad Y_2 = X_3 + X_4,\quad Y_3 = X_5 + X_6,\quad Y_4 = X_7 + X_8,\quad Y_5 = X_9$$
and $Z = (X_1, X_3, X_5, X_7)$, then studying $X$ is equivalent to studying $(Y, Z)$
79. ### III. 1. A Multinomial Model

Studied case. We thus have a three-variable case $(\theta, \eta, Z)$, with interest in the marginal distributions $[\theta \mid Y]$, $[\eta \mid Y]$ and $[Z \mid Y]$. Note that $[Z \mid Y, \theta, \eta]$ is a product of four independent binomials in $X_1$, $X_3$, $X_5$ and $X_7$:
$$[X_i \mid Y, \theta, \eta] = \text{Binomial}\left(Y_i,\ \frac{a_i\theta}{a_i\theta + b_i}\right)$$
(with $\eta$ in place of $\theta$ for the $\eta$-cells)
80. ### III. 1. A Multinomial Model

Data used:
$$Y = (14, 1, 1, 1, 5) \sim \text{Mult}\left(22;\ \tfrac{\theta}{4} + \tfrac18,\ \tfrac{\theta}{4},\ \tfrac{\eta}{4},\ \tfrac{\eta}{4} + \tfrac38,\ \tfrac12(1 - \theta - \eta)\right)$$
$$X = (X_1, \dots, X_7) \sim \text{Mult}\left(22;\ \tfrac{\theta}{4},\ \tfrac18,\ \tfrac{\theta}{4},\ \tfrac{\eta}{4},\ \tfrac{\eta}{4},\ \tfrac38,\ \tfrac12(1 - \theta - \eta)\right)$$
$$Z = (X_1, X_5)$$
81. ### III. 1. A Multinomial Model

Substitution and Gibbs sampling. To compare the two forms of iterative sampling, the authors:
- obtained numerical estimates of $[\theta \mid Y]$ and $[\eta \mid Y]$
- repeated the following scheme 5,000 times:
  - initialize $\theta \sim U(0, 1)$, $\eta \sim U(0, 1)$ subject to $0 \le \theta + \eta \le 1$
  - run 4 cycles of the two samplers with $m = 10$
- compared the average cumulative posterior probabilities
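A minimal runnable sketch of the Gibbs sampler in this example, assuming a uniform Dirichlet(1, 1, 1) prior (the slides leave $(\alpha_1, \alpha_2, \alpha_3)$ unspecified) and drawing $\theta$ and $\eta$ from their scaled-beta full conditionals:

```python
# Gibbs sampler for the linkage example with Y = (14, 1, 1, 1, 5). One cycle
# updates the latent counts Z = (X1, X5), then theta and eta from scaled
# betas. The Dirichlet(1,1,1) prior is an assumption, not from the slides.
import numpy as np

rng = np.random.default_rng(5)
Y = np.array([14, 1, 1, 1, 5])                  # observed aggregated counts
alpha = np.array([1.0, 1.0, 1.0])               # assumed prior parameters

def gibbs_cycle(theta, eta, n_iter=4):
    for _ in range(n_iter):
        # [Z | Y, theta, eta]: independent binomials for the split cells.
        x1 = rng.binomial(Y[0], (theta / 4) / (theta / 4 + 1 / 8))
        x5 = rng.binomial(Y[3], (eta / 4) / (eta / 4 + 3 / 8))
        # Dirichlet posterior counts: theta-cells x1 + Y2, eta-cells x5 + Y3.
        a = alpha + np.array([x1 + Y[1], x5 + Y[2], Y[4]])
        # [theta | eta, Z, Y] is a beta scaled to [0, 1 - eta]; symmetrically
        # [eta | theta, Z, Y] is a beta scaled to [0, 1 - theta].
        theta = (1 - eta) * rng.beta(a[0], a[2])
        eta = (1 - theta) * rng.beta(a[1], a[2])
    return theta, eta

# Uniform initialization on the simplex, then 4-cycle runs, repeated 5,000
# times as in the slides' comparison.
draws = [gibbs_cycle(*rng.dirichlet(alpha)[:2]) for _ in range(5000)]
theta_draws, eta_draws = map(np.array, zip(*draws))
print(theta_draws.mean(), eta_draws.mean())     # posterior means of theta, eta
```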

83. ### III. 1. A Multinomial Model

Substitution and Gibbs sampling. We note that:
- The substitution sampler adapts more quickly than the Gibbs sampler
- Over time, the two samplers reach the same performance
- Few random variate generations are required to obtain convergence ($m = 10$)
84. ### III. 1. A Multinomial Model

Rubin importance-sampling requires:
- $[Z \mid Y]_s$ to draw $Z_l$
- $[\eta \mid Y, Z]$ to draw $\eta_l$
- $[\theta \mid \eta, Z, Y]$ to draw $\theta_l$

The ratio $r_l$ is given by
$$r_l = \frac{[Y, Z_l \mid \theta_l, \eta_l] \, [\theta_l, \eta_l]}{[\theta_l \mid \eta_l, Z_l, Y] \, [\eta_l \mid Y, Z_l] \, [Z_l \mid Y]_s}$$
85. ### III. 1. A Multinomial Model

Rubin importance-sampling. The authors obtained the following average cumulative posterior probabilities with:
- 2,500 simulations
- $[Z \mid Y]_s$ the product of $X_1 \sim \text{Binomial}(Y_1, \tfrac12)$ and $X_5 \sim \text{Binomial}(Y_4, \tfrac12)$
86. ### III. 1. A Multinomial Model

Rubin importance-sampling. We note that:
- The estimation is rather poor
- The result may be sensitive to the choice of the importance distribution
87. ### III. Examples and Numerical Illustrations

1. A Multinomial Model 2. A Poisson Model
88. ### III. 2. A Poisson Model

Introduction. We consider an exchangeable Poisson model for pump failures, with:
- $s_i$ the number of failures
- $t_i$ the length of time, in thousands of hours
- $\rho_i = s_i / t_i$ the rate
89. ### III. 2. A Poisson Model

Prior information. We assume that:
- $Y = (s_1, \dots, s_p)$
- $[s_i \mid \lambda_i] = \text{Poisson}(\lambda_i t_i)$
- the $\lambda_i$ are iid $\text{Gamma}(\alpha, \beta)$, with $\alpha$ known and $\beta \sim \text{IG}(\gamma, \delta)$
90. ### III. 2. A Poisson Model

Full conditional distributions. We have:
- $[\lambda_j \mid Y, \beta, \lambda_{i \ne j}] = \text{Gamma}\left(\alpha + s_j,\ (t_j + 1/\beta)^{-1}\right)$, $j = 1, \dots, p$
- $[\beta \mid Y, \lambda_1, \dots, \lambda_p] \sim \text{IG}\left(\gamma + p\alpha,\ \sum_i \lambda_i + \delta\right)$
91. ### III. 2. A Poisson Model

Gibbs cycle. Given $(\lambda_1^{(0)}, \dots, \lambda_p^{(0)}, \beta^{(0)})$, we draw:
- $\lambda_j^{(1)} \sim \text{Gamma}\left(\alpha + s_j,\ (t_j + 1/\beta^{(0)})^{-1}\right)$
- $\beta^{(1)} \sim \text{IG}\left(\gamma + p\alpha,\ \sum_i \lambda_i^{(1)} + \delta\right)$
92. ### III. 2. A Poisson Model

Marginal densities estimated by Gibbs sampling. After $i$ iterations with $m$ repetitions $(\lambda_{1l}^{(i)}, \dots, \lambda_{pl}^{(i)}, \beta_l^{(i)})$, $l = 1, \dots, m$:
$$[\hat\lambda_j \mid Y] = \frac{1}{m} \sum_{l=1}^m \text{Gamma}\left(\alpha + s_j,\ (t_j + 1/\beta_l^{(i)})^{-1}\right)$$
$$[\hat\beta \mid Y] = \frac{1}{m} \sum_{l=1}^m \text{IG}\left(\gamma + p\alpha,\ \sum_j \lambda_{jl}^{(i)} + \delta\right)$$
93. ### III. 2. A Poisson Model

Data. Pump-failure data analyzed by Gaver and O'Muircheartaigh (1987):
- $p = 10$, $\gamma = 1$, $\delta = 0.1$
- $\alpha = \bar\rho^2 \big/ \left(S^2 - p^{-1}\,\bar\rho \sum t_i^{-1}\right)$, with $\bar\rho$ and $S^2$ the sample mean and variance of the $\rho_i$
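A minimal runnable sketch of this Gibbs cycle; the pump data values below are the widely reproduced Gaver and O'Muircheartaigh (1987) figures (assumed, since the transcript does not list them), and a crude moment estimate stands in for the slides' formula for $\alpha$:

```python
# Gibbs cycle for the pump-failure model with p = 10, gamma = 1, delta = 0.1.
# Data values are the commonly reproduced pump dataset (assumed, not from the
# slides); alpha uses a crude moment estimate rather than the exact formula.
import numpy as np

rng = np.random.default_rng(6)
s = np.array([5, 1, 5, 14, 3, 19, 1, 1, 4, 22])        # failure counts
t = np.array([94.32, 15.72, 62.88, 125.76, 5.24,
              31.44, 1.048, 1.048, 2.096, 10.48])      # time, 1000s of hours
p, gamma_, delta = len(s), 1.0, 0.1
rho = s / t
alpha = rho.mean()**2 / rho.var()                      # crude stand-in for alpha

def gibbs(n_iter=10, m=100):
    lam_draws, beta_draws = [], []
    for _ in range(m):                                  # m independent sequences
        beta = 1.0                                      # arbitrary starting value
        for _ in range(n_iter):
            # lambda_j ~ Gamma(alpha + s_j, scale = 1 / (t_j + 1/beta))
            lam = rng.gamma(alpha + s, 1.0 / (t + 1.0 / beta))
            # beta ~ IG(gamma + p*alpha, sum(lam) + delta), via 1/Gamma
            beta = 1.0 / rng.gamma(gamma_ + p * alpha, 1.0 / (lam.sum() + delta))
        lam_draws.append(lam)
        beta_draws.append(beta)
    return np.array(lam_draws), np.array(beta_draws)

lam_draws, beta_draws = gibbs()
print(lam_draws.mean(axis=0))                           # posterior means of the rates
```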
94. ### III. 2. A Poisson Model

Results after 10 cycles of the algorithm (figure in the original slides)
96. ### III. 2. A Poisson Model

Conclusion:
- Excellent fit of the Gibbs estimator
- Convergence from a small number of drawings ($m = 10$ or $100$)
97. ### Outline

I. Introduction. II. Sampling Approaches. III. Examples and Numerical Illustrations. IV. Conclusion
98. ### IV. Conclusion

- These sampling algorithms are straightforward to implement
- Substitution and Gibbs sampling (iterative methods) provide better results in terms of convergence than Rubin importance-sampling (a noniterative method)
- The performance of Rubin importance-sampling depends on the choice of the importance distribution
- If some reduced conditional distributions are available, substitution sampling becomes more efficient than Gibbs sampling

100. ### References

Geman, S., and Geman, D. (1984), "Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.

Glick, N. (1974), "Consistency Conditions for Probability Estimators and Integrals of Density Estimators," Utilitas Mathematica, 6, 61-74.

Rubin, D. B. (1987), Comment on "The Calculation of Posterior Distributions by Data Augmentation," by M. A. Tanner and W. H. Wong, Journal of the American Statistical Association, 82, 543-546.

Tanner, M., and Wong, W. (1987), "The Calculation of Posterior Distributions by Data Augmentation," Journal of the American Statistical Association, 82, 528-550.