National Taiwan University Slides by Liang Bo Wang 42 Biometrics, March 2010 The score function derived from the logged pairwise pseudo- likelihood can be expressed as i< k I(t Yik , u Yik ) × − exp{ρik (t, u) β} 1 + exp{ρik (t, u) β} ρik (t, u) dNi (t) dNk (u), where Yik = Yi ∧ Yk . Recall that Di and Dk denote the ob- served data of the ith and the kth subject. Define the function h(Di , Dk ; β) = I(t Yik , u Yik ) × − exp{ρik (t, u) β} 1 + exp{ρik (t, u) β} ρik (t, u) dNi (t) dNk (u), and denote the score function by Sp (β) = 1 (n 2 ) i< k h(Di , Dk ; β). It is easy to see that h is permutation symmetric in its ar- guments and Sp is a U-statistic with the kernel h(·, ·). If β is the true parameter value, it can be shown that the score function Sp (β) = 0. Applying the projection method devel- oped by Hoeffding (1948) and under the assumption that X(t) is bounded by M, we can show that score function √ nSp (β) converges to a normal distribution with mean 0 and variance– covariance V 1 = 4E{h(D1, D2 ; β) h(D1, D3 ; β) }. We then study the asymptotic properties of ˆ β using delta method. The large sample properties of ˆ β are stated in Theorem 1, with proofs given in the Appendix. Theorem 1: Assume that X(t) is bounded by M and E[N(τ)] < ∞. Let ˆ β be the solution of Sp (β) = 0. Then ˆ β is a consistent estimator of β. Furthermore, √ n(ˆ β − β) → N(0, V ), where V = V −1 2 V 1 V −1 2 with V 1 = 4 E{h(D1 , D2 ; β) h(D1 , D3 ; β) } and V 2 = −E{∂h(D1 , D2 ; β)/∂β}. Note that the variance covariance matrix V can be esti- mated by ˆ V −1 2 ˆ V 1 ˆ V −1 2 , where ˆ V 1 = 4 n n i=1 1 n −1 2 i< j < k h(Di , Dj ; ˆ β)h(Di , Dk ; ˆ β) , and ˆ V 2 = − 1 (n 2 ) i< k ∂h(Di , Dk ; ˆ β) ∂β . Also note that based on the score function Sp , a test statistic for testing the hypothesis β = 0 can be formulated based on 1 (n 2 ) i< k Ni (Yik )Nk (Yik ) Yi k 0 {Xi (u) − Xk (u)} × dNi (u) Ni (u) − dNk (u) Nk (u) . Interestingly, this test statistic for the effects of time- dependent covariates does not require information about time-independent covariates, and hence is useful for check- ing proportionality assumption in the usual proportional rate model. 3.2 Estimation of Λ0 and γ By definition πi (t) can be considered as the distribution function of a biased sample from the distribution F (t) = Λ0 (t)/Λ0 (τ), where the observations are sampled with a prob- ability proportional to exp{Xi (t) β}. It is easy to see that, under the assumption (1), the conditional likelihood (2) is computationally equivalent to the likelihood of a set of inde- pendent random variables, where the data are a biased sample from distribution function F(t) with sampling weight propor- tional to exp{Xi (t) β} and are right truncated by Yi . Thus event times in the risk set are observed with different sam- pling probabilities, where the probabilities are proportional to exp{Xi (t) β}. If β is known, the probability structure of the risk set can be recovered by using the inverse probability weighting technique. Following the spirit of the Breslow esti- mator of the cumulative hazard in the Cox model, we modify the truncation product-limit estimator (Wang et al., 1986) as follows to estimate F by assigning each event in the risk set a weight that is proportional to the inverse of the sampling weight function: sl > t 1 − dl (β) Rl (β) , where dl (β) = 1 n n i=1 m i j =1 I(tij = sl ) exp{−Xi (tij ) β} and Rl (β) = 1 n n i=1 m i j =1 I(tij sl Yi ) exp{−Xi (tij ) β}. Note that β = 0 implies that the assigned weight is a unit weight and the proposed estimator reduces to the product-limit esti- mator that maximizes the nonparametric likelihood function for truncated data. By replacing β with ˆ β, we estimate F with ˆ F (t; ˆ β) = sl > t 1 − dl (ˆ β) Rl (ˆ β) . The large sample properties of ˆ F (t) are stated in Lemma 1, with proofs given in the Appendix. Lemma 1: Assume that (a) Λ0 (τ) > 0, (b) P r(Y > τ, Z > 0) > 0, and (c) G(u) = E{ZI(Y u) exp(W γ)} is a con- tinuous function for u ∈ [0, τ]. For inf{y : Λ0 (y) > 0} < t τ, √ n{ ˆ F (t) − F (t)} converges weakly to a normal distribution with mean 0 and variance 4F (t)2 E{κ(D1 , D2 ; t, β) κ(D1 , D3 ; t, β)}, where κ is defined in the Appendix. To estimate the regression parameters of time-independent covariates, we note that, conditioning on {Zi , Yi , W i , Xi (Yi )}, the expected value of mi is given by E{mi |Zi , Yi , W i , Xi (Yi )} = Zi Λ0 (τ) exp{W i γ} I(Yi u) exp{Xi (u) β} dF (u). Thus, following the assumption E[Zi | W i , Xi (τ)] = 1 and by double expectation, we have E mi Yi 0 exp{Xi (u) β} dF (u) | W i = Λ0 (τ) exp(W i γ). Recurrent Event Data Analysis 41 Assumption (1) implies that the occurrence of recurrent events follows a proportional intensity model, where the unobserved frailty Zi inflates/deflates the intensity. Because of the memoryless property of a Poisson process, conditional on Zi , the rate function equals the intensity function of the re- current event process. Under (1) and E[Zi | W i , Xi (τ)] = 1, the rate function of event occurrence at time t in a random population is given by λ0 (t) exp{Xi (t) β + W i γ}. Thus (1) implies the proportional rate model for recurrent event data studied by Lin et al. (2000) and many others. The proposed model also reduces to the semiparametric model studied in Wang et al. (2001) in the absence of time-dependent covari- ate Xi (t), and is in line with the model for case series data studied in Farrington and Whitaker (2006) in the absence of time-independent covariate Wi . 3. Estimation Procedure 3.1 Estimation of β We denote by mi the number of recurrent events that occurred before time Yi and ti1, . . . , tim i the observed event times for subject i. For ease of notation, we use mi and tij , i = 1, 2, . . . , n, j = 1, 2, . . . , mi , to denote either random variables or realized values. Let Di denote the observed data of the ith sub- ject, that is, Di = {(Yi , W i , Xi (Yi ), mi , (ti1, . . . , tim i )}. As- sume that {(Zi , Xi (·), W i , Yi , Ni (·)); i = 1, . . . , n} are independent and identically distributed (i.i.d.), so that the {D1 , . . . , Dn } are also i.i.d. For estimation of the regression parameter β, an initial at- tempt might apply the conditional technique used in Wang et al. (2001) to eliminate nuisance parameters from the likeli- hood. Under (1) the event times (ti1, . . . , tim i ) of the ith sub- ject conditional on (Zi , Yi , mi , W i , X(Yi )) are order statistics of a set of i.i.d. random variables with the density function Zi λ0 (t) exp{Xi (t) β + W i γ} Yi 0 Zi λ0 (u) exp{Xi (t) β + W i γ}du , 0 t Yi . Note that both the unobserved frailty Zi and the time- independent covariates Wi are eliminated from the condi- tional density function. The conditional likelihood based on all subjects is proportional to n i=1 m i j =1 λ0 (t) exp{Xi (tij ) β} Yi 0 λ0 (u) exp{Xi (u) β}du = n i=1 m i j =1 dπi (tij ) πi (Yi ) , (2) where dπi (t) is the shape function of the recurrent event pro- cess given by dπi (t) = λ0 (t) exp{Xi (t) β} τ 0 λ0 (u) exp{Xi (u) β}du I(0 t τ), and πi (t) = t 0 dπi (u). Note that πi defines a proper distribu- tion function with πi (τ) = 1. When the recurrent event model only includes time- independent covariates, the conditional density function fur- ther reduces to λ0 (t)/Λ0 (Yi ), which is in the form of a trun- cated density function. It is easy to see that for this special case the conditional likelihood depends only on λ0 and is com- putationally equivalent to the nonparametric likelihood of in- dependently right-truncated data. Hence the reduced condi- tional likelihood is maximized by the product-limit estimator for independently right-truncated data (Wang, Jewell, and Tsai, 1986). In the presence of time-varying covariates, how- ever, the conditional likelihood (2) involves both the paramet- ric component Xi (t) β and the nonparametric component λ0 . Maximizing the conditional likelihood function is a challenge because the integral in the denominator of the conditional likelihood does not have a closed form with λ0 unspecified. Motivated by Liang and Qin (2000) and Kalbfleisch (1978), we propose an alternative estimation procedure for β that does not involve the nonparametric component λ0 , and hence has the advantage of computational convenience. Because (2) is computationally equivalent to the semipara- metric likelihood of a set of independently right-truncated random variables, we can reformulate the problem as esti- mating the regression parameter β using the data {tij , i = 1, . . . , n, j = 1, . . . , mi }, where tij is an observed event time with the distribution function πi and is subject to inde- pendent right truncation Yi . The pairwise pseudolikelihood method considered by Liang and Qin (2000), however, can- not be applied directly to truncated data: event times are not necessarily comparable because they are subject to different truncation times. The observation of tij is subject to the con- straint tij Yi , hence any two event times tij and tkl , i = k, are comparable if tij belongs to the observation interval of tkl and tkl belongs to the observation interval of tij . These con- straints amount to tij Yi ∧ Yk and tkl Yi ∧ Yk , where ∧ denotes minimum. For any two event times tij and tkl , let δijkl = 1 if (tij , tkl ) is a comparable pair, and 0 otherwise. We condition on hav- ing observed the values {tij , tkl } for a given pair, but without knowing the order. We refer to this as conditioning on the order statistics of (tij , tkl ). The conditional distribution is de- generate at the observed values if (tij , tkl ) are not comparable. By conditioning on the order statistics of (tij , tkl ) and δijkl = 1, the pairwise pseudolikelihood of (tij , tkl ), i < k, is given by dπi (tij )dπk (tk l ) dπi (tij )dπk (tk l ) + dπi (tk l )dπk (tij ) = exp[{Xi (tij ) + Xk (tk l )} β] exp[{Xi (tij ) + Xk (tk l )} β + exp[{Xi (tk l ) + Xk (tij )} β] = 1 1 + exp{ρik (tij , tk l ) β} , (3) where ρik (t, u) = Xi (u) + Xk (t) − Xi (t) − Xk (u). Interest- ingly, the pairwise pseudolikelihood depends on the regression parameter β but not the nonparametric component λ0 . Hence β can be estimated by maximizing the pairwise pseudolikeli- hood i< k m i j =1 m k l=1 1 1 + exp{ρik (tij , tk l ) β} δi j k l . Lots of MATH and PROOF!! GOAL! R Package for Use!