for all $j$. We will ensure that these knockoff features satisfy
$$\tilde{X}^\top \tilde{X} = \Sigma, \qquad X^\top \tilde{X} = \Sigma - \operatorname{diag}\{s\},$$
where $s$ is a $p$-dimensional nonnegative vector. In words, $\tilde{X}$ exhibits the same covariance structure as the original design $X$, and, in addition, the correlations between distinct original and knockoff variables are the same as those between the originals (because $\Sigma$ and $\Sigma - \operatorname{diag}\{s\}$ are equal on off-diagonal entries):
$$X_j^\top \tilde{X}_k = X_j^\top X_k \quad \text{for all } j \neq k.$$
However, comparing a feature $X_j$ to its knockoff $\tilde{X}_j$, we see that
$$X_j^\top \tilde{X}_j = \Sigma_{jj} - s_j = 1 - s_j,$$
while $X_j^\top X_j = \Sigma_{jj} = 1$. To ensure that our method has good statistical power to detect signals, we should choose the entries of $s$ as large as possible so that a variable $X_j$ is not too similar to its knockoff $\tilde{X}_j$. A strategy for constructing $\tilde{X}$ is to choose $s \in \mathbb{R}^p_+$ satisfying $\operatorname{diag}\{s\} \preceq 2\Sigma$, and construct the $n \times p$ matrix
$$\tilde{X} = X\big(I - \Sigma^{-1}\operatorname{diag}\{s\}\big) + \tilde{U}C;$$
here, $\tilde{U}$ is an $n \times p$ orthonormal matrix that is orthogonal to the span of the features $X$, and $C^\top C = 2\operatorname{diag}\{s\} - \operatorname{diag}\{s\}\,\Sigma^{-1}\operatorname{diag}\{s\}$.
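As a concrete illustration, the construction above can be sketched in a few lines of NumPy. This is a minimal sketch, not a reference implementation: it assumes $n \ge 2p$ with full-rank, unit-norm columns, and it uses the equicorrelated choice $s_j = \min(1, 2\lambda_{\min}(\Sigma))$, which satisfies $\operatorname{diag}\{s\} \preceq 2\Sigma$; the function name `fixed_x_knockoffs` is ours.

```python
import numpy as np

def fixed_x_knockoffs(X, rng=None):
    """Sketch of the fixed-X knockoff construction described above.

    Assumes n >= 2p and that each column of X has unit norm, so that
    Sigma_jj = 1. Uses the equicorrelated choice
    s_j = min(1, 2 * lambda_min(Sigma)), which satisfies diag{s} <= 2*Sigma.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    Sigma = X.T @ X                              # Gram matrix, Sigma_jj = 1
    lam_min = np.linalg.eigvalsh(Sigma)[0]       # smallest eigenvalue
    S = np.diag(np.full(p, min(1.0, 2.0 * lam_min)))
    Sigma_inv = np.linalg.inv(Sigma)
    # Cholesky factor with C^T C = 2 diag{s} - diag{s} Sigma^{-1} diag{s}
    A = 2 * S - S @ Sigma_inv @ S
    C = np.linalg.cholesky(A + 1e-10 * np.eye(p)).T   # jitter: A may be singular
    # U_tilde: n x p orthonormal matrix orthogonal to the span of X
    Q, _ = np.linalg.qr(np.hstack([X, rng.standard_normal((n, p))]))
    U_tilde = Q[:, p:2 * p]
    return X @ (np.eye(p) - Sigma_inv @ S) + U_tilde @ C
```

One can check numerically that the output satisfies both Gram identities: $\tilde{X}^\top \tilde{X} = \Sigma$ and $X^\top \tilde{X} = \Sigma - \operatorname{diag}\{s\}$.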
Model-X knockoffs $\tilde{X} = (\tilde{X}_1, \ldots, \tilde{X}_p)$ for the family of random variables $X = (X_1, \ldots, X_p)$ are a new family of random variables constructed with the following two properties: (1) for any subset $S \subset \{1, \ldots, p\}$,
$$(X, \tilde{X})_{\operatorname{swap}(S)} \overset{d}{=} (X, \tilde{X}); \tag{3.1}$$
(2) $\tilde{X} \perp\!\!\!\perp Y \mid X$ if there is a response $Y$. Property (2) is guaranteed if $\tilde{X}$ is constructed without looking at $Y$.

Above, the vector $(X, \tilde{X})_{\operatorname{swap}(S)}$ is obtained from $(X, \tilde{X})$ by swapping the entries $X_j$ and $\tilde{X}_j$ for each $j \in S$; for example, with $p = 3$ and $S = \{2, 3\}$,
$$(X_1, X_2, X_3, \tilde{X}_1, \tilde{X}_2, \tilde{X}_3)_{\operatorname{swap}(\{2,3\})} \overset{d}{=} (X_1, \tilde{X}_2, \tilde{X}_3, \tilde{X}_1, X_2, X_3).$$
We see from (3.1) that original and knockoff variables are pairwise exchangeable: taking any subset of variables and swapping them with their knockoffs leaves the joint distribution invariant. Note that our exchangeability condition is on the covariates, and thus bears little resemblance to exchangeability conditions for closed permutation testing (see, e.g., Westfall and Troendle (2008)). To give an example of MX knockoffs, suppose that $X \sim \mathcal{N}(0, \Sigma)$. Then a joint distribution obeying (3.1) is this:
$$(X, \tilde{X}) \sim \mathcal{N}(0, G), \quad \text{where} \quad G = \begin{bmatrix} \Sigma & \Sigma - \operatorname{diag}\{s\} \\ \Sigma - \operatorname{diag}\{s\} & \Sigma \end{bmatrix}; \tag{3.2}$$
here, $\operatorname{diag}\{s\}$ is any diagonal matrix selected in such a way that the joint covariance matrix $G$ is positive semidefinite. Indeed, the distribution obtained by swapping variables with their knockoffs is Gaussian with mean zero and a covariance obtained from $G$ by swapping the corresponding rows and columns, which leaves $G$ unchanged.
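In the Gaussian case of (3.2), knockoffs can be sampled from the conditional law of $\tilde{X}$ given $X$, which standard multivariate-Gaussian algebra gives as $\tilde{X} \mid X = x \sim \mathcal{N}\big(x - \operatorname{diag}\{s\}\Sigma^{-1}x,\; 2\operatorname{diag}\{s\} - \operatorname{diag}\{s\}\Sigma^{-1}\operatorname{diag}\{s\}\big)$. Below is a minimal sketch (the function name and the NumPy implementation are ours; $s$ must make the conditional covariance positive definite):

```python
import numpy as np

def gaussian_knockoffs(X, Sigma, s, rng=None):
    """Sample model-X knockoffs row by row for X ~ N(0, Sigma).

    Conditionally on X = x (one row of X),
        X~ | X = x  ~  N(x - D Sigma^{-1} x,  2D - D Sigma^{-1} D),
    where D = diag{s}; the resulting joint law is exactly (3.2).
    """
    rng = np.random.default_rng(rng)
    D = np.diag(s)
    Sinv_D = np.linalg.solve(Sigma, D)       # Sigma^{-1} D
    mean = X - X @ Sinv_D                    # rows: x - D Sigma^{-1} x
    cond_cov = 2 * D - D @ Sinv_D            # 2D - D Sigma^{-1} D
    L = np.linalg.cholesky(cond_cov)         # requires cond_cov > 0
    return mean + rng.standard_normal(X.shape) @ L.T
```

With many samples, the empirical covariance of $(X, \tilde{X})$ matches the blocks of $G$ in (3.2).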
To find the relevant variables, we now compute statistics $W_j$ for each $j \in \{1, \ldots, p\}$, a large positive value of $W_j$ providing evidence against the hypothesis that $X_j$ is null. This statistic depends not only on the original variables but also on the knockoffs; that is,
$$W_j = w_j([X, \tilde{X}], y).$$
As in Barber and Candès (2015), we impose a flip-sign property, which says that swapping the $j$th variable with its knockoff has the effect of changing the sign of $W_j$. Formally, if $[X, \tilde{X}]_{\operatorname{swap}(S)}$ is obtained by swapping columns in $S$,
$$w_j\big([X, \tilde{X}]_{\operatorname{swap}(S)}, y\big) = \begin{cases} w_j([X, \tilde{X}], y), & j \notin S, \\ -w_j([X, \tilde{X}], y), & j \in S. \end{cases}$$
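The flip-sign property is easy to satisfy: take any score $Z_j$ per column of the augmented matrix $[X, \tilde{X}]$ and set $W_j = Z_j - \tilde{Z}_j$, or any other function antisymmetric in the pair. The sketch below uses absolute marginal correlations as the scores purely for illustration (in practice one might use, e.g., lasso coefficient magnitudes); the function names are ours.

```python
import numpy as np

def w_stats(XXt, y):
    """W_j = Z_j - Z~_j, where Z is an absolute marginal-correlation
    score for each of the 2p columns of the augmented matrix [X, X~].
    Antisymmetry in (Z_j, Z~_j) yields the flip-sign property."""
    p = XXt.shape[1] // 2
    Z = np.abs(XXt.T @ y)        # one score per original/knockoff column
    return Z[:p] - Z[p:]

def swap_cols(XXt, S):
    """Return a copy of [X, X~] with columns j and j+p swapped for j in S."""
    p = XXt.shape[1] // 2
    out = XXt.copy()
    out[:, S] = XXt[:, [j + p for j in S]]
    out[:, [j + p for j in S]] = XXt[:, S]
    return out
```

Swapping the columns in $S$ flips the signs of exactly the $W_j$ with $j \in S$ and leaves the others unchanged.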
Unlike the aforementioned work, we do not require the sufficiency property that $w_j$ depend on the data only through $[X, \tilde{X}]^\top [X, \tilde{X}]$ and $[X, \tilde{X}]^\top y$. It may help the reader unfamiliar with the knockoff framework to think about knockoff statistics $W = (W_1, \ldots, W_p)$ in two steps: first, consider a statistic $T$ for each original and knockoff variable,
$$T \triangleq (Z, \tilde{Z}) = (Z_1, \ldots, Z_p, \tilde{Z}_1, \ldots, \tilde{Z}_p) = t([X, \tilde{X}], y),$$