CONTROLLING THE FALSE DISCOVERY RATE
The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate (FDR). This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.

CONTENTS
Introduction
Authors
Abbreviations
Development of the relevant domain
Development
Definition of False Discovery Rate
Two properties of false discovery rate
False Discovery Rate Controlling Procedure
Example of False Discovery Rate Controlling Procedure
Conclusion

AUTHORS
Yoav Benjamini: His work combines theoretical research in statistical methodology with applied research that involves complex problems with massive data. The methodological work is on selective and simultaneous inference (multiple comparisons), and centers on the “False Discovery Rate” (FDR) criterion, as well as on general methods for data analysis, data mining and data visualization.
Yosef Hochberg: Ph.D., professor in the School of Mathematical Sciences at Tel Aviv University, has taught several courses in multiple comparisons. He has published an impressive number of articles and technical reports on various statistical methods.

DEVELOPMENT OF THE RELEVANT DOMAIN (RELEVANT SCIENTISTS AND THEIR CONTRIBUTIONS)
Connections between the FDR and Bayesian approaches (including empirical Bayes methods): Storey, John D. (2003). "The positive false discovery rate: A Bayesian interpretation and the q-value".
Generalizing the confidence interval into the false coverage statement rate (FCR): Benjamini Y, Yekutieli D (2005). "False discovery rate controlling confidence intervals for selected parameters".
Thresholding wavelet coefficients and model selection: Donoho D, Jin J (2006). "Asymptotic minimaxity of false discovery rate thresholding for sparse exponential data".

STRUCTURE OF THIS STUDY PAPER
1. A formal definition of the FDR.
2. Some examples where the control of the FDR is desirable.
3. A simple Bonferroni-type FDR controlling procedure.
4. A simulation study of the power of the procedure.

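FALSE DISCOVERY RATE
Consider the problem of simultaneously testing m null hypotheses.
m: number of null hypotheses tested
m0: number of true null hypotheses
m - m0: number of false null hypotheses
R: number of hypotheses rejected, an observable random variable
U, V, S and T: unobservable random variables (U: true null hypotheses not rejected; V: true null hypotheses rejected; T: false null hypotheses not rejected; S: false null hypotheses rejected), so that R = V + S
PCER = E(V/m), the per-comparison error rate
FWER = P(V ≥ 1), the familywise error rate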

DEFINITION OF FALSE DISCOVERY RATE
Proportion of the rejected null hypotheses which are erroneously rejected: $Q = V/(V+S)$, with $Q = 0$ when $V + S = 0$.
We define the FDR to be the expectation of Q:
$Q_e = E(Q) = E\{V/(V+S)\} = E(V/R)$
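To make the definition concrete, the following minimal Python sketch computes the realized proportion Q for a single experiment. The function name and the example data are hypothetical, and the sketch assumes the usual convention that Q = 0 when no hypothesis is rejected. Since V and S are unobservable in practice, this only works when the truth is known, as in a simulation; the FDR itself is the average of Q over repeated experiments.

```python
def false_discovery_proportion(is_true_null, rejected):
    """Return Q = V / (V + S), taking Q = 0 when no hypothesis is rejected."""
    # V: true null hypotheses that were rejected (false discoveries)
    v = sum(1 for null, rej in zip(is_true_null, rejected) if rej and null)
    # S: false null hypotheses that were rejected (true discoveries)
    s = sum(1 for null, rej in zip(is_true_null, rejected) if rej and not null)
    r = v + s
    return v / r if r > 0 else 0.0


# Hypothetical outcome of one experiment with five hypotheses.
is_true_null = [True, True, False, False, False]
rejected = [True, False, True, True, False]
# Here V = 1 and S = 2, so Q = 1/3.
print(false_discovery_proportion(is_true_null, rejected))
```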

TWO PROPERTIES OF FALSE DISCOVERY RATE:
(a) If all null hypotheses are true ($m_0 = m$), the FDR is equivalent to the FWER: in this case $s = 0$ and $v = r$, so if $v = 0$ then $Q = 0$, and if $v > 0$ then $Q = 1$, leading to $P(V \ge 1) = E(Q) = Q_e$.
(b) When $m_0 < m$, the FDR is smaller than or equal to the FWER: in this case, if $v > 0$ then $v/r \le 1$, leading to $\chi_{(V \ge 1)} \ge Q$, where $\chi_{(V \ge 1)}$ is the indicator of the event $V \ge 1$. Taking expectations on both sides we obtain $P(V \ge 1) \ge Q_e$, and the two can be quite different.
A p-value is the probability, under the null hypothesis, of obtaining a test statistic at least as extreme as the one that was actually observed. The p-values corresponding to the true null hypotheses are taken to be independent U(0,1) random variables.

EXAMPLES
Three examples show the relevance of FDR control in some typical situations:
1. A multiple-comparison problem that involves an overall decision. Control of the probability of any error is unnecessarily stringent, as a small proportion of errors will not change the overall validity of the conclusion.
2. Multiple separate decisions without an overall decision being required. Two treatments are compared in multiple subgroups, and separate recommendations on the preferred treatment must be made for all subgroups.
3. Multiple potential effects are screened to weed out the null effects. One example is the screening of various chemicals for potential drug development.

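FALSE DISCOVERY RATE CONTROLLING PROCEDURE
The procedure: Consider testing $H_1, H_2, \ldots, H_m$ based on the corresponding p-values $P_1, P_2, \ldots, P_m$. Let $P_{(1)} \le P_{(2)} \le \cdots \le P_{(m)}$ be the ordered p-values, and let $H_{(i)}$ denote the null hypothesis corresponding to $P_{(i)}$.
Define the Bonferroni-type multiple-testing procedure: let k be the largest i for which
$P_{(i)} \le \frac{i}{m} q^*$,
then reject all $H_{(i)}$, $i = 1, 2, \ldots, k$.
$q^*$ is the level at which the FDR is to be controlled; taking the largest such i maximizes the number of rejections.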
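The procedure on this slide can be sketched in a few lines of Python. This is an illustrative implementation under the reading that the i-th smallest p-value is compared with i·q*/m and that k is the largest rank passing the comparison. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q_star):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda j: p_values[j])
    # k is the largest rank i whose ordered p-value satisfies P_(i) <= i * q* / m.
    k = 0
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= rank * q_star / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for j in order[:k]:
        rejected[j] = True
    return rejected


# Hypothetical p-values: none of the three smallest passes its own threshold,
# but the largest one does (0.045 <= 4 * 0.05 / 4), so all four are rejected.
print(benjamini_hochberg([0.04, 0.03, 0.02, 0.045], q_star=0.05))
```

The second comment highlights the step-up character of the rule: a hypothesis can be rejected even though its own p-value exceeds its own threshold, as long as some larger ordered p-value passes.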

FALSE DISCOVERY RATE CONTROLLING PROCEDURE
Theorem 1. For independent test statistics and for any configuration of false null hypotheses, the above procedure controls the FDR at $q^*$.
Remark. The independence of the test statistics corresponding to the false null hypotheses is not needed for the proof of the theorem.

FALSE DISCOVERY RATE CONTROLLING PROCEDURE
Lemma. For any $0 \le m_0 \le m$ independent p-values corresponding to true null hypotheses, and for any values that the $m_1 = m - m_0$ p-values corresponding to false null hypotheses can take, the FDR controlling procedure satisfies
$E(Q \mid P_{m_0+1} = p_1, \ldots, P_m = p_{m_1}) \le \frac{m_0}{m} q^*$.
Integrating over the p-values of the false null hypotheses then gives
$E(Q) \le \frac{m_0}{m} q^* \le q^*$.

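FALSE DISCOVERY RATE CONTROLLING PROCEDURE
Define Hochberg’s procedure: let k be the largest i for which
$P_{(i)} \le \frac{1}{m - i + 1} \alpha$,
then reject all $H_{(i)}$, $i = 1, 2, \ldots, k$.
Remark: note the relationship between Hochberg’s procedure and the FDR controlling procedure when $q^*$ is chosen to equal $\alpha$. The two sets of critical values coincide at $i = 1$ and $i = m$; for every other i the FDR critical value $\frac{i}{m} q^*$ is larger than $\frac{1}{m - i + 1} \alpha$, so the FDR controlling procedure rejects at least as many hypotheses.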
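The relationship in the remark can be checked numerically. The sketch below assumes Hochberg’s step-up rule compares the i-th smallest p-value with α/(m − i + 1), and that the FDR controlling procedure compares it with i·q*/m. The function names are hypothetical.

```python
def hochberg(p_values, alpha):
    """Return a list of booleans: True where Hochberg's procedure rejects."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])
    # k is the largest rank i with P_(i) <= alpha / (m - i + 1).
    k = 0
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= alpha / (m - rank + 1):
            k = rank
    rejected = [False] * m
    for j in order[:k]:
        rejected[j] = True
    return rejected


def critical_values(m, level):
    """Return (Hochberg, FDR-procedure) critical values for ranks 1..m."""
    hoch = [level / (m - i + 1) for i in range(1, m + 1)]
    fdr = [i * level / m for i in range(1, m + 1)]
    return hoch, fdr


# Side-by-side critical values for m = 5 hypotheses at level 0.05.
hoch, fdr = critical_values(5, 0.05)
for i, (h, f) in enumerate(zip(hoch, fdr), start=1):
    print(i, round(h, 4), round(f, 4))
```

Because every FDR critical value is at least as large as the matching Hochberg value, any hypothesis rejected by Hochberg’s procedure is also rejected by the FDR controlling procedure at the same level.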

EXAMPLE OF FALSE DISCOVERY RATE CONTROLLING PROCEDURE
Neuhaus et al. (1992) investigated the effects of a new front-loaded administration of rt-PA versus those obtained with a standard regimen of APSAC, in a randomized multicentre trial in 421 patients with acute myocardial infarction.
rt-PA: thrombolysis with recombinant tissue-type plasminogen activator
APSAC: anisoylated plasminogen streptokinase activator complex

EXAMPLE OF FALSE DISCOVERY RATE CONTROLLING PROCEDURE
Four families of hypotheses can be identified in the study:
1. Base-line comparisons (11 hypotheses), where the problem is of showing equivalence
2. Patency of infarct-related artery (8 hypotheses)
3. Reocclusion rates of patent infarct-related artery (6 hypotheses)
4. Cardiac and other events after the start of thrombolytic treatment (15 hypotheses)

EXAMPLE OF FALSE DISCOVERY RATE CONTROLLING PROCEDURE
The statement about the mortality is based on a p-value of 0.0095. The ordered $P_{(i)}$s for the 15 comparisons made are:
0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344, 0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.000.

EXAMPLE OF FALSE DISCOVERY RATE CONTROLLING PROCEDURE
Controlling the FWER at 0.05, the Bonferroni approach, using 0.05/15 = 0.0033, rejects the 3 hypotheses corresponding to the smallest p-values (0.0001, 0.0004 and 0.0019, which correspond to reduced allergic reaction and to two different aspects of bleeding). Using Hochberg’s procedure leaves us with the same 3 hypotheses rejected.

EXAMPLE OF FALSE DISCOVERY RATE CONTROLLING PROCEDURE
Using the FDR controlling procedure with $q^* = 0.05$, we compare sequentially each $P_{(i)}$ with $0.05\,i/15$, starting with $P_{(15)}$. The first p-value to satisfy the constraint is $P_{(4)}$, as
$P_{(4)} = 0.0095 \le \frac{4}{15} \times 0.05 = 0.013$,
thus we reject the 4 hypotheses having p-values which are less than or equal to 0.013. We may now support with appropriate confidence the statement about the mortality decrease, for which we did not have sufficiently strong evidence before.
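The three rejection counts quoted in this example can be reproduced from the 15 ordered p-values. The sketch below is illustrative only: the variable names are hypothetical, and it assumes Hochberg’s critical values are 0.05/(15 − i + 1) and the FDR critical values are 0.05·i/15.

```python
# Ordered p-values for the 15 comparisons in the fourth family.
p_sorted = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
            0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.000]
m = len(p_sorted)
level = 0.05

# Bonferroni: reject every hypothesis with p <= level / m.
bonferroni_k = sum(p <= level / m for p in p_sorted)

# Hochberg: largest rank i with P_(i) <= level / (m - i + 1).
hochberg_k = max(
    (i for i, p in enumerate(p_sorted, start=1) if p <= level / (m - i + 1)),
    default=0,
)

# FDR controlling procedure: largest rank i with P_(i) <= i * level / m.
fdr_k = max(
    (i for i, p in enumerate(p_sorted, start=1) if p <= i * level / m),
    default=0,
)

print(bonferroni_k, hochberg_k, fdr_k)  # prints: 3 3 4
```

The fourth rejection is the mortality comparison, with p-value 0.0095, which neither FWER-controlling method rejects.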

ANOTHER LOOK AT FDR CONTROLLING PROCEDURE
Theorem 2. The FDR controlling procedure is the solution of the following constrained maximization problem: choose α that maximizes the number of rejections at this level, r(α), subject to the constraint
$\alpha m / r(\alpha) \le q^*$. (1)
Proof: For each α, if $P_{(i)} \le \alpha < P_{(i+1)}$, then r(α) = i. Furthermore, as the ratio on the left-hand side of constraint (1) increases in α over the range on which r(α) is constant, it is enough to investigate αs which are equal to one of the $P_{(i)}$s. The $\alpha = P_{(k)}$ chosen by the procedure satisfies the constraint because $\alpha / r(\alpha) = P_{(k)}/k \le q^*/m$. By considering the largest potential αs first, the procedure yields the α with the largest r(α) satisfying the constraint.

POWER COMPARISONS
The setting: In a large simulation study, the family of hypotheses is that the expectations of m independent normally distributed random variables are equal to 0. Each individual hypothesis is tested by a z-test, and the test statistics are independent. We use $q^* = 0.05$.
The configurations of the hypotheses involve m = 4, 8, 16, 32 and 64, with the number of truly null hypotheses being 3m/4, m/2, m/4 and 0.
The non-zero expectations were divided into 4 groups and placed at L/4, L/2, 3L/4 and L in the following ways:
(a) Linearly decreasing (D) number of hypotheses away from 0 in each group
(b) Equal (E) number of hypotheses in each group
(c) Linearly increasing (I) number of hypotheses away from 0 in each group
These expectations were fixed (per configuration) throughout the experiment. The variance of all variables was set to 1, and L was chosen at two levels, 5 and 10.
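A scaled-down version of this setting can be sketched in Python using only the standard library. Several details are assumptions, not taken from the slides: the z-tests are taken to be two-sided, the D, E and I shapes are modelled by group weights 4:3:2:1, 1:1:1:1 and 1:2:3:4 with largest-remainder rounding, and the number of replications is arbitrary. All function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def make_means(m, m0, L, shape):
    """Expectations: m0 zeros, the rest spread over L/4, L/2, 3L/4 and L."""
    # Assumed group weights for the D, E and I configurations.
    weights = {"D": [4, 3, 2, 1], "E": [1, 1, 1, 1], "I": [1, 2, 3, 4]}[shape]
    m1, total = m - m0, sum(weights)
    sizes = [m1 * w // total for w in weights]
    # Largest-remainder rounding so that the group sizes add up to m1.
    by_remainder = sorted(range(4), key=lambda g: -(m1 * weights[g] % total))
    for g in by_remainder[: m1 - sum(sizes)]:
        sizes[g] += 1
    means = [0.0] * m0
    for size, level in zip(sizes, [L / 4, L / 2, 3 * L / 4, L]):
        means += [level] * size
    return means


def fdr_rejections(p_values, q_star):
    """Indices rejected by the step-up rule P_(i) <= i * q* / m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])
    k = 0
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= rank * q_star / m:
            k = rank
    return set(order[:k])


def bonferroni_rejections(p_values, alpha):
    """Indices rejected by comparing every p-value with alpha / m."""
    m = len(p_values)
    return {j for j, p in enumerate(p_values) if p <= alpha / m}


def average_power(means, procedure, level, n_rep, rng):
    """Average proportion of false null hypotheses that are rejected."""
    false_nulls = [j for j, mu in enumerate(means) if mu != 0.0]
    if not false_nulls:
        raise ValueError("power is undefined when all null hypotheses are true")
    total = 0.0
    for _ in range(n_rep):
        z = [rng.gauss(mu, 1.0) for mu in means]
        # Two-sided z-test p-values (an assumption of this sketch).
        p = [2.0 * (1.0 - STD_NORMAL.cdf(abs(x))) for x in z]
        rejected = procedure(p, level)
        total += sum(j in rejected for j in false_nulls) / len(false_nulls)
    return total / n_rep


means = make_means(m=16, m0=8, L=5, shape="E")
# The same seed gives both procedures identical simulated data.
print(average_power(means, fdr_rejections, 0.05, 2000, random.Random(0)))
print(average_power(means, bonferroni_rejections, 0.05, 2000, random.Random(0)))
```

Feeding both procedures the same simulated data isolates the effect of the rejection rule from Monte Carlo noise. Every hypothesis rejected by Bonferroni is also rejected by the step-up rule, so on shared data the first estimate cannot fall below the second.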

THE ESTIMATES OF THE AVERAGE POWER
Simulation-based estimates of the average power: the proportion of the false null hypotheses which are correctly rejected.
Comparing the three methods (figure legend):
1. FDR controlling procedure
2. Hochberg’s procedure (dashed line)
3. The Bonferroni-type procedure (dotted line)

RESULT
The power of all the methods decreases when the number of hypotheses tested increases; this is the cost of multiplicity control.
The power is smallest for the D-configuration, where the non-null hypotheses are closer to the null, and is largest for the I-configuration.
The power of the FDR controlling method is uniformly larger than that of the other methods.
The advantage increases in m. Therefore, the loss of power as m increases is relatively small for the FDR controlling method in the E- and I-configurations.
The advantage in some situations is extremely large.
Hochberg’s method offers a more powerful alternative to the Bonferroni method.

CONCLUSION
The new approach calls for the control of the FDR instead of the FWER, and thereby also the control of the FWER in the weak sense. In many applications this is the desirable control against errors originating from multiplicity. This paper focused on presenting and motivating control of the FDR, and showed that the criterion can be developed into a simple and powerful procedure. Thus the cost paid for the control of multiplicity need not be large.