Controlling the false discovery rate, by Bing Wong

by Xi'an

Slide 1

Slide 1 text

CONTROLLING THE FALSE DISCOVERY RATE
Under the direction of Christian P. Robert
Speaker: WANG Bing
25/11/2013

Slide 2

Slide 2 text

CONTROLLING THE FALSE DISCOVERY RATE
The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses: the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.

Slide 3

Slide 3 text

CONTENTS
- Introduction
- Authors
- Abbreviations
- Development of the relevant domain
- Development
- Definition of False Discovery Rate
- Two properties of false discovery rate
- False Discovery Rate Controlling Procedure
- Example of False Discovery Rate Controlling Procedure
- Conclusion

Slide 4

Slide 4 text

AUTHORS
- Yoav Benjamini: his work combines theoretical research in statistical methodology with applied research that involves complex problems with massive data. The methodological work is on selective and simultaneous inference (multiple comparisons), and centers on the "False Discovery Rate" (FDR) criterion, as well as on general methods for data analysis, data mining and data visualization.
- Yosef Hochberg: Ph.D., professor in the School of Mathematical Sciences at Tel Aviv University, has taught several courses in multiple comparisons. He has published an impressive number of articles and technical reports on various statistical methods.

Slide 5

Slide 5 text

ABBREVIATIONS
- FWER: familywise error rate
- MCPs: multiple-comparison procedures
- PCER: per-comparison error rate
- FDR: false discovery rate, the expected proportion of falsely rejected hypotheses among all rejected hypotheses

Slide 6

Slide 6 text

DEVELOPMENT OF THE RELEVANT DOMAIN (RELEVANT SCIENTISTS AND THEIR CONTRIBUTIONS)
- Connections have been made between the FDR and Bayesian approaches (including empirical Bayes methods): Storey, John D. (2003). "The positive false discovery rate: A Bayesian interpretation and the q-value".
- Generalizing the confidence interval into the false coverage-statement rate (FCR): Benjamini Y, Yekutieli D (2005). "False discovery rate-adjusted multiple confidence intervals for selected parameters".
- Thresholding wavelet coefficients and model selection: Donoho D, Jin J (2006). "Asymptotic minimaxity of false discovery rate thresholding for sparse exponential data".

Slide 7

Slide 7 text

STRUCTURE OF THE PAPER
- A formal definition of the FDR.
- Some examples where the control of the FDR is desirable.
- A simple Bonferroni-type FDR controlling procedure.
- A simulation study of the power of the procedure.

Slide 8

Slide 8 text

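FALSE DISCOVERY RATE
- $m$: the number of null hypotheses tested
- $m_0$: the number of true null hypotheses
- $m - m_0$: the number of false (non-true) null hypotheses
- $R$: the number of hypotheses rejected, an observable random variable
- $U$, $V$, $S$ and $T$: unobservable random variables, cross-classifying the $m$ hypotheses as follows:

                              Declared non-significant   Declared significant   Total
  True null hypotheses        U                          V                      m0
  Non-true null hypotheses    T                          S                      m - m0
  Total                       m - R                      R                      m

- $\mathrm{PCER} = E(V/m)$
- $\mathrm{FWER} = P(V \ge 1)$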
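To make the four counts concrete, here is a minimal Python sketch (assuming NumPy; the function name `outcome_counts` is hypothetical) that tallies them for one simulated experiment, where, unlike in practice, we know which hypotheses are true nulls:

```python
import numpy as np

def outcome_counts(rejected, true_null):
    """Tally (U, V, T, S, R) for one experiment.

    rejected:  booleans, True where H_i was rejected
    true_null: booleans, True where H_i is actually true (known only in a simulation)
    """
    rejected = np.asarray(rejected, dtype=bool)
    true_null = np.asarray(true_null, dtype=bool)
    U = int(np.sum(true_null & ~rejected))   # true nulls declared non-significant
    V = int(np.sum(true_null & rejected))    # true nulls rejected: false discoveries
    T = int(np.sum(~true_null & ~rejected))  # false nulls declared non-significant
    S = int(np.sum(~true_null & rejected))   # false nulls rejected: true discoveries
    return U, V, T, S, V + S                 # R = V + S is the only observable count

# Hypothetical experiment: m = 6 hypotheses, the first m0 = 4 of which are true nulls
print(outcome_counts(rejected=[True, False, False, False, True, False],
                     true_null=[True, True, True, True, False, False]))  # (3, 1, 1, 1, 2)
```

Only $R$ is seen by the analyst; $V$ and $S$ are what the error rates PCER, FWER and FDR are built from.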

Slide 9

Slide 9 text

DEFINITION OF FALSE DISCOVERY RATE
- Proportion of the rejected null hypotheses which are erroneously rejected: $Q = V/(V + S)$, with $Q = 0$ when $V + S = 0$ (no hypothesis is rejected).
- We define the FDR, $Q_e$, to be the expectation of $Q$:
  $Q_e = E(Q) = E\{V/(V + S)\} = E(V/R)$
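The definition, including the convention $Q = 0$ when $R = 0$, translates directly into a Monte Carlo estimator. A small sketch, assuming NumPy, with hypothetical function names and made-up $(V, R)$ pairs standing in for simulation output:

```python
import numpy as np

def false_discovery_proportion(V, R):
    """Q = V / R, with the convention Q = 0 when nothing is rejected (R = 0)."""
    return 0.0 if R == 0 else V / R

def estimate_fdr(replications):
    """Monte Carlo estimate of Q_e = E(Q) from one (V, R) pair per replication."""
    return float(np.mean([false_discovery_proportion(V, R) for V, R in replications]))

# Hypothetical (V, R) outcomes of four replications: Q = 0.25, 0, 1, 0
print(estimate_fdr([(1, 4), (0, 0), (2, 2), (0, 3)]))  # 0.3125
```

Note that the FDR averages the proportion $V/R$, not the count $V$: a replication with one false rejection out of four contributes 0.25, while one with two out of two contributes 1.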

Slide 10

Slide 10 text

TWO PROPERTIES OF FALSE DISCOVERY RATE
- If all null hypotheses are true ($m_0 = m$), the FDR is equivalent to the FWER: in this case $s = 0$ and $v = r$, so if $v = 0$ then $Q = 0$, and if $v > 0$ then $Q = 1$, leading to $P(V \ge 1) = E(Q) = Q_e$.
- When $m_0 < m$, the FDR is smaller than or equal to the FWER: in this case, if $v > 0$ then $v/r \le 1$, and if $v = 0$ then $Q = 0$, leading to $\chi_{(V \ge 1)} \ge Q$, where $\chi$ denotes the indicator function. Taking expectations on both sides we obtain $P(V \ge 1) \ge Q_e$, and the two can be quite different.
- The p-value is the probability, under the null hypothesis, of obtaining a test statistic at least as extreme as the one that was actually observed. For continuous test statistics, the p-values corresponding to the true null hypotheses are $U(0,1)$ random variables, here assumed independent.
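Both properties can be checked numerically. The following is a Monte Carlo sketch, not taken from the paper: it assumes NumPy, one-sided z-tests, a mean shift of 3 for the false nulls, no multiplicity correction at all (each hypothesis tested at level 0.05), and a fixed seed; the function name is hypothetical.

```python
import math
import numpy as np

# One-sided upper-tail p-value of a standard normal statistic: P(Z >= z)
upper_p = np.vectorize(lambda z: 0.5 * math.erfc(z / math.sqrt(2.0)))

def fdr_and_fwer_unadjusted(m, m0, alpha=0.05, shift=3.0, n_rep=5000, seed=1):
    """Estimate (FDR, FWER) when every H_i is tested separately at level alpha."""
    rng = np.random.default_rng(seed)
    means = np.r_[np.zeros(m0), np.full(m - m0, shift)]  # first m0 hypotheses are true nulls
    z = rng.standard_normal((n_rep, m)) + means          # independent test statistics
    rejected = upper_p(z) <= alpha
    V = rejected[:, :m0].sum(axis=1)                     # false discoveries per replication
    R = rejected.sum(axis=1)
    Q = np.where(R > 0, V / np.maximum(R, 1), 0.0)       # Q = 0 when R = 0
    return Q.mean(), (V >= 1).mean()

fdr_all_null, fwer_all_null = fdr_and_fwer_unadjusted(m=10, m0=10)
fdr_mixed, fwer_mixed = fdr_and_fwer_unadjusted(m=10, m0=5)
print(np.isclose(fdr_all_null, fwer_all_null))  # True: Q is the indicator of V >= 1
print(fdr_mixed <= fwer_mixed)                  # True: Q <= indicator in every replication
```

The two printed checks hold replication by replication, not just on average, which mirrors the pointwise argument on the slide before expectations are taken.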

Slide 11

Slide 11 text

EXAMPLES
Three examples show the relevance of FDR control in some typical situations:
1. The multiple-comparison problem involves an overall decision. Control of the probability of any error is unnecessarily stringent, as a small proportion of errors will not change the overall validity of the conclusion.
2. Multiple separate decisions are made without an overall decision being required. For instance, two treatments are compared in multiple subgroups, and separate recommendations on the preferred treatment must be made for all subgroups.
3. Multiple potential effects are screened to weed out the null effects. One example is the screening of various chemicals for potential drug development.

Slide 12

Slide 12 text

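FALSE DISCOVERY RATE CONTROLLING PROCEDURE
- The procedure: consider testing $H_1, H_2, \ldots, H_m$ based on the corresponding p-values $P_1, P_2, \ldots, P_m$. Let $P_{(1)} \le P_{(2)} \le \cdots \le P_{(m)}$ be the ordered p-values, and denote by $H_{(i)}$ the null hypothesis corresponding to $P_{(i)}$.
- Define the following Bonferroni-type multiple-testing procedure: let $k$ be the largest $i$ for which $P_{(i)} \le \frac{i}{m} q^*$; then reject all $H_{(i)}$, $i = 1, 2, \ldots, k$.
- $q^*$ is the level at which the FDR is to be controlled.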
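The procedure is short enough to write out in full. A minimal sketch assuming NumPy (the name `bh_reject` is hypothetical); it returns the rejections in the original order of the hypotheses:

```python
import numpy as np

def bh_reject(pvalues, q_star=0.05):
    """Sketch of the FDR controlling (step-up) procedure.

    Sort the p-values, find the largest i with P_(i) <= (i/m) q*, and reject
    H_(1), ..., H_(i). Returns a boolean array in the original input order.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                             # P_(1) <= ... <= P_(m)
    thresholds = np.arange(1, m + 1) * q_star / m     # (i/m) q*, i = 1..m
    passes = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if passes.any():
        k = int(np.nonzero(passes)[0].max()) + 1      # largest i that passes
        rejected[order[:k]] = True                    # reject H_(1), ..., H_(k)
    return rejected

# Hypothetical p-values: the second smallest (0.03) exceeds its own threshold 0.025,
# yet it is rejected because P_(3) = 0.035 <= 0.0375 and everything below k is rejected.
print(bh_reject([0.035, 0.5, 0.001, 0.03], q_star=0.05))  # rejects 0.035, 0.001 and 0.03
```

Because $k$ is the *largest* passing index, a p-value that misses its own threshold can still be rejected when a larger one passes; this step-up feature is what distinguishes the procedure from checking each $P_{(i)}$ in isolation.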

Slide 13

Slide 13 text

FALSE DISCOVERY RATE CONTROLLING PROCEDURE
- Theorem 1. For independent test statistics and for any configuration of false null hypotheses, the above procedure controls the FDR at $q^*$.
- Remark. The independence of the test statistics corresponding to the false null hypotheses is not needed for the proof of the theorem.

Slide 14

Slide 14 text

FALSE DISCOVERY RATE CONTROLLING PROCEDURE
- Lemma. For any $0 \le m_0 \le m$ independent p-values corresponding to true null hypotheses, and for any values that the $m_1 = m - m_0$ p-values corresponding to false null hypotheses can take, the FDR controlling procedure satisfies
  $E(Q \mid P_{m_0+1} = p_1, \ldots, P_m = p_{m_1}) \le \frac{m_0}{m} q^*$
- Integrating this inequality over the distribution of the false-null p-values gives $E(Q) \le \frac{m_0}{m} q^* \le q^*$, which proves Theorem 1.
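The bound $E(Q) \le \frac{m_0}{m} q^*$ can be checked by simulation. A sketch under assumptions of our own: NumPy, one-sided z-tests, $m = 10$ with $m_0 = 5$, a mean shift of 2.5 for the false nulls, 4000 replications and a fixed seed; function names are hypothetical.

```python
import math
import numpy as np

# One-sided upper-tail p-value of a standard normal statistic: P(Z >= z)
upper_p = np.vectorize(lambda z: 0.5 * math.erfc(z / math.sqrt(2.0)))

def bh_count(p_sorted, q_star):
    """Number k of rejections of the step-up procedure, given sorted p-values."""
    m = p_sorted.size
    passes = p_sorted <= np.arange(1, m + 1) * q_star / m
    return int(np.nonzero(passes)[0].max()) + 1 if passes.any() else 0

def estimate_fdr_bh(m=10, m0=5, q_star=0.05, shift=2.5, n_rep=4000, seed=2):
    rng = np.random.default_rng(seed)
    is_null = np.r_[np.ones(m0, bool), np.zeros(m - m0, bool)]
    means = np.where(is_null, 0.0, shift)
    total_q = 0.0
    for _ in range(n_rep):
        p = upper_p(rng.standard_normal(m) + means)  # independent test statistics
        order = np.argsort(p)
        k = bh_count(p[order], q_star)
        V = int(is_null[order[:k]].sum())            # true nulls among the k rejections
        total_q += V / k if k > 0 else 0.0           # Q = 0 when nothing is rejected
    return total_q / n_rep

fdr_hat = estimate_fdr_bh()
print(round(fdr_hat, 3), "vs m0/m * q* =", 5 / 10 * 0.05)
```

With independent continuous test statistics the estimate should sit close to $\frac{m_0}{m} q^* = 0.025$, well below the nominal $q^* = 0.05$; the slack $m_0/m$ is exactly what the lemma exposes.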

Slide 15

Slide 15 text

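FALSE DISCOVERY RATE CONTROLLING PROCEDURE
- Define Hochberg's procedure: let $k$ be the largest $i$ for which $P_{(i)} \le \frac{\alpha}{m + 1 - i}$; then reject all $H_{(i)}$, $i = 1, 2, \ldots, k$.
- Remark: note the relationship between Hochberg's procedure and the FDR controlling procedure when $q^*$ is chosen to equal $\alpha$. Since $i(m + 1 - i) \ge m$ for $1 \le i \le m$, each FDR threshold $\frac{i}{m}\alpha$ is at least as large as the corresponding Hochberg threshold $\frac{\alpha}{m + 1 - i}$, so the FDR controlling procedure rejects every hypothesis that Hochberg's procedure rejects.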
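Both procedures are step-up rules that differ only in their threshold sequences, which makes the comparison easy to code. A sketch assuming NumPy, with hypothetical function names:

```python
import numpy as np

def step_up_reject(pvalues, thresholds):
    """Generic step-up rule: find the largest i with P_(i) <= thresholds[i - 1],
    then reject H_(1), ..., H_(i). Returns booleans in the input order."""
    p = np.asarray(pvalues, dtype=float)
    order = np.argsort(p)
    passes = p[order] <= thresholds
    rejected = np.zeros(p.size, dtype=bool)
    if passes.any():
        k = int(np.nonzero(passes)[0].max()) + 1
        rejected[order[:k]] = True
    return rejected

def hochberg_reject(pvalues, alpha=0.05):
    i = np.arange(1, len(pvalues) + 1)
    return step_up_reject(pvalues, alpha / (len(pvalues) + 1 - i))  # alpha / (m + 1 - i)

def bh_reject(pvalues, q_star=0.05):
    i = np.arange(1, len(pvalues) + 1)
    return step_up_reject(pvalues, i * q_star / len(pvalues))       # (i / m) q*

# With q* = alpha, each FDR threshold (i/m) alpha is at least Hochberg's alpha/(m+1-i),
# because i (m + 1 - i) - m = (i - 1)(m - i) >= 0 for 1 <= i <= m.
m = 15
i = np.arange(1, m + 1)
print(np.all(i * (m + 1 - i) >= m))  # True
```

The check uses the integer form of the inequality to avoid floating-point ties at $i = 1$ and $i = m$, where the two thresholds coincide.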

Slide 16

Slide 16 text

EXAMPLE OF FALSE DISCOVERY RATE CONTROLLING PROCEDURE
- Neuhaus et al. (1992) investigated the effects of a new front-loaded administration of rt-PA versus those obtained with a standard regimen of APSAC, in a randomized multicentre trial in 421 patients with acute myocardial infarction.
- rt-PA: recombinant tissue-type plasminogen activator, used for thrombolysis
- APSAC: anisoylated plasminogen streptokinase activator

Slide 17

Slide 17 text

EXAMPLE OF FALSE DISCOVERY RATE CONTROLLING PROCEDURE
Four families of hypotheses can be identified in the study:
1. Base-line comparisons (11 hypotheses), where the problem is one of showing equivalence
2. Patency of the infarct-related artery (8 hypotheses)
3. Reocclusion rates of the patent infarct-related artery (6 hypotheses)
4. Cardiac and other events after the start of thrombolytic treatment (15 hypotheses)

Slide 18

Slide 18 text

EXAMPLE OF FALSE DISCOVERY RATE CONTROLLING PROCEDURE
- The statement about mortality is based on a p-value of 0.0095.
- The ordered p-values $p_{(i)}$ for the 15 comparisons made in family 4 are: 0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344, 0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.000.

Slide 19

Slide 19 text

EXAMPLE OF FALSE DISCOVERY RATE CONTROLLING PROCEDURE
- Controlling the FWER at 0.05, the Bonferroni approach, using $0.05/15 = 0.0033$, rejects the 3 hypotheses corresponding to the smallest p-values (0.0001, 0.0004 and 0.0019, which correspond to reduced allergic reaction and to two different aspects of bleeding).
- Using Hochberg's procedure leaves us with the same 3 hypotheses rejected.
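Both counts can be reproduced in a few lines. A sketch assuming NumPy, with the p-values copied from the previous slide (they are already sorted, so position $i - 1$ holds $p_{(i)}$):

```python
import numpy as np

# Ordered p-values for the 15 comparisons of family 4
p = np.array([0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
              0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.000])
alpha, m = 0.05, p.size

# Bonferroni: reject every H_i with P_i <= alpha / m = 0.0033
bonferroni = p <= alpha / m

# Hochberg: largest i with P_(i) <= alpha / (m + 1 - i), then reject H_(1), ..., H_(i)
i = np.arange(1, m + 1)
passes = p <= alpha / (m + 1 - i)
k_hochberg = int(np.nonzero(passes)[0].max()) + 1 if passes.any() else 0

print(int(bonferroni.sum()), k_hochberg)  # 3 3
```

The mortality p-value 0.0095 fails both: it exceeds $0.05/15$ and Hochberg's threshold at $i = 4$, $0.05/12 \approx 0.0042$.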

Slide 20

Slide 20 text

EXAMPLE OF FALSE DISCOVERY RATE CONTROLLING PROCEDURE
- Using the FDR controlling procedure with $q^* = 0.05$, we compare sequentially each $p_{(i)}$ with $0.05\,i/15$, starting with $p_{(15)}$. The first p-value to satisfy the constraint is $p_{(4)}$, as $p_{(4)} = 0.0095 \le \frac{4}{15} \times 0.05 = 0.013$; thus we reject the 4 hypotheses having p-values less than or equal to $p_{(4)} = 0.0095$.
- We may now support with appropriate confidence the statement about mortality decrease, for which we did not have sufficiently strong evidence before.
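The same comparison, written as a sketch assuming NumPy, on the 15 ordered p-values of the example:

```python
import numpy as np

# Ordered p-values for the 15 comparisons of family 4
p = np.array([0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
              0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.000])
q_star, m = 0.05, p.size

i = np.arange(1, m + 1)
thresholds = i * q_star / m       # 0.05 i / 15
passes = p <= thresholds          # which p_(i) satisfy the constraint
k = int(np.nonzero(passes)[0].max()) + 1 if passes.any() else 0

print(k, round(thresholds[k - 1], 4))  # 4 0.0133
```

Taking the largest passing index is equivalent to scanning down from $p_{(15)}$ and stopping at the first p-value that satisfies the constraint, as described above.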

Slide 21

Slide 21 text

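ANOTHER LOOK AT FDR CONTROLLING PROCEDURE
Theorem 2. The FDR controlling procedure is equivalent to the following: choose the $\alpha$ that maximizes the number of rejections at this level, $r(\alpha)$, subject to the constraint
$\alpha m / r(\alpha) \le q^*$  (1)
Proof: for each $\alpha$, if $P_{(i)} \le \alpha < P_{(i+1)}$, then $r(\alpha) = i$. Furthermore, as the ratio on the left-hand side of constraint (1) increases in $\alpha$ over each range on which $r(\alpha)$ is constant, it is enough to investigate values of $\alpha$ equal to one of the $P_{(i)}$. For $\alpha = P_{(i)}$ the constraint reads $P_{(i)} \le \frac{i}{m} q^*$, which is exactly the condition checked by the procedure. Hence $\alpha = P_{(k)}$ satisfies the constraint, because $\alpha / r(\alpha) = P_{(k)}/k \le q^*/m$. By considering the largest potential values of $\alpha$ first, the procedure yields the $\alpha$ with the largest $r(\alpha)$ satisfying the constraint.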
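The equivalence can be checked by brute force: search over the candidate levels $\alpha = P_{(i)}$ directly and compare with the step-up count. A sketch assuming NumPy; both function names are hypothetical, and the example reuses the 15 p-values from the mortality study:

```python
import numpy as np

def max_rejections_alpha(pvalues, q_star=0.05):
    """Brute force over candidate levels alpha = P_(i): keep those with
    alpha * m / r(alpha) <= q*, return the one with the largest r(alpha)."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    m = p.size
    best_alpha, best_r = None, 0
    for alpha in p:
        r = int(np.sum(p <= alpha))          # r(alpha) >= 1, since alpha is a p-value
        if alpha * m / r <= q_star and r > best_r:
            best_alpha, best_r = float(alpha), r
    return best_alpha, best_r

def bh_k(pvalues, q_star=0.05):
    """Number of rejections of the step-up procedure."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    m = p.size
    passes = p <= np.arange(1, m + 1) * q_star / m
    return int(np.nonzero(passes)[0].max()) + 1 if passes.any() else 0

pvals = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
         0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.000]
print(max_rejections_alpha(pvals), bh_k(pvals))  # (0.0095, 4) 4
```

The brute force is $O(m^2)$ and only meant to illustrate the theorem; the sorted step-up scan reaches the same $\alpha = P_{(k)}$ in a single pass.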

Slide 22

Slide 22 text

POWER COMPARISONS
The setting
- In a large simulation study, each hypothesis in the family states that the expectation of one of $m$ independent normally distributed random variables equals 0.
- Each individual hypothesis is tested by a z-test, and the test statistics are independent. We use $q^* = 0.05$.
- The configurations of the hypotheses involve $m = 4, 8, 16, 32$ and $64$, with the number of true null hypotheses being $3m/4$, $m/2$, $m/4$ or $0$.
- The non-zero expectations were divided into 4 groups and placed at $L/4$, $L/2$, $3L/4$ and $L$ in the following ways:
  (a) Linearly Decreasing (D) number of hypotheses away from 0 in each group;
  (b) Equal (E) number of hypotheses in each group;
  (c) Linearly Increasing (I) number of hypotheses away from 0 in each group.
- These expectations were fixed (per configuration) throughout the experiment.
- The variance of all variables was set to 1, and $L$ was chosen at two levels, 5 and 10.
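A scaled-down version of this experiment is sketched below, assuming NumPy. The details not stated on the slide are our own assumptions: two-sided z-tests, only the E configuration with $m = 16$, $m_0 = m/2$ and $L = 5$, 2000 replications, a fixed seed, and the same level 0.05 used as $q^*$ for the FDR procedure and as $\alpha$ for the other two. All three procedures are written as step-up rules with different threshold sequences; with the constant sequence $\alpha/m$ this reduces to plain Bonferroni.

```python
import math
import numpy as np

# Two-sided p-value of a standard normal statistic: P(|Z| >= |z|)
two_sided_p = np.vectorize(lambda z: math.erfc(abs(z) / math.sqrt(2.0)))

def step_up_count(p_sorted, thresholds):
    passes = p_sorted <= thresholds
    return int(np.nonzero(passes)[0].max()) + 1 if passes.any() else 0

def average_power(m=16, m0=8, L=5.0, level=0.05, n_rep=2000, seed=3):
    """Estimated average power S / (m - m0) under an E-like configuration:
    the m - m0 non-zero means are split equally among L/4, L/2, 3L/4 and L
    (m - m0 is assumed divisible by 4)."""
    rng = np.random.default_rng(seed)
    m1 = m - m0
    means = np.r_[np.zeros(m0), np.repeat([L / 4, L / 2, 3 * L / 4, L], m1 // 4)]
    is_null = means == 0
    i = np.arange(1, m + 1)
    rules = {
        "FDR": i * level / m,                 # (i/m) q*
        "Hochberg": level / (m + 1 - i),      # alpha / (m + 1 - i)
        "Bonferroni": np.full(m, level / m),  # alpha / m for every i
    }
    power = {name: 0.0 for name in rules}
    for _ in range(n_rep):
        p = two_sided_p(rng.standard_normal(m) + means)
        order = np.argsort(p)
        for name, thr in rules.items():
            k = step_up_count(p[order], thr)
            S = int(np.sum(~is_null[order[:k]]))  # false nulls correctly rejected
            power[name] += S / m1 / n_rep
    return power

print({name: round(v, 3) for name, v in average_power().items()})
```

Since the FDR thresholds dominate Hochberg's, which in turn dominate $\alpha/m$, the ordering FDR $\ge$ Hochberg $\ge$ Bonferroni holds in every replication, not only on average.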

Slide 23

Slide 23 text

THE ESTIMATES OF THE AVERAGE POWER
- Simulation-based estimates of the average power: the proportion of the false null hypotheses which are correctly rejected.
- Comparing the three methods:
  1. FDR controlling procedure: solid line
  2. Hochberg's procedure: dashed line
  3. The Bonferroni-type procedure: dotted line

Slide 24

Slide 24 text

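[Figure: estimated average power against the number of hypotheses m, in separate panels for the D, E and I configurations.]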

Slide 25

Slide 25 text

RESULT
- The power of all the methods decreases as the number of hypotheses tested increases: this is the cost of multiplicity control.
- The power is smallest for the D configuration, where the non-null hypotheses are closest to the null, and is largest for I.
- The power of the FDR controlling method is uniformly larger than that of the other methods.
- The advantage increases with m. Therefore, the loss of power as m increases is relatively small for the FDR controlling method in the E and I configurations.
- The advantage in some situations is extremely large.
- Hochberg's method offers a more powerful alternative to the Bonferroni method.

Slide 26

Slide 26 text

CONCLUSION
- The new approach calls for the control of the FDR instead of the FWER, and thereby also controls the FWER in the weak sense (that is, when all null hypotheses are true, since the FDR then equals the FWER). In many applications this is the desirable control against errors originating from multiplicity.
- This paper focused on presenting and motivating the control of the FDR, and showed that it can be achieved by a simple and powerful procedure. Thus the cost paid for the control of multiplicity need not be large.

Slide 27

Slide 27 text

WANG BING Thanks so much for your attention!