Holm–Bonferroni method

Medical statistics project contact: QQ 231469242

https://en.wikipedia.org/wiki/Holm%E2%80%93Bonferroni_method

In statistics, the Holm–Bonferroni method[1] (also called the Holm method or Bonferroni-Holm method) is used to counteract the problem of multiple comparisons. It is intended to control the familywise error rate and offers a simple test uniformly more powerful than the Bonferroni correction. It is one of the earliest usages of stepwise algorithms in simultaneous inference. It is named after Sture Holm, who codified the method, and Carlo Emilio Bonferroni.

Contents

  • 1 Motivation
  • 2 Formulation
    • 2.1 Proof
    • 2.2 Alternative proof
  • 3 Example
  • 4 Extensions
    • 4.1 Holm–Šidák method
    • 4.2 Weighted version
    • 4.3 Adjusted p-values
  • 5 Alternatives and usage
  • 6 Naming
  • 7 References

Motivation

When considering several hypotheses, the problem of multiplicity arises: the more hypotheses we check, the higher the probability of a Type I error (false positive). The Holm–Bonferroni method is one of many approaches that control the family-wise error rate (the probability that one or more Type I errors will occur) by adjusting the rejection criteria of each of the individual hypotheses or comparisons.

Formulation

The method is as follows:

  • Let $H_1, \ldots, H_m$ be a family of null hypotheses and $P_1, \ldots, P_m$ the corresponding p-values.
  • Start by ordering the p-values from lowest to highest, $P_{(1)} \le P_{(2)} \le \cdots \le P_{(m)}$, and let the associated hypotheses be $H_{(1)}, \ldots, H_{(m)}$.
  • For a given significance level $\alpha$, let $k$ be the minimal index such that $P_{(k)} > \frac{\alpha}{m+1-k}$.
  • Reject the null hypotheses $H_{(1)}, \ldots, H_{(k-1)}$ and do not reject $H_{(k)}, \ldots, H_{(m)}$.
  • If $k = 1$, no null hypotheses are rejected; if no such $k$ exists, all null hypotheses are rejected.

The Holm–Bonferroni method ensures that this method will control the familywise error rate: the probability of rejecting at least one true null hypothesis is at most $\alpha$.
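As a concrete illustration, here is a minimal R sketch of the step-down rule above (the function name holm_reject is ours; base R's p.adjust(p, method = "holm") implements the equivalent adjustment):

# Minimal sketch of the Holm-Bonferroni step-down procedure.
# p: vector of unadjusted p-values; alpha: target familywise error rate.
holm_reject <- function(p, alpha = 0.05) {
  m <- length(p)
  ord <- order(p)                             # sort the p-values ascending
  thresholds <- alpha / (m + 1 - seq_len(m))  # alpha/m, alpha/(m-1), ..., alpha
  exceeds <- p[ord] > thresholds
  k <- if (any(exceeds)) which(exceeds)[1] else m + 1  # first failing step
  reject <- rep(FALSE, m)
  if (k > 1) reject[ord[seq_len(k - 1)]] <- TRUE       # reject H_(1)..H_(k-1)
  reject
}

holm_reject(c(0.01, 0.04, 0.03, 0.005))   # TRUE FALSE FALSE TRUE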

Proof

Holm–Bonferroni controls the FWER as follows. Let $H_{(1)}, \ldots, H_{(m)}$ be the family of hypotheses sorted by their p-values $P_{(1)} \le P_{(2)} \le \cdots \le P_{(m)}$, and let $I_0$ be the set of indices corresponding to the (unknown) true null hypotheses, having $m_0$ members.

Let us assume that we wrongly reject a true hypothesis. We have to prove that the probability of this event is at most $\alpha$. Let $h$ be the first true hypothesis rejected by the step-down procedure. The $h-1$ hypotheses rejected before it are all false, so $h - 1 \le m - m_0$ and hence $\frac{1}{m+1-h} \le \frac{1}{m_0}$. Since $H_{(h)}$ is rejected, $P_{(h)} \le \frac{\alpha}{m+1-h} \le \frac{\alpha}{m_0}$. So if we wrongly reject a true hypothesis, there must be a true hypothesis with a p-value of at most $\frac{\alpha}{m_0}$.

So let us define the event $A = \left\{ P_i \le \frac{\alpha}{m_0} \text{ for some } i \in I_0 \right\}$. Whenever a true hypothesis is wrongly rejected, $A$ occurs, and by the Bonferroni (Boole) inequality $\Pr(A) \le \sum_{i \in I_0} \Pr\!\left(P_i \le \frac{\alpha}{m_0}\right) \le m_0 \cdot \frac{\alpha}{m_0} = \alpha$. Hence the probability of rejecting any true hypothesis is at most $\alpha$.

Alternative proof

The Holm–Bonferroni method can be viewed as a closed testing procedure,[2] with the Bonferroni method applied locally to each of the intersections of null hypotheses. As such, it controls the familywise error rate for all the k hypotheses at level α in the strong sense. Each intersection is tested using the simple Bonferroni test.

It is a shortcut procedure, since in practice the number of comparisons to be made is at most $m$, while the number of all intersections of null hypotheses to be tested by the full closed procedure is of order $2^m$.

The closure principle states that a hypothesis $H_i$ in a family of hypotheses $H_1, \ldots, H_m$ is rejected, while controlling the FWER at level $\alpha$, if and only if all the possible intersection hypotheses involving $H_i$ are rejected at level $\alpha$.

In the Holm–Bonferroni procedure, we first test $H_{(1)}$. If it is not rejected, then the intersection of all null hypotheses $\bigcap_{i=1}^m H_i$ is not rejected either; consequently, for every elementary hypothesis there exists at least one intersection hypothesis that is not rejected, and we reject none of them.

If $H_{(1)}$ is rejected at level $\alpha/m$, then all the intersection sub-families that contain it are rejected too, and thus $H_{(1)}$ is rejected. This is because $P_{(1)}$ is the smallest p-value in each of those sub-families, and each sub-family has at most $m$ members, so its local Bonferroni threshold is at least $\alpha/m$.

The same rationale applies to $H_{(2)}$: since $H_{(1)}$ is already rejected, it suffices to reject all the intersection sub-families of $H_{(2)}$ that do not contain $H_{(1)}$; each such sub-family has at most $m-1$ members, so once $P_{(2)} \le \alpha/(m-1)$, all the intersection hypotheses containing $H_{(2)}$ are rejected.

The same applies for each $H_{(i)}$, $1 \le i \le m$: at step $i$ it suffices to test $H_{(i)}$ against the threshold $\alpha/(m+1-i)$. (See a brute-force illustration of the closure principle in the sketch below.)
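To make the closure argument concrete, here is a brute-force R sketch (names are ours) that tests every intersection hypothesis with a local Bonferroni test; for small m its rejections coincide with those of the Holm shortcut:

# Closed testing with Bonferroni applied locally to each intersection.
# Practical only for small m, since all 2^m - 1 subsets are enumerated.
closed_bonferroni <- function(p, alpha = 0.05) {
  m <- length(p)
  reject <- rep(TRUE, m)
  for (size in seq_len(m)) {
    for (S in combn(m, size, simplify = FALSE)) {
      # the intersection hypothesis over S is rejected iff min p <= alpha/|S|
      if (min(p[S]) > alpha / size) reject[S] <- FALSE
    }
  }
  reject
}

closed_bonferroni(c(0.01, 0.04, 0.03, 0.005))   # TRUE FALSE FALSE TRUE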

Example

Consider four null hypotheses $H_1, \ldots, H_4$ with unadjusted p-values $p_1 = 0.01$, $p_2 = 0.04$, $p_3 = 0.03$ and $p_4 = 0.005$, to be tested at significance level $\alpha = 0.05$. The smallest p-value is $p_4 = 0.005$; since it is smaller than $\alpha/4 = 0.0125$, $H_4$ is rejected. The next smallest is $p_1 = 0.01$, which is smaller than $\alpha/3 \approx 0.0167$, so $H_1$ is also rejected. The next smallest is $p_3 = 0.03$, which is not smaller than $\alpha/2 = 0.025$; the procedure therefore stops, and $H_3$ and $H_2$ are not rejected. Thus $H_1$ and $H_4$ are rejected while $H_2$ and $H_3$ are not.
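The same decisions can be verified in R with the built-in p.adjust:

p <- c(0.01, 0.04, 0.03, 0.005)
p.adjust(p, method = "holm")          # 0.03 0.06 0.06 0.02
p.adjust(p, method = "holm") < 0.05   # TRUE FALSE FALSE TRUE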

Extensions

Holm–Šidák method

Further information: Šidák correction

When the hypothesis tests are not negatively dependent, it is possible to replace the thresholds $\frac{\alpha}{m}, \frac{\alpha}{m-1}, \ldots, \frac{\alpha}{1}$ with $1-(1-\alpha)^{1/m}, 1-(1-\alpha)^{1/(m-1)}, \ldots, 1-(1-\alpha)^{1/1}$,

resulting in a slightly more powerful test.
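The Šidák-based step-down thresholds are slightly larger than the Bonferroni-based ones at every step, which is where the extra power comes from; a quick R comparison for m = 4 and α = 0.05:

m <- 4; alpha <- 0.05; i <- seq_len(m)
alpha / (m + 1 - i)                 # Holm:       0.0125 0.0167 0.0250 0.0500
1 - (1 - alpha)^(1 / (m + 1 - i))   # Holm-Sidak: 0.0127 0.0170 0.0253 0.0500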

Weighted version

Let $P_{(1)}, \ldots, P_{(m)}$ be the ordered unadjusted p-values, and let $H_{(i)}$, with weight $w_{(i)} \ge 0$, correspond to $P_{(i)}$. Reject $H_{(i)}$ as long as $P_{(j)} \le \frac{w_{(j)}}{\sum_{k=j}^{m} w_{(k)}} \alpha$ for $j = 1, \ldots, i$. (A sketch of this rule follows below.)
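A minimal R sketch of the weighted rule as stated above (the function name is ours), assuming non-negative weights w:

# Weighted Holm: step-down with weight-proportional local thresholds.
weighted_holm_reject <- function(p, w, alpha = 0.05) {
  ord <- order(p)
  pw <- p[ord]; ww <- w[ord]
  # threshold at step j: alpha * w_(j) / sum of w_(k) for k >= j
  thresholds <- alpha * ww / rev(cumsum(rev(ww)))
  exceeds <- pw > thresholds
  k <- if (any(exceeds)) which(exceeds)[1] else length(p) + 1
  reject <- rep(FALSE, length(p))
  if (k > 1) reject[ord[seq_len(k - 1)]] <- TRUE
  reject
}

# With equal weights this reduces to the ordinary Holm procedure:
weighted_holm_reject(c(0.01, 0.04, 0.03, 0.005), rep(1, 4))   # TRUE FALSE FALSE TRUE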

Adjusted p-values

The adjusted p-values for the Holm–Bonferroni method are $\widetilde{p}_{(i)} = \max_{j \le i} \left\{ (m-j+1)\, p_{(j)} \right\}_1$, where $\{x\}_1 \equiv \min(x, 1)$.

In the earlier example, the adjusted p-values are $\widetilde{p}_1 = 0.03$, $\widetilde{p}_2 = 0.06$, $\widetilde{p}_3 = 0.06$ and $\widetilde{p}_4 = 0.02$; only hypotheses $H_1$ and $H_4$ are rejected at level $\alpha = 0.05$.

The weighted adjusted p-values are:[citation needed] $\widetilde{p}_{(i)} = \max_{j \le i} \left\{ \frac{\sum_{k=j}^{m} w_{(k)}}{w_{(j)}}\, p_{(j)} \right\}_1$, where $\{x\}_1 \equiv \min(x, 1)$.

A hypothesis is rejected at level α if and only if its adjusted p-value is less than α. In the earlier example using equal weights, the adjusted p-values are 0.03, 0.06, 0.06, and 0.02. This is another way to see that using α = 0.05, only hypotheses one and four are rejected by this procedure.
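The unweighted adjustment can be reproduced directly from the formula above, which is also what p.adjust(p, method = "holm") computes:

p <- c(0.01, 0.04, 0.03, 0.005)
m <- length(p); ord <- order(p)
adj <- pmin(1, cummax((m - seq_len(m) + 1) * p[ord]))  # max over j <= i, capped at 1
adj[order(ord)]                                        # original order: 0.03 0.06 0.06 0.02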

Alternatives and usage

Main article: Familywise error rate § Controlling procedures

The Holm–Bonferroni method is uniformly more powerful than the classic Bonferroni correction. There are other methods for controlling the familywise error rate that are more powerful than Holm–Bonferroni.

In the Hochberg procedure, rejection of $H_{(1)}, \ldots, H_{(k)}$ is made after finding the maximal index $k$ such that $P_{(k)} \le \frac{\alpha}{m+1-k}$. Thus, the Hochberg procedure is uniformly more powerful than the Holm procedure. However, the Hochberg procedure requires the hypotheses to be independent (or to satisfy certain forms of positive dependence), whereas Holm–Bonferroni can be applied without such assumptions.

A similar step-up procedure is the Hommel procedure.[3]
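Both step-up procedures are available in base R via p.adjust; for the p-values of the earlier example:

p <- c(0.01, 0.04, 0.03, 0.005)
p.adjust(p, method = "hochberg")   # 0.03 0.04 0.04 0.02
p.adjust(p, method = "hommel")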

Naming

Carlo Emilio Bonferroni did not take part in inventing the method described here. Holm originally called the method the "sequentially rejective Bonferroni test", and it became known as Holm–Bonferroni only after some time. Holm's motives for naming his method after Bonferroni are explained in the original paper: "The use of the Boole inequality within multiple inference theory is usually called the Bonferroni technique, and for this reason we will call our test the sequentially rejective Bonferroni test."

Bonferroni correction: if n independent hypotheses are tested simultaneously on the same data set, then the statistical significance level used for each individual hypothesis should be 1/n of the level that would be used if only one hypothesis were tested.

Introduction

For example, to test two independent hypotheses on the same data set at the common significance level of 0.05, each of the two hypotheses should be tested at the stricter level of 0.025, i.e. 0.05 × (1/2). The method was developed by Carlo Emilio Bonferroni, hence the name Bonferroni correction. The rationale is the fact that when multiple hypotheses are tested on the same data set, about 1 in every 20 hypotheses can reach the 0.05 significance level purely by chance.

Original Wikipedia text

Bonferroni correction: the Bonferroni correction states that if an experimenter is testing n independent hypotheses on a set of data, then the statistical significance level that should be used for each hypothesis separately is 1/n times what it would be if only one hypothesis were tested. For example, to test two independent hypotheses on the same data at the 0.05 significance level, instead of using a p-value threshold of 0.05, one would use the stricter threshold of 0.025. The Bonferroni correction is a safeguard against multiple tests of statistical significance on the same data, where 1 out of every 20 hypothesis tests will appear to be significant at the α = 0.05 level purely due to chance. It was developed by Carlo Emilio Bonferroni. A less restrictive criterion is the rough false discovery rate, giving (3/4) × 0.05 = 0.0375 for n = 2 and (21/40) × 0.05 = 0.02625 for n = 20.

Multiple testing problems are common in data analysis. The False Discovery Rate (FDR) of a set of predictions is the expected percentage of false predictions in the set. For example, if an algorithm returns 100 genes with a false discovery rate of 0.3, then we should expect 70 of them to be correct. The FDR is very different from a p-value, and as such a much higher FDR can be tolerated than would be acceptable for a p-value. In the example above, a set of 100 predictions of which 70 are correct might be very useful, especially if there are thousands of genes on the array, most of which are not differentially expressed; in contrast, a p-value of 0.3 is generally unacceptable in any circumstance, while an FDR as high as 0.5 or even higher can still be quite meaningful.

The FDR control method was proposed by Benjamini in 1995: it chooses the p-value threshold by controlling the FDR (False Discovery Rate). Suppose you select R differentially expressed genes, of which S are truly differentially expressed and the other V are not (false positives). In practice, we want the error proportion Q = V/R not to exceed, on average, some preset value (such as 0.05); statistically, this is equivalent to controlling the FDR at no more than 5%. According to the theorem Benjamini proved in his paper, the procedure for controlling the FDR is very simple. Suppose there are m candidate genes whose p-values, sorted from smallest to largest, are p(1), p(2), ..., p(m). To control the FDR at level q, find the largest positive integer i such that p(i) <= (i*q)/m, and then select the genes corresponding to p(1), p(2), ..., p(i) as differentially expressed; this statistically guarantees that the FDR does not exceed q. The adjusted values are therefore computed as: p-value(i) = p(i) * length(p) / rank(p).
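A small R sketch of the selection rule just described (the function name bh_select is ours):

# Benjamini-Hochberg selection: keep the genes whose p-values are at or
# below p(i*), where i* is the largest i with p(i) <= i*q/m.
bh_select <- function(p, q = 0.05) {
  m <- length(p)
  ps <- sort(p)
  passing <- which(ps <= seq_len(m) * q / m)
  if (length(passing) == 0) return(rep(FALSE, m))
  p <= ps[max(passing)]
}

bh_select(c(0.0003, 0.0001, 0.02))   # TRUE TRUE TRUE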

References

1. Audic, S. and J. M. Claverie (1997). "The significance of digital gene expression profiles." Genome Res 7(10): 986-95.
2. Benjamini, Y. and D. Yekutieli (2001). "The control of the false discovery rate in multiple testing under dependency." The Annals of Statistics 29: 1165-1188.

For the calculation, see the p.adjust function in the R statistical software:

> p <- c(0.0003, 0.0001, 0.02)
> p
[1] 3e-04 1e-04 2e-02
> p.adjust(p, method="fdr", length(p))
[1] 0.00045 0.00030 0.02000
> p * length(p) / rank(p)
[1] 0.00045 0.00030 0.02000
> length(p)
[1] 3
> rank(p)
[1] 2 1 3
> sort(p)
[1] 1e-04 3e-04 2e-02