Department of Mathematics, University of Bergen, P. O. Box 7800, N-5020 Bergen, Norway

Department of Global Public Health and Primary Care, University of Bergen, P. O. Box 7804, N-5018 Bergen, Norway

Abstract

Background

The rank correlation test introduced by Begg and Mazumdar is extensively used in meta-analysis to test for publication bias in clinical and epidemiological studies. It is based on correlating the standardized treatment effect with the variance of the treatment effect using Kendall’s tau as the measure of association. To our knowledge, the operational characteristics regarding the significance level of the test have not, however, been fully assessed.

Methods

We propose an alternative rank correlation test to improve the error rates of the original Begg and Mazumdar test. This test is based on the simulated distribution of the estimated measure of association, conditional on sampling variances. Furthermore, Spearman’s rho is suggested as an alternative rank correlation coefficient. The attained level and power of the tests are studied by simulations of meta-analyses assuming the fixed effects model.

Results

The significance levels of the original Begg and Mazumdar test often deviate considerably from the nominal level, the null hypothesis being rejected too infrequently. It is proven mathematically that the assumptions for using the rank correlation test are not strictly satisfied. The pairs of variables fail to be independent, and there is a correlation between the standardized effect sizes and sampling variances under the null hypothesis of no publication bias. In the meta-analysis setting, the adverse consequences of a false negative test are more profound than the disadvantages of a false positive test. Our alternative test improves the error rates in fixed effects meta-analysis. Its significance level equals the nominal value, and the Type II error rate is reduced. In small data sets Spearman’s rho should be preferred to Kendall’s tau as the measure of association.

Conclusions

As the attained significance levels of the test introduced by Begg and Mazumdar often deviate greatly from the nominal level, modified rank correlation tests, improving the error rates, should be preferred when testing for publication bias assuming fixed effects meta-analysis.

Background

Meta-analysis is a systematic procedure for assessing and combining statistical information based on results of available independent studies regarding the same topic. In recent years, meta-analytic methods have become increasingly popular in various fields of medicine. Results from meta-analysis are subject to criticism for many reasons, an important concern being possible small study effects such as publication bias. Publication bias arises when the published studies relevant for inclusion in a meta-analysis do not represent all studies of the problem of interest.

In particular, studies that are less likely to get published appear to be the less conclusive ones.

Traditionally, funnel plots

Several authors have provided formal and objective tests for publication bias. Egger et al.

Asymmetry in funnel plots may also, however, occur due to heterogeneity. Statistical heterogeneity is present when the true effects being evaluated vary between studies, and this underlying heterogeneity may be detectable if the variation between the studies is above that expected by chance.

The regression based approach proposed by Egger et al.

**Number of yearly cites for the Begg and Mazumdar article.**

Concerns have been expressed, however, about the possible lack of power of both tests

The significance level of the Begg and Mazumdar test is attained when there is no selection bias present in the meta-analysis. Begg and Mazumdar carried out simulations corresponding to such situations. They did not, however, include the results in their paper but merely stated that “In all cases the nominal significance level was less than 5%”

Description of the Begg and Mazumdar test

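Suppose that a meta-analysis consists of $k$ component studies, and let $t_1, t_2, \dots, t_k$ and $v_1, v_2, \dots, v_k$ denote the estimated effect sizes and sampling variances from these studies. As the effect sizes are not identically distributed under the null hypothesis of no publication bias, Begg and Mazumdar standardize the effect sizes $t_i$,

$$t_i^{*} = \frac{t_i - \bar{t}}{\sqrt{v_i^{*}}}.$$

Here,

$$\bar{t} = \frac{\sum_{j=1}^{k} v_j^{-1} t_j}{\sum_{j=1}^{k} v_j^{-1}}$$

is the standard weighted average of the effect sizes and

$$v_i^{*} = v_i - \Bigl(\sum_{j=1}^{k} v_j^{-1}\Bigr)^{-1}$$

is the variance of $t_i - \bar{t}$.

Begg and Mazumdar correlate the standardized effects $t_i^{*}$ with the sampling variances $v_i$ using Kendall’s tau.

The test involves evaluating $P$, the number of pairs of studies ranked in the same order with respect to $\mathbf{t}^{*}$ and $\mathbf{v}$, and $Q$, the number of pairs ranked in the opposite order. The test statistic is

$$z = \frac{P - Q}{\sqrt{k(k-1)(2k+5)/18}},$$

a variable which is asymptotically standard normal under the null hypothesis.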
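As an illustration, the rank correlation statistic described in this section can be computed in a few lines. The following is a minimal sketch in Python, not the authors' code. The function names and the example data are hypothetical, and tied pairs are counted as neither concordant nor discordant, which is an assumption made here for simplicity.

```python
import math


def standardized_effects(t, v):
    """Return (t_i - t_bar) / sqrt(v_i*) for each study.

    t: estimated effect sizes, v: sampling variances (same length, k >= 2).
    """
    sum_w = sum(1.0 / vi for vi in v)
    # Inverse-variance weighted average of the effect sizes.
    t_bar = sum(ti / vi for ti, vi in zip(t, v)) / sum_w
    # v_i* = v_i - 1 / sum_j(1 / v_j) is the variance of t_i - t_bar.
    return [(ti - t_bar) / math.sqrt(vi - 1.0 / sum_w) for ti, vi in zip(t, v)]


def concordance_difference(x, y):
    """Return P - Q: concordant minus discordant pairs (ties count as neither)."""
    s = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s


def begg_mazumdar_z(t, v):
    """Normalized Kendall statistic, compared with a standard normal."""
    k = len(t)
    s = concordance_difference(standardized_effects(t, v), v)
    return s / math.sqrt(k * (k - 1) * (2 * k + 5) / 18.0)


def two_sided_p(z):
    """Two-sided p-value from the standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical data: five studies in which larger variances go with larger effects.
t_example = [0.0, 0.1, 0.3, 0.5, 0.9]
v_example = [0.1, 0.2, 0.3, 0.5, 1.0]
z_example = begg_mazumdar_z(t_example, v_example)
print(z_example, two_sided_p(z_example))
```

The sketch relies on the asymptotic normal approximation, which is exactly the step whose accuracy is examined in the remainder of the paper.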

This article is organized as follows. In the Methods section we suggest an algorithm intended to improve the error rates of the Begg and Mazumdar test for publication bias. Additionally, this section outlines the simulation procedure used to study and compare the new algorithm to the original test in fixed effects meta-analysis. In the Results section we explain why the Begg and Mazumdar method has a poor significance level. The performance of the adjusted test is assessed and compared to the results of the original test. Examples are given. The Discussion section includes an overall evaluation of the rank correlation tests for publication bias presented in our paper and is followed by the Conclusion section.

Methods

Improvement of the Begg and Mazumdar test: Method and algorithm

We seek a rank correlation test whose actual significance level is easy to adjust when the outcome variable is normally distributed. This can be done by employing the simulated distribution of the estimated measure of association, conditional on the sampling variances. The following algorithm summarizes the procedure:

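Given the effect sizes $t_1, t_2, \dots, t_k$ and their variances, $v_1, v_2, \dots, v_k$:

1. For each replication, indexed $b = 1, 2, \dots, B$:

(a) Generate effect sizes $t_1^{(b)}, t_2^{(b)}, \dots, t_k^{(b)}$ from normal distributions with a common mean and the variances $v_1, v_2, \dots, v_k$, and standardize them.

(b) Correlate the standardized effects $t_1^{*(b)}, t_2^{*(b)}, \dots, t_k^{*(b)}$ with the variances by computing Kendall’s tau, $\hat{\tau}^{(b)}$.

2. Determine the intervals of rejection, e.g., by finding the percentiles based on the empirical distribution of $\hat{\tau}^{(1)}, \hat{\tau}^{(2)}, \dots, \hat{\tau}^{(B)}$.

3. Correlate the observed standardized effects, $t_1^{*}, t_2^{*}, \dots, t_k^{*}$, with the variances and compute Kendall’s tau, $\hat{\tau}$.

4. Reject the null hypothesis of no publication bias if $\hat{\tau}$ falls within the intervals of rejection.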

We denote this the adjusted Begg and Mazumdar test. The R code for the adjusted procedure is provided in Additional file
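The steps of the adjusted procedure can be sketched as follows. This is an illustrative Python outline and not the R code from the Additional file. It assumes that the simulated effect sizes are drawn from normal distributions with mean zero and the given variances (the common mean cancels when the effects are standardized), that the rejection limits are the empirical 2.5th and 97.5th percentiles, and that ties are counted as neither concordant nor discordant. All function names and defaults are hypothetical.

```python
import math
import random


def standardized_effects(t, v):
    """Standardize effect sizes by the weighted mean and adjusted variances."""
    sum_w = sum(1.0 / vi for vi in v)
    t_bar = sum(ti / vi for ti, vi in zip(t, v)) / sum_w
    return [(ti - t_bar) / math.sqrt(vi - 1.0 / sum_w) for ti, vi in zip(t, v)]


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    k = len(x)
    s = 0
    for i in range(k):
        for j in range(i + 1, k):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (k * (k - 1) / 2.0)


def simulate_null_taus(v, n_rep=2000, seed=1):
    """Step 1: simulate tau under no publication bias, given the variances."""
    rng = random.Random(seed)
    taus = []
    for _ in range(n_rep):
        # Assumption: mean 0; any common mean gives the same standardized effects.
        t_sim = [rng.gauss(0.0, math.sqrt(vi)) for vi in v]
        taus.append(kendall_tau(standardized_effects(t_sim, v), v))
    return sorted(taus)


def rejection_limits(sorted_taus, alpha=0.05):
    """Step 2: empirical alpha/2 and 1 - alpha/2 percentiles of simulated taus."""
    n = len(sorted_taus)
    lower = sorted_taus[int(math.floor(alpha / 2.0 * n))]
    upper = sorted_taus[int(math.ceil((1.0 - alpha / 2.0) * n)) - 1]
    return lower, upper


def adjusted_test(t, v, alpha=0.05, n_rep=2000, seed=1):
    """Steps 3 and 4: compare the observed tau with the simulated limits."""
    lower, upper = rejection_limits(simulate_null_taus(v, n_rep, seed), alpha)
    tau_obs = kendall_tau(standardized_effects(t, v), v)
    return tau_obs, (lower, upper), (tau_obs < lower or tau_obs > upper)
```

Because only the variances enter `simulate_null_taus`, the rejection limits can be computed once for a given set of variances and reused. Since Kendall's tau is discrete, a percentile rule of this kind can be conservative; the percentile convention used here is only one possible choice.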

**R code for the adjusted Begg and Mazumdar test based on Kendall’s tau.** The R code provides the mid-

A drawback with this procedure is that we condition on the sampling variances, and we may possibly develop better methods if this is not done. In addition, we assume that the estimated effect sizes are normally distributed. This may not always be the case in studies with a small sample size, and in particular, effect sizes are not normally distributed if the outcome is binary. Kendall’s tau is scale invariant. Hence the adjusted Begg and Mazumdar method will still work well if the variances are systematically underestimated. It should be noted that the simulation procedure itself in steps 1 and 2 does not depend on the observed values $t_1, t_2, \dots, t_k$ of the random variables involved, but only on the fixed variances. Thus the procedure is not a bootstrap in the ordinary sense.

Spearman’s rho versus Kendall’s tau as the measure of association

The Begg and Mazumdar test uses Kendall’s tau as the measure of association. It requires evaluation of the test statistic

In order to improve the level, one may alternatively apply the mid-

where $d_i = r_i - s_i$ is the difference between the ranks of observation
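For readers who want to see the rank-difference form of Spearman’s rho in action, the following short Python sketch computes it from the differences between ranks. It uses the standard textbook formula, which is valid only when there are no ties; when several studies share the same variance, average ranks and a tie-corrected formula would be needed. The function names are hypothetical.

```python
def ranks(values):
    """Return ranks 1..k of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho from rank differences: 1 - 6 * sum(d_i^2) / (k (k^2 - 1))."""
    k = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d_squared / (k * (k * k - 1))


# Hypothetical example: two neighbouring pairs swapped out of five observations.
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

In the adjusted procedure, a function like this would simply take the place of Kendall’s tau when correlating the standardized effects with the variances.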

Simulation procedure

Simulations are needed in order to study the significance level of the Begg and Mazumdar test, to examine the operational characteristics of our new algorithm, and to compare it with the original test. We apply the simulation procedure introduced by Begg and Mazumdar, and a detailed description is given in the following subsections.

Study selection

Begg and Mazumdar assume that the distribution of the effect sizes $\mathbf{t}$ is normal, i.e.

When a particular value of $t_i$ has been generated, the study is published (included in the meta-analysis) with probability given by an appropriate weight function. Begg and Mazumdar use different weight functions. The weight function depending on the

with suitably defined constants

The main objective in conducting a meta-analysis is to estimate the true underlying effect,

Scenarios

Following the scenarios of Begg and Mazumdar, we considered two values for

**Selection mechanism.** The weight function for selecting studies for inclusion in the meta-analysis as a function of the

We generated each simulated meta-analysis in the following way. An effect size was randomly generated from a normal distribution having one of the variances under study. Its mean was the true, underlying effect. The probability of selection for inclusion in the meta-analysis was calculated by the relevant selection model. We chose the model based on the
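The generation of a single simulated meta-analysis can be sketched as follows. This Python outline is illustrative only and rests on several assumptions that are not confirmed by the text: the weight function is taken to be of the form exp(-b * p^a) in the one-sided p-value p, the constants `a` and `b` are placeholders, and candidate studies are redrawn for each target variance until one is selected. All names are hypothetical.

```python
import math
import random


def one_sided_p(t, v):
    """One-sided p-value for effect t with variance v, testing against zero."""
    return 0.5 * math.erfc(t / math.sqrt(2.0 * v))


def weight(p, a, b):
    """ASSUMED selection model: publication probability exp(-b * p**a)."""
    return math.exp(-b * p ** a)


def simulate_meta_analysis(variances, delta, a, b, rng):
    """Draw one published study per variance; return effects and draws needed.

    delta is the true underlying effect; a and b control selection strength.
    """
    effects = []
    n_generated = 0
    for v in variances:
        while True:
            t = rng.gauss(delta, math.sqrt(v))
            n_generated += 1
            # Publish the study with probability given by the weight function.
            if rng.random() < weight(one_sided_p(t, v), a, b):
                effects.append(t)
                break
    return effects, n_generated


rng = random.Random(2014)
effects, n_generated = simulate_meta_analysis([0.1, 1.0, 10.0] * 3, 0.0, 3.0, 4.0, rng)
# Share of generated studies that were selected for inclusion.
print(len(effects) / n_generated)
```

Setting `b = 0` switches selection off, so every generated study is included; this corresponds to the situation without publication bias used to assess the significance level.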

Results

Attained significance level and power of the original Begg and Mazumdar test based on Kendall’s tau

We first performed simulations in the situation without publication bias, in order to verify the results reported for the Begg and Mazumdar method. The significance level of their test for publication bias, as estimated in our simulations, is shown in Tables

**Level [% selected for inclusion, bias]**, by range of variances

| Treatment effect | Large† | Small‡ |
|---|---|---|
| .0 | **1.72%** [100%, .00] | **3.96%** [100%, .00] |
| .5 | **1.82%** [100%, .00] | **4.36%** [100%, .00] |
| 1.0 | **1.86%** [100%, .00] | **4.30%** [100%, .00] |
| 1.5 | **1.90%** [100%, .00] | **3.68%** [100%, .00] |
| 2.0 | **1.82%** [100%, .00] | **3.58%** [100%, -.00] |
| 2.5 | **1.54%** [100%, .00] | 4.48% [100%, .00] |
| 3.0 | **1.74%** [100%, .00] | **4.24%** [100%, -.00] |

Values deviating significantly from the nominal level 5.00% over the 5000 simulations (using a 5% level in the binomial test) are typed in boldface.

**Level [% selected for inclusion, bias]**, by range of variances

| Treatment effect | Large† | Small‡ |
|---|---|---|
| .0 | **1.76%** [100%, -.00] | **4.12%** [100%, .00] |
| .5 | **1.70%** [100%, .00] | 4.74% [100%, -.00] |
| 1.0 | **2.38%** [100%, .00] | 4.54% [100%, .00] |
| 1.5 | **1.96%** [100%, .00] | **4.30%** [100%, -.00] |
| 2.0 | **1.60%** [100%, .00] | **4.24%** [100%, -.00] |
| 2.5 | **1.88%** [100%, .00] | **4.10%** [100%, .00] |
| 3.0 | **1.64%** [100%, .00] | **4.22%** [100%, .00] |

Values deviating significantly from the nominal level 5.00% over the 5000 simulations (using a 5% level in the binomial test) are typed in boldface.
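The boldface rule in the tables of attained levels can be illustrated with a short calculation. The Python sketch below uses a normal approximation to the binomial test, which is an assumption made here for simplicity (the exact binomial test may give slightly different p-values near the boundary). The function name is hypothetical.

```python
import math


def deviates_from_nominal(observed_rate, n_sim=5000, nominal=0.05, test_level=0.05):
    """Check whether an estimated level differs from the nominal level.

    Returns (z, two-sided p-value, significant?) using a normal approximation
    to the binomial distribution of the number of rejections.
    """
    rejections = round(observed_rate * n_sim)
    se = math.sqrt(n_sim * nominal * (1.0 - nominal))
    z = (rejections - n_sim * nominal) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value, p_value < test_level


# Two estimated levels taken from the tables: one in boldface, one not.
for rate in (0.0436, 0.0448):
    print(rate, deviates_from_nominal(rate))
```

With 5000 simulations the standard error of an estimated level near 5% is about 0.3 percentage points, so an estimate of 4.36% is flagged as deviating from the nominal level whereas 4.48% is not.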

Our results confirm the impression that the attained significance level does not exceed the nominal one, but overall the deviations between the two values can be considerable. The range of variances is obviously an important factor influencing the significance level of the test. When the spread of variances is large (

All simulations were undertaken using R

In the simulation procedure presented by Begg and Mazumdar, the variances are treated as fixed constants. Additional simulations were carried out employing random variances drawn from a suitable distribution. As an example, for

Power estimates were found employing the simulation procedure described in Methods. The simulations were restricted to one-sided selection depending on the

The power of the Begg and Mazumdar test is shown in Tables

**Power [% selected for inclusion, bias]**, by selection strength and range of variances

| Treatment effect | Strong\*\*, Large† | Strong\*\*, Small‡ | Moderate\*\*\*, Large† | Moderate\*\*\*, Small‡ |
|---|---|---|---|---|
| .0 | 57% [36%, .34] | 22% [37%, .74] | 33% [57%, .25] | 13% [57%, .54] |
| .5 | 51% [54%, .16] | 21% [52%, .54] | 23% [74%, .09] | 11% [73%, .34] |
| 1.0 | 39% [65%, .07] | 16% [67%, .36] | 13% [82%, .04] | 8% [85%, .20] |
| 1.5 | 27% [72%, .05] | 13% [80%, .23] | 9% [87%, .02] | 6% [92%, .10] |
| 2.0 | 19% [78%, .03] | 8% [88%, .14] | 5% [90%, .02] | 5% [96%, .05] |
| 2.5 | 12% [82%, .02] | 6% [93%, .07] | 3% [93%, .01] | 4% [98%, .03] |
| 3.0 | 9% [86%, .02] | 5% [96%, .04] | 3% [94%, .01] | 4% [99%, .01] |

**Power [% selected for inclusion, bias]**, by selection strength and range of variances

| Treatment effect | Strong\*\*, Large† | Strong\*\*, Small‡ | Moderate\*\*\*, Large† | Moderate\*\*\*, Small‡ |
|---|---|---|---|---|
| .0 | 99% [36%, .34] | 61% [36%, .74] | 88% [56%, .24] | 38% [56%, .54] |
| .5 | 99% [53%, .16] | 59% [52%, .54] | 77% [74%, .09] | 31% [72%, .34] |
| 1.0 | 94% [64%, .07] | 50% [67%, .36] | 54% [82%, .04] | 21% [84%, .19] |
| 1.5 | 85% [71%, .04] | 35% [79%, .23] | 35% [86%, .02] | 12% [92%, .10] |
| 2.0 | 71% [77%, .03] | 22% [88%, .13] | 21% [90%, .02] | 7% [96%, .05] |
| 2.5 | 53% [81%, .02] | 12% [93%, .07] | 13% [92%, .01] | 5% [98%, .03] |
| 3.0 | 40% [85%, .02] | 7% [96%, .04] | 8% [94%, .01] | 5% [99%, .01] |

Additional simulations were carried out using a nominal level of 0.10. When the range of variances is large, the attained significance levels are roughly half the nominal level, i.e., about 0.05. The significance level is estimated at 0.0498 when

Begg and Mazumdar justified the two ranges of variances used in their simulations by considering relevant values in published studies. In the context of meta-analysis, however, the variances take more than three different values, and their choices of variances do not represent a realistic distribution. Separate simulations confirmed that the distribution of the variances influences the results concerning the significance level and power, but these results are nevertheless not in conflict with our general assessments and conclusions. As an example, the simulation results when generating meta-analyses consisting of 25 component studies, of which 13 studies have the largest variance (

**Power for the original Begg and Mazumdar test, employing a somewhat more realistic distribution of the variances for small meta-analyses.** The meta-analyses are generated consisting of 13 component studies having the largest variance (

An explanation of the poor significance level

We compute the conditional covariance between two standardized effects given the variances. Straightforward calculations give

where

We assume that the pairs $(t_1, v_1), (t_2, v_2), \dots, (t_k, v_k)$ are independent and have the same bivariate distribution. As a consequence, $t_i$ is independent of $t_j$ given $v_1, v_2, \dots, v_k$. It follows that the first term inside the square brackets equals zero. For the same reason,

The last term equals

It readily follows that

Because the variables

It has therefore been mathematically proven that the standardized treatment effects given the variances fail to be independent. As a result, the standardized treatment effects are not independent even if the variances are regarded as random variables. Furthermore, there is a correlation between the standardized effect sizes and sampling variances under the null hypothesis of no publication bias. It follows that the vectors **
t
**

Our calculations are consistent with a remark given by Begg
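The covariance calculation in this subsection can be sketched as follows. This is a reconstruction under the stated assumptions (effect sizes conditionally independent given the variances, with $\mathrm{Var}(t_i \mid \mathbf{v}) = v_i$), using the standardization $t_i^{*} = (t_i - \bar{t})/\sqrt{v_i^{*}}$ of Begg and Mazumdar. The shorthand $S$ is notation introduced here, and the display is not a transcription of the authors' equations.

```latex
% Shorthand (introduced here): S = \sum_{l=1}^{k} v_l^{-1}, so that
% \bar{t} = S^{-1} \sum_{l} v_l^{-1} t_l  and  v_i^{*} = v_i - S^{-1}.
\begin{align*}
\operatorname{Cov}(t_i^{*}, t_j^{*} \mid \mathbf{v})
  &= \frac{1}{\sqrt{v_i^{*} v_j^{*}}}
     \Bigl[ \operatorname{Cov}(t_i, t_j \mid \mathbf{v})
          - \operatorname{Cov}(t_i, \bar{t} \mid \mathbf{v})
          - \operatorname{Cov}(\bar{t}, t_j \mid \mathbf{v})
          + \operatorname{Var}(\bar{t} \mid \mathbf{v}) \Bigr],
     \qquad i \neq j, \\
\operatorname{Cov}(t_i, t_j \mid \mathbf{v}) &= 0, \\
\operatorname{Cov}(t_i, \bar{t} \mid \mathbf{v})
  &= S^{-1} v_i^{-1} \operatorname{Var}(t_i \mid \mathbf{v}) = S^{-1}, \\
\operatorname{Var}(\bar{t} \mid \mathbf{v})
  &= S^{-2} \sum_{l=1}^{k} v_l^{-2} \, v_l = S^{-1}, \\
\operatorname{Cov}(t_i^{*}, t_j^{*} \mid \mathbf{v})
  &= \frac{0 - S^{-1} - S^{-1} + S^{-1}}{\sqrt{v_i^{*} v_j^{*}}}
   = -\frac{S^{-1}}{\sqrt{(v_i - S^{-1})(v_j - S^{-1})}} < 0 .
\end{align*}
```

Under these assumptions the conditional covariance is strictly negative for every pair $i \neq j$, so the standardized effects cannot be independent given the variances. Its magnitude shrinks as $S$ grows, that is, as the number of studies increases or the variances decrease.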

Attained significance level and power for the adjusted Begg and Mazumdar test

The effect sizes are standardized by Begg and Mazumdar to obtain a set of estimates that can be assumed to be independent and identically distributed under the null hypothesis of no publication bias

In the context of potential publication bias in meta-analysis, the adverse consequences of a false negative test are much more profound than those of a false positive test

We assessed the properties of the adjusted rank correlation test, first using Kendall’s tau as our test statistic. The adjusted interval of rejection for the test statistic was found in any particular situation with fixed variances using only the first two steps in the algorithm, choosing

When

**Power [% selected for inclusion, bias]**, by selection strength and range of variances

| Treatment effect | Strong\*\*, Large† | Strong\*\*, Small‡ | Moderate\*\*\*, Large† | Moderate\*\*\*, Small‡ |
|---|---|---|---|---|
| .0 | 73% [36%, .34] | 24% [37%, .74] | 48% [57%, .25] | 16% [57%, .54] |
| .5 | 69% [54%, .16] | 23% [52%, .54] | 37% [74%, .09] | 14% [73%, .35] |
| 1.0 | 56% [65%, .07] | 20% [67%, .37] | 25% [82%, .04] | 10% [85%, .20] |
| 1.5 | 44% [72%, .05] | 15% [80%, .23] | 17% [87%, .02] | 7% [92%, .10] |
| 2.0 | 32% [78%, .03] | 10% [88%, .13] | 12% [90%, .02] | 6% [96%, .05] |
| 2.5 | 24% [82%, .02] | 7% [93%, .07] | 9% [93%, .01] | 5% [98%, .03] |
| 3.0 | 19% [86%, .02] | 6% [97%, .04] | 7% [94%, .01] | 5% [99%, .01] |

**Power for the adjusted Begg and Mazumdar test based on Kendall’s tau for large meta-analyses (****
k
**

Figure

**Histograms of the simulated distribution of Kendall’s tau under the null hypothesis along with kernel density estimates.** These are compared to the density of the asymptotic distribution of Kendall’s tau. **A**: small meta-analyses (**B**: small meta-analyses (

Additional simulations were performed choosing

**Power for both the original Begg and Mazumdar test and the adjusted procedure when applying even stronger selection strength (****
a
**

As already explained, it may be reasonable to choose a higher significance level than 0.05, and simulations were carried out for the adjusted method at a 0.10 significance level. These additional simulations were also performed when the meta-analyses comprised

**Power [% selected for inclusion, bias]**, by selection strength and range of variances

| Treatment effect | Strong\*\*, Large† | Strong\*\*, Small‡ | Moderate\*\*\*, Large† | Moderate\*\*\*, Small‡ |
|---|---|---|---|---|
| .0 | 83% [36%, .34] | 36% [37%, .74] | 63% [57%, .25] | 23% [57%, .54] |
| .5 | 80% [54%, .16] | 34% [52%, .54] | 53% [74%, .09] | 22% [73%, .34] |
| 1.0 | 70% [65%, .07] | 30% [67%, .36] | 39% [82%, .04] | 17% [85%, .20] |
| 1.5 | 59% [72%, .05] | 23% [80%, .23] | 27% [87%, .03] | 13% [92%, .11] |
| 2.0 | 48% [78%, .03] | 18% [88%, .13] | 21% [90%, .02] | 10% [96%, .05] |
| 2.5 | 37% [82%, .02] | 13% [93%, .08] | 17% [93%, .01] | 10% [98%, .03] |
| 3.0 | 30% [86%, .02] | 10% [96%, .04] | 14% [94%, .01] | 10% [99%, .01] |

Many meta-analyses include far fewer than 25 studies. For that reason, we performed additional simulations in the Begg and Mazumdar setting to assess the actual significance level of the adjusted test as

What if we use Spearman’s rho instead of Kendall’s tau as the measure of association? Additional simulations then demonstrate that

**Power [% selected for inclusion, bias]**, by selection strength and range of variances

| Treatment effect | Strong\*\*, Large† | Strong\*\*, Small‡ | Moderate\*\*\*, Large† | Moderate\*\*\*, Small‡ |
|---|---|---|---|---|
| .0 | 74% [36%, .34] | 24% [37%, .74] | 52% [57%, .25] | 16% [57%, .54] |
| .5 | 69% [54%, .16] | 23% [52%, .54] | 39% [74%, .09] | 14% [73%, .34] |
| 1.0 | 57% [65%, .07] | 20% [67%, .37] | 26% [82%, .04] | 10% [85%, .20] |
| 1.5 | 44% [72%, .05] | 15% [80%, .23] | 17% [87%, .03] | 7% [92%, .10] |
| 2.0 | 34% [78%, .03] | 10% [88%, .13] | 12% [90%, .02] | 5% [96%, .05] |
| 2.5 | 25% [82%, .02] | 7% [93%, .07] | 9% [93%, .01] | 5% [98%, .03] |
| 3.0 | 19% [86%, .02] | 6% [97%, .04] | 8% [94%, .01] | 5% [99%, .01] |

**Power [% selected for inclusion, bias]**, by selection strength and range of variances

| Treatment effect | Strong\*\*, Large† | Strong\*\*, Small‡ | Moderate\*\*\*, Large† | Moderate\*\*\*, Small‡ |
|---|---|---|---|---|
| .0 | 84% [36%, .34] | 36% [36%, .74] | 64% [57%, .25] | 25% [57%, .54] |
| .5 | 80% [54%, .16] | 35% [52%, .54] | 53% [74%, .09] | 22% [73%, .34] |
| 1.0 | 70% [65%, .07] | 31% [67%, .37] | 38% [82%, .04] | 18% [85%, .20] |
| 1.5 | 58% [72%, .05] | 25% [80%, .23] | 27% [87%, .03] | 13% [92%, .10] |
| 2.0 | 47% [78%, .03] | 18% [88%, .13] | 22% [90%, .02] | 11% [96%, .05] |
| 2.5 | 37% [82%, .02] | 13% [93%, .08] | 17% [93%, .01] | 10% [98%, .03] |
| 3.0 | 30% [86%, .02] | 11% [97%, .04] | 14% [94%, .01] | 10% [99%, .01] |

Examples

We compare the new, adjusted test to that of Begg and Mazumdar, applying two examples from the literature

**Example 1** In the first meta-analysis, Cottingham and Hunter

**Example 2** The second example

Discussion

Although the test introduced by Begg and Mazumdar is well known and often cited in published work (Figure

When outcomes are binary, however, there are tests particularly designed for handling the additional difficulties that arise in this situation

Overall, we advocate the use of the adjusted method based on Spearman’s rho; it makes it easier to control the Type I error rate for small values of

Although the improved tests are more powerful than the test by Begg and Mazumdar, their general power is still limited, particularly for moderate amounts of bias and when the total number of studies included in the meta-analysis is typical of standard practice in medical applications. Tests for small-study effects should routinely be performed prior to conducting a meta-analysis. Nevertheless, it is important not to rule out the possibility of small-study effects when the tests do not produce significant results. Even when evidence of small-study effects is found in the meta-analysis, careful consideration should be given to possible explanations, e.g., publication bias and heterogeneity.

There are some limitations to our study that need to be addressed. We have only regarded the fixed effects model, i.e. no heterogeneity is assumed between the component studies in the meta-analysis. This is consistent with the original Begg and Mazumdar formulation

In addition, a more realistic distribution of the component studies in the meta-analyses should be used in new simulations. Furthermore, the selection function considered in our paper is very simplistic and depends only on the

The adjusted test corrects the significance level. It therefore forms a better basis for comparing the different test statistics introduced in the literature, e.g., those presented by Egger et al.

The methods considered in this paper test for publication bias in meta-analysis. Several authors address the issue of how to proceed if a test for publication bias is significant. Duval and Tweedie

Conclusion

We showed in simulations that the significance level of the rank correlation test introduced by Begg and Mazumdar often deviates considerably from the nominal level. Additionally, we proved that the assumptions for using a rank correlation test are not met. A modified rank correlation test which is based on the simulated distribution of the estimated measure of association, preferably Spearman’s rho, conditional on sampling variances, improves the error rates. This should thus be chosen over the conventional Begg and Mazumdar test in the case of normally distributed outcomes when testing for publication bias assuming fixed effects meta-analysis.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

MG carried out the simulations and planned and wrote the manuscript. IH planned the paper and provided guidance and critical input during the entire process. Both authors reviewed and revised the manuscript and have approved it for submission.

Acknowledgements

No external funding was received for this study. The authors are funded by the University of Bergen.
