Shuster | Journal of Clinical and Translational Research 2023; 9(4): 246-252 247
one of the most heavily peer-reviewed biostatistical papers ever, with review material from nine of nine sources (including three world-renowned meta-analysts) agreeing with his conclusions. However, in this article, in a non-technical way, you will be shown clearly that the mainstream methods rely on five incompatible assumptions that underlie their validity. This makes the evidence basis of the mainstream no different than claiming an evidence basis for off-label usage of a drug. You will be shown that if the first assumption is true, the last two cannot be true. In one highly cited example, we show that the mainstream-based claim of efficacy for an invasive treatment has no scientific basis. In another, the mainstream methods failed to detect a highly significant outcome. Had these methods been available and used, the use of a cardiotoxic type II diabetes drug could have been discontinued at least 3 years earlier than what actually occurred. We provide free access to well-documented user-friendly Excel templates to conduct rigorous analyses of the main research questions. This paper places no blame on the well-intentioned researchers who developed these mainstream methods. However, if meta-analysis is to remain at the apex of “Evidence Pyramids,” it is imperative that statistical practice should be changed. This paper is needed for two reasons. First, with the availability of user-friendly software for the mainstream methods, a high proportion of these analyses is done without input from biostatisticians or epidemiologists. Second, changes to statistical practice will not happen overnight. Readers should be concerned when they read papers using meta-analysis in the biomedical literature. In short, this is about proven science, not opinions.

2. Assumptions Underlying the Validity of Current Mainstream Methods

When methodologists derive analytic procedures, they make working distributional assumptions that enable them to complete their work. Every time the procedure is used in practice, these should be fully disclosed. In a specific application, if any of the assumptions are wrong, the evidentiary basis of the results is in jeopardy. Unfortunately, few analytic procedures have adequate diagnostic tests for their assumptions. In meta-analysis, there has been little vetting of the robustness of procedures when their assumptions fail.

“Weighting inversely proportional to the estimated variance estimation” (aka the mainstream method) is by far the most common method used in combining data from a complete set of randomized clinical trials of a research question. These methods were derived under five assumptions (A1-A5) below, which must be true up to strong approximations. These are rarely disclosed in full, and current software does not provide adequate warnings. Assumptions A1 and A3 are reasonable in most applications. Assumption A2 is questionable (no adequately powered diagnostic test exists for it). Unfortunately, even if Assumption A1 is true, it follows that Assumptions A4 and A5 are false. This leaves open the strong likelihood that past meta-analyses may have reached unsupportable conclusions, possibly contributing to inappropriate public health recommendations.

A1: The true primary effect sizes for each study are drawn independently from a single large “urn” of primary effect sizes. This assumption tells us we are targeting the unweighted mean of all studies in the urn.
A2: The true primary effect sizes in the urn follow a normal (bell-shaped) distribution whose unweighted mean is the target parameter of interest.
A3: The individual study provides an unbiased estimate of its study-specific true primary effect size and has an approximate normal distribution about its true primary effect size.
A4: Up to a strong approximation, the weights are “constants” rather than seriously random variables. In other words, if you repeat the total experiment under the same Assumptions A1-A3 and the same urn, this assumption presumes you obtain identical weights up to a strong approximation. This assumption is mandatory to use the formulas for the mean and standard error in the mainstream methods but will be shown to be false under Assumptions A1-A3. More on this below.
A5: There is no association between study weight and study true effect size. For example, if big studies tend to have higher (lower) effect sizes than smaller studies, the method will tend to overestimate (underestimate), respectively, the overall effect size. This could lead to unacceptable bias.

3. How Mainstream Weighted Random Effects Methods Work

The “variance” for the estimates of effect size for each study consists of two components, the reasons why its individual study estimate of effect size differs from the true mean of the effect sizes in the urn: (a) Within-study variance, which is estimated under Assumption A3 and (b) Between-study variance, which is the variance of the true effect sizes in the urn, per Assumptions A1 and A2. The first component (a) depends on the accuracy of the within-study estimate and varies from study to study. The second component (b) is the same for all studies. The overall estimate is the weighted average of the individual study estimates with weights inversely proportional to the study’s estimated variance, which is the sum of the estimated within-study variance and the estimated between-study variance. If all five assumptions were true, these weights would minimize the standard error of the estimate of the overall effect size, over all choices of weights that sum to one (a requirement for unbiasedness). Note that all other things being equal, larger between-study variance pushes the weights closer to equal weights and smaller between-study variance pushes the weights closer to fixed effects.

4. Why Assumption A4 is False

What this assumption requires is that if we repeat the experiment under Assumptions A1-A3, the resulting weights will be the same up to a strong approximation.

Imagine a meta-analysis where we independently generate the data twice under Assumptions A1-A3. Clearly, the true study effect sizes for these two repetitions are sure to differ. It follows that the diversity (sample variance) of these true effect sizes will
DOI: http://dx.doi.org/10.18053/jctres.09.202304.22-00019
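As an illustration of the weighting scheme described in Section 3, the following minimal Python sketch computes weights inversely proportional to the sum of the within-study and between-study variances, and shows how the weights move as the between-study variance changes. It is a sketch under stated assumptions, not the article's Excel templates or any package's implementation: the function names, the three effect estimates, and the within-study variances are hypothetical, and the between-study variance is supplied as a given number rather than estimated from the data.

```python
def random_effects_weights(within_variances, between_variance):
    """Return weights proportional to 1 / (within + between), scaled to sum to one."""
    raw = [1.0 / (v + between_variance) for v in within_variances]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_estimate(estimates, weights):
    """Weighted average of the individual study estimates."""
    return sum(w * y for w, y in zip(weights, estimates))


# Hypothetical study-level data, for illustration only.
estimates = [0.10, 0.30, 0.50]         # study estimates of effect size
within_variances = [0.01, 0.04, 0.09]  # estimated within-study variances

# The between-study variance is assumed known here; in practice it is estimated.
for between_variance in (0.0, 0.05, 10.0):
    weights = random_effects_weights(within_variances, between_variance)
    overall = weighted_estimate(estimates, weights)
    print(between_variance, [round(w, 3) for w in weights], round(overall, 3))
```

With a between-study variance of zero the weights reduce to the fixed effects weights, dominated by the most precise study; as the between-study variance grows, the weights approach equal weights, matching the closing remark of Section 3.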

