Shuster | Journal of Clinical and Translational Research 2023; 9(4): 246-252    247
one of the most heavily peer-reviewed biostatistical papers ever, with review material from all nine sources (including three world-renowned meta-analysts) agreeing with his conclusions. However, this article shows clearly, in a non-technical way, that the validity of the mainstream methods rests on five incompatible assumptions. This makes the evidence basis of the mainstream methods no different from claiming an evidence basis for the off-label use of a drug. Specifically, if the first assumption is true, the last two cannot be true. In one highly cited example, we show that the mainstream-based claim of efficacy for an invasive treatment has no scientific basis. In another, the mainstream methods failed to detect a highly significant outcome. Had the methods of this paper been available and used, the use of a cardiotoxic type II diabetes drug could have been discontinued at least 3 years earlier than it actually was. We provide free access to well-documented, user-friendly Excel templates for conducting rigorous analyses of the main research questions. This paper places no blame on the well-intentioned researchers who developed the mainstream methods. However, if meta-analysis is to remain at the apex of “Evidence Pyramids,” statistical practice must change. This paper is needed for two reasons. First, because user-friendly software for the mainstream methods is readily available, a high proportion of these analyses are done without input from biostatisticians or epidemiologists. Second, changes to statistical practice will not happen overnight; in the meantime, readers should be concerned when they read papers that use meta-analysis in the biomedical literature. In short, this is about proven science, not opinions.

2. Assumptions Underlying the Validity of Current Mainstream Methods

When methodologists derive analytic procedures, they make working distributional assumptions that enable them to complete their work. Every time a procedure is used in practice, these assumptions should be fully disclosed. In a specific application, if any of the assumptions is wrong, the evidentiary basis of the results is in jeopardy. Unfortunately, few analytic procedures have adequate diagnostic tests for their assumptions. In meta-analysis, there has been little vetting of how robust the procedures are when their assumptions fail.

“Weighting inversely proportional to the estimated variance estimation” (also known as the mainstream method) is by far the most common method for combining data from a complete set of randomized clinical trials of a research question. These methods were derived under the five assumptions (A1-A5) below, which must hold up to strong approximations. These assumptions are rarely disclosed in full, and current software does not provide adequate warnings. Assumptions A1 and A3 are reasonable in most applications. Assumption A2 is questionable (no adequately powered diagnostic test exists for it). Unfortunately, if Assumption A1 is true, it follows that Assumptions A4 and A5 are false. This raises the strong possibility that past meta-analyses reached unsupportable conclusions, possibly contributing to inappropriate public health recommendations.

A1: The true primary effect sizes for the individual studies are drawn independently from a single large “urn” of primary effect sizes. This assumption tells us that the target is the unweighted mean of all the effect sizes in the urn.
A2: The true primary effect sizes in the urn follow a normal (bell-shaped) distribution whose unweighted mean is the target parameter of interest.
A3: Each individual study provides an unbiased estimate of its study-specific true primary effect size, and that estimate is approximately normally distributed about the true primary effect size.
A4: Up to a strong approximation, the weights are “constants” rather than substantially random variables. In other words, if you repeat the total experiment under the same Assumptions A1-A3 and the same urn, this assumption presumes that you obtain identical weights up to a strong approximation. This assumption is mandatory for using the mainstream formulas for the mean and standard error, but it will be shown to be false under Assumptions A1-A3. More on this below.
A5: There is no association between study weight and true study effect size. For example, if big studies tend to have higher (lower) effect sizes than smaller studies, the method will tend to overestimate (underestimate) the overall effect size. This could lead to unacceptable bias.

3. How Mainstream Weighted Random Effects Methods Work

The “variance” of each study's effect-size estimate consists of two components, reflecting the two reasons why that study's estimate differs from the true mean of the effect sizes in the urn: (a) the within-study variance, which is estimated under Assumption A3, and (b) the between-study variance, which is the variance of the true effect sizes in the urn, per Assumptions A1 and A2. The first component (a) depends on the accuracy of the within-study estimate and varies from study to study. The second component (b) is the same for all studies. The overall estimate is the weighted average of the individual study estimates, with weights inversely proportional to each study's estimated variance, that is, the sum of its estimated within-study variance and the estimated between-study variance. If all five assumptions were true, these weights would minimize the standard error of the overall effect-size estimate over all choices of weights that sum to one (a requirement for unbiasedness). Note that, all other things being equal, a larger between-study variance pushes the weights closer to equal weights, and a smaller between-study variance pushes them closer to the fixed-effects weights.

4. Why Assumption A4 Is False

Assumption A4 requires that if we repeat the experiment under Assumptions A1-A3, the resulting weights will be the same up to a strong approximation.

Imagine a meta-analysis in which we independently generate the data twice under Assumptions A1-A3. Clearly, the true study effect sizes will differ between these two repetitions. It follows that the diversity (sample variance) of these true effect sizes will
                                           DOI: http://dx.doi.org/10.18053/jctres.09.202304.22-00019
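The weighting described in Section 3 and the repeated-experiment thought experiment in Section 4 can be illustrated with a short simulation. The Python sketch below is a minimal illustration under stated assumptions, not the author's method or Excel templates: the between-study variance is estimated with the DerSimonian-Laird moment estimator (a common choice that the text does not specify), the within-study variances are treated as known for simplicity, and all function names and variance values are hypothetical. Each repetition draws true effects from a normal urn (A1, A2), draws unbiased normal study estimates around them (A3), and then computes normalized weights proportional to 1/(within-study variance + estimated between-study variance).

```python
import numpy as np


def re_weights(v, tau2):
    """Normalized random-effects weights: proportional to 1 / (v_i + tau2), summing to one."""
    w = 1.0 / (np.asarray(v, dtype=float) + tau2)
    return w / w.sum()


def dl_tau2(y, v):
    """DerSimonian-Laird moment estimate of the between-study variance (assumed estimator)."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(v, dtype=float)            # fixed-effects weights
    y_fe = np.sum(w * y) / np.sum(w)                # fixed-effects pooled estimate
    q = np.sum(w * (y - y_fe) ** 2)                 # Cochran's Q statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, (q - (len(y) - 1)) / c)         # truncated at zero


def pooled_estimate(y, v):
    """Weighted average of study estimates using the estimated random-effects weights."""
    tau2 = dl_tau2(y, v)
    w = re_weights(v, tau2)
    return float(np.sum(w * np.asarray(y, dtype=float))), w, tau2


def repeat_experiment(rng, v, mu=0.0, tau=0.3):
    """One repetition under A1-A3 (hypothetical mu, tau; within-study variances v treated as known).

    A1/A2: true effects drawn independently from a normal urn N(mu, tau^2).
    A3: each study estimate is unbiased and normal about its own true effect.
    """
    theta = rng.normal(mu, tau, size=len(v))
    y = rng.normal(theta, np.sqrt(v))
    return pooled_estimate(y, v)


if __name__ == "__main__":
    rng = np.random.default_rng(2023)
    v = np.array([0.10, 0.05, 0.025, 0.0125])      # hypothetical within-study variances
    for rep in range(2):
        est, w, tau2 = repeat_experiment(rng, v)
        print(f"repetition {rep + 1}: tau2={tau2:.4f}, weights={np.round(w, 3)}")
```

Because the study sizes (and hence the within-study variances) are identical in both repetitions, any change in the printed weights comes from the estimated between-study variance, which is itself a random quantity; this is the sense in which the weights are random variables rather than constants.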