


suicidal from non-suicidal self-harm, the other critiquing a speech-based suicide risk detection system – were rejected not for inaccuracies in my evaluation, but for being “overly technical” or “lacking clinical relevance.”11-31 In one case, editorial processes allowed the original authors to pre-clear critiques, undermining the independence of peer review and suppressing substantive methodological discussion.12
These cases are not outliers. They reflect a deeper, systemic issue in how interdisciplinary research is handled in clinical publishing. Through these case studies, this perspective contributes to the ongoing discourse on peer review integrity by identifying structural editorial failures, analyzing their ethical and scientific implications, and proposing reforms to align publishing practices with the technical demands of AI-integrated mental health research.
2. The challenge of evaluating AI in clinical publishing
While transformative, AI and ML methods are not immune to significant flaws.10,13,14 Unlike conventional clinical research methods (e.g., randomized controlled trials, cohort studies, case-control studies, cross-sectional studies, case reports, and systematic reviews), AI-driven studies and studies using AI methods demand a nuanced understanding of data science principles, algorithmic transparency, model generalizability, and ethical implications.19 Peer reviewers and editors in clinical journals, who may not be versed in the complexities of computational models, can unintentionally overlook or misinterpret issues that would be immediately evident to AI specialists.13
3. Case study 1: Methodological limitations of Haghish (2025)
This challenge was starkly evident when I submitted correspondence to a high-impact psychiatry journal regarding a 2025 study by Haghish, titled “Differentiating Adolescent Suicidal and Nonsuicidal Self-Harm with Artificial Intelligence.”11 My critique focused on several key methodological concerns, including class imbalance, model interpretability, and generalizability, all essential to validating that AI models are both scientifically sound and clinically applicable.15-29
Class imbalance is a pervasive problem in supervised ML, especially in sensitive domains such as adolescent self-harm, where suicide attempts constitute a small minority of the dataset.18,22,23,27,28 While Haghish employed synthetic oversampling techniques18 (specifically the synthetic minority oversampling technique, SMOTE), these methods – although well-intentioned – carry inherent risks. Oversampling can inflate minority class representation artificially, leading to overfitting on synthetic samples that do not adequately represent real-world variation.18,23 This undermines model robustness and compromises generalizability across unseen populations and clinical settings. In my correspondence, I wrote:

    Class imbalance remains one of the most significant challenges in supervised machine learning, particularly in domains, such as adolescent self-harm, where suicide attempts represent a small portion of the dataset. The synthetic oversampling techniques employed, while well-intentioned, may risk overfitting and undermine generalizability.
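
To make this concern concrete, the sketch below (illustrative only; it uses imbalanced-learn's SMOTE and scikit-learn on synthetic data, not the data or pipeline of the original study) shows one way to confine oversampling to the training folds of cross-validation, so that synthetic minority samples never leak into the held-out data used to estimate performance:

    # Illustrative sketch: keep SMOTE inside the cross-validation loop so that
    # synthetic minority samples are created from training folds only and the
    # held-out folds remain untouched real data.
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline  # applies the sampler only at fit time
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    # Synthetic stand-in for an imbalanced clinical dataset (~5% positive class).
    X, y = make_classification(n_samples=2000, n_features=20,
                               weights=[0.95, 0.05], random_state=0)

    pipeline = Pipeline([
        ("smote", SMOTE(random_state=0)),             # resamples training folds only
        ("clf", RandomForestClassifier(random_state=0)),
    ])

    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")
    print(f"Cross-validated AUC: {scores.mean():.3f} (SD {scores.std():.3f})")

Even with this safeguard, synthetic samples remain interpolations of the observed minority cases, so external validation on independent cohorts is still required before any claim of robustness.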
The clinical adoption of AI models hinges on transparent decision-making processes that clinicians can understand and trust. The original study lacked sufficient interpretability measures to explain how the model attributed importance to various features. I proposed integrating SHAP (SHapley Additive exPlanations) values to provide fine-grained, interpretable insights into feature contributions. SHAP values allow clinicians to see which factors most influenced the model’s predictions in individual cases, facilitating informed clinical judgment and improving acceptance in high-stakes settings.16,17 Specifically, I noted:

    Integrating SHAP values could enhance the transparency of the model’s feature attribution, making the system more interpretable to clinicians and better suited for high-stakes environments.
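
As a rough illustration of what such interpretability measures involve (the model, features, and data below are placeholders, not those of the original study), SHAP attributions for a tree-based classifier can be computed in a few lines:

    # Illustrative sketch: per-case SHAP attributions for a tree-based classifier.
    import shap
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    # Placeholder data standing in for adolescent clinical features.
    X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

    # TreeExplainer yields exact Shapley values for tree ensembles; each row
    # explains one individual's prediction, feature by feature.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_test)   # shape: (n_cases, n_features)

    shap.summary_plot(shap_values, X_test)        # global view of feature influence
    print(shap_values[0])                         # local view: one case's attributions

The per-case attributions are what would let a clinician see, for an individual adolescent, which features pushed the predicted risk up or down.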
Adolescents’ behavioral and clinical profiles vary widely across different populations and healthcare contexts. The study’s model was trained on a relatively homogeneous sample, limiting its applicability elsewhere. I suggested employing transfer learning techniques, which allow models to leverage knowledge from related datasets or tasks to improve performance on new, diverse cohorts.24,26 Transfer learning offers a path to improve model adaptability and external validity, a key requirement for any AI tool intended for broad clinical use:

    Transfer learning could offer a viable path to improve generalizability, particularly across diverse clinical settings or populations not represented in the original training data.
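
One simple way to operationalize this idea (a sketch under strong assumptions: random placeholder data, hypothetical “source” and “target” cohorts, and warm-started incremental fitting as a stand-in for more sophisticated transfer learning) is to pre-train on a large source cohort and then continue training the same weights on a small sample from the new population:

    # Illustrative sketch: warm-start a model trained on a large source cohort
    # and adapt it to a small target cohort, rather than training from scratch.
    import numpy as np
    from sklearn.linear_model import SGDClassifier
    from sklearn.preprocessing import StandardScaler

    # Placeholder cohorts; in practice these would be distinct clinical datasets.
    rng = np.random.default_rng(0)
    X_source, y_source = rng.normal(size=(5000, 20)), rng.integers(0, 2, 5000)
    X_target, y_target = rng.normal(0.5, 1.0, size=(300, 20)), rng.integers(0, 2, 300)

    scaler = StandardScaler().fit(X_source)
    X_source, X_target = scaler.transform(X_source), scaler.transform(X_target)

    # Step 1: pre-train on the source cohort ("log_loss" is logistic loss;
    # older scikit-learn versions call it "log").
    model = SGDClassifier(loss="log_loss", random_state=0)
    model.partial_fit(X_source, y_source, classes=np.array([0, 1]))

    # Step 2: adapt the same weights to the target cohort with a few extra passes.
    for _ in range(10):
        model.partial_fit(X_target, y_target)

    print("In-sample target accuracy (illustration only):",
          model.score(X_target, y_target))

More elaborate approaches, such as fine-tuning deep representations or formal domain adaptation, follow the same logic: reuse what was learned on the source data and update it with the target cohort, rather than assuming the original model transfers unchanged.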
The letter was ultimately rejected, with editorial feedback stating that these methodological concerns were “outside the journal’s thematic scope.”13,14 While editorial discretion is understandable, this dismissal raises deeper issues about how clinical journals vet AI-driven research. By sidelining fundamental questions about model rigor and applicability, the editorial board risks perpetuating the publication of AI studies that lack sufficient scientific and

