ethical scrutiny.13,14,19-21 This episode exemplifies a recurring problem: clinical journals often lack the necessary expertise, infrastructure, or review frameworks to rigorously evaluate the technical complexities and ethical dimensions of AI and ML in mental health research. Without such mechanisms, flawed models with serious real-world consequences may be accepted uncritically.13,14,19-21

4. Case study 2: Methodological oversights of Ding et al. (2025)

The systemic editorial failures seen in the Haghish case were not isolated. A study by Ding et al.,12 titled “Speech-Based Suicide Risk Recognition for Crisis Intervention Hotlines Using Explainable Multi-task Learning,” applies multi-task learning (MTL) and explainable AI (XAI) to speech-based suicide risk detection in crisis hotline calls. Although innovative, several methodological choices warrant further scrutiny, particularly regarding speech pre-processing, feature extraction, model architecture, and multimodal integration.28

A major concern is the removal of silences longer than 1 s from speech segments. Silences carry important emotional weight in high-stress contexts, indicating hesitation or distress, and their exclusion could lead to loss of critical psychological signals.27 As stated in my letter:

   Silence in speech, particularly during high-stress crisis calls, can carry emotional weight; its removal may obscure indicators of hesitation, distress, or emotional regulation.25-27

Excluding these silences risks discarding valuable psychological signals that are integral to accurately assessing caller emotional state.25,27 Transformer models with self-attention mechanisms (e.g., Wav2Vec 2.0) are better suited to capture such long-range dependencies without omitting silent intervals.24,28
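To make that alternative concrete, the sketch below passes an intact (silence-preserving) call excerpt through a pre-trained Wav2Vec 2.0 encoder using the Hugging Face transformers library; the checkpoint name, 16 kHz sampling rate, placeholder waveform, and mean-pooling step are illustrative assumptions, not details of the study under discussion.

```python
# Illustrative sketch: encode an intact call excerpt (silences preserved)
# with a pre-trained Wav2Vec 2.0 model. Requires torch and transformers;
# the checkpoint and dimensions are assumptions, not the study's setup.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")

# Placeholder 12-s mono excerpt at 16 kHz; pauses are deliberately NOT trimmed,
# so self-attention can relate speech on either side of a silent interval.
waveform = torch.zeros(12 * 16000)

inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    frames = encoder(**inputs).last_hidden_state   # (1, n_frames, 768)

# A simple utterance-level embedding for a downstream risk classifier.
utterance_embedding = frames.mean(dim=1)           # (1, 768)
```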
The authors also utilized a fixed 5-s segmentation window for feature extraction, which may be too rigid to capture the inherently non-linear and rapidly fluctuating emotional content in crisis speech.28,29 I argued:

   The use of fixed 5-s segmentation windows may prevent the model from capturing the dynamic fluctuations typical in crisis speech patterns.

More flexible temporal modeling approaches, such as variable-length sequences handled by transformers with multi-head attention, or techniques such as sliding windows and dynamic time warping, could better capture these rapid emotional transitions.24-26
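As a minimal sketch of one such alternative, the code below segments a call with overlapping sliding windows rather than rigid, non-overlapping 5-s blocks; the window and hop lengths and the placeholder audio are illustrative choices, not parameters reported by the authors.

```python
# Illustrative sketch: overlapping sliding-window segmentation, so rapid
# emotional transitions that straddle a fixed 5-s boundary still appear
# intact in at least one segment. Window/hop values are assumptions.
import numpy as np

def sliding_windows(signal: np.ndarray, sr: int, win_s: float = 5.0, hop_s: float = 1.0):
    win, hop = int(win_s * sr), int(hop_s * sr)
    for start in range(0, max(len(signal) - win, 0) + 1, hop):
        yield signal[start:start + win]

sr = 16000
call_audio = np.random.randn(30 * sr)            # placeholder 30-s call
segments = list(sliding_windows(call_audio, sr))
print(f"{len(segments)} overlapping segments vs. 6 disjoint 5-s blocks")
```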
Regarding feature extraction, the research team extracted 178 paralinguistic features but did not clearly explain their prioritization or integration within the model. Feature interactions in speech emotion recognition are complex, non-linear, and context-dependent. I highlighted that:

   These technical issues have direct implications for suicide risk classification and cannot be dismissed as merely academic.13-15

Advanced feature selection techniques, such as SHAP-based approaches or quantum-behaved particle swarm optimization (QPSO), have shown promise in refining discriminative feature sets to improve performance and interpretability, suggesting avenues for methodological improvement.17,23,25
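As an illustration of the kind of feature audit the letter called for, the sketch below ranks a 178-dimensional paralinguistic feature matrix by mean absolute SHAP value using a tree-based classifier; the simulated data, classifier choice, and cut-off of 30 features are assumptions made purely for demonstration.

```python
# Illustrative sketch: SHAP-based ranking of 178 paralinguistic features.
# Requires shap and scikit-learn; the data here are simulated, so the
# ranking itself is meaningless and serves only to show the workflow.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 178))          # 500 calls x 178 features (simulated)
y = rng.integers(0, 2, size=500)         # placeholder binary risk labels

clf = GradientBoostingClassifier().fit(X, y)
shap_values = shap.TreeExplainer(clf).shap_values(X)   # (500, 178)

# Rank features by mean |SHAP| and keep an (arbitrary) top-30 subset.
importance = np.abs(shap_values).mean(axis=0)
top_features = np.argsort(importance)[::-1][:30]
```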
Regarding model architecture, while Bidirectional Long Short-Term Memory (Bi-LSTM) networks capture some temporal dependencies, they have inherent limitations in modeling long-range context.24,26 Transformer-based architectures outperform Bi-LSTMs by leveraging multi-head attention and enabling more interpretable focus on critical speech segments.24,26 I emphasized:

   “The limitations of Bi-LSTM architectures in capturing long-range emotional dependencies” diminish the model’s ability to detect subtle emotional variations over extended speech segments.

Recent pre-trained transformer speech models (e.g., HuBERT) further demonstrate robustness and efficiency in real-world noisy environments, making them preferable for this application.28
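The contrast can be seen in a few lines of PyTorch: the sketch below runs the same frame-level features through a Bi-LSTM and through a small multi-head-attention (Transformer) encoder; all dimensions and layer counts are illustrative and unrelated to the architectures evaluated in the study.

```python
# Illustrative sketch: Bi-LSTM vs. Transformer encoder over frame-level
# speech features. Dimensions and layer counts are arbitrary assumptions.
import torch
import torch.nn as nn

frames = torch.randn(8, 500, 128)        # 8 calls, 500 frames, 128-dim features

# Bi-LSTM: context is propagated step by step in each direction.
bilstm = nn.LSTM(input_size=128, hidden_size=64,
                 bidirectional=True, batch_first=True)
lstm_out, _ = bilstm(frames)             # (8, 500, 128)

# Transformer encoder: multi-head attention lets every frame attend to every
# other frame directly, and the attention weights offer one route to
# inspecting which speech segments drive a prediction.
layer = nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
attn_out = encoder(frames)               # (8, 500, 128)
```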
Finally, the study’s exclusive reliance on speech features overlooks the benefits of multimodal integration. Combining speech with textual transcriptions or physiological data has been shown to improve emotion detection accuracy and model robustness, especially in complex, high-stakes environments, such as crisis hotlines.26,29,30
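As a final sketch, the snippet below shows one simple way to realize such multimodal integration: late fusion of a pooled speech embedding with a transcript embedding. The dimensions, concatenation strategy, and three-class output are illustrative assumptions, not a description of any published system.

```python
# Illustrative sketch: late fusion of speech and transcript embeddings for
# risk classification. All dimensions and the fusion strategy are assumptions.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, speech_dim=768, text_dim=384, n_classes=3):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(speech_dim + text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, speech, text):
        # Concatenate modality embeddings and map to risk-class logits.
        return self.head(torch.cat([speech, text], dim=-1))

speech_emb = torch.randn(8, 768)   # e.g., pooled Wav2Vec 2.0 / HuBERT output
text_emb = torch.randn(8, 384)     # e.g., sentence embedding of the transcript
logits = LateFusionClassifier()(speech_emb, text_emb)   # (8, 3)
```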
Despite the substantive and clinically relevant nature of these critiques, the editorial board rejected the letter as “overly technical” and “lacking clinical relevance.” This dismissal highlights a systemic editorial issue wherein rigorous methodological critique of AI models is marginalized, risking the publication of flawed models with potential real-world harms.19-21,30,31

5. Editorial gatekeeping and conflicts of interest

The methodological shortcomings outlined in both case studies are not anomalous oversights but symptomatic of deeper structural failures in editorial practices governing AI in mental health research. A recurring theme in both rejections was that the letters were “overly technical” or

