ethical scrutiny.13,14,19-21 This episode exemplifies a recurring problem: clinical journals often lack the necessary expertise, infrastructure, or review frameworks to rigorously evaluate the technical complexities and ethical dimensions of AI and ML in mental health research. Without such mechanisms, flawed models with serious real-world consequences may be accepted uncritically.13,14,19-21

4. Case study 2: Methodological oversights of Ding et al. (2025)

The systemic editorial failures seen in the Haghish case were not isolated. A study by Ding et al., titled “Speech-Based Suicide Risk Recognition for Crisis Intervention Hotlines Using Explainable Multi-task Learning,”12 applies multi-task learning (MTL) and explainable AI (XAI) to speech-based suicide risk detection in crisis hotline calls. Although the approach is innovative, several methodological choices warrant further scrutiny, particularly regarding speech pre-processing, feature extraction, model architecture, and multimodal integration.28

A major concern is the removal of silences longer than 1 s from speech segments. Silences carry important emotional weight in high-stress contexts, indicating hesitation or distress, and their exclusion could lead to the loss of critical psychological signals. As stated in my letter:

Silence in speech, particularly during high-stress crisis calls, can carry emotional weight; its removal may obscure indicators of hesitation, distress, or emotional regulation.25-27

Excluding these silences risks discarding valuable psychological signals that are integral to accurately assessing caller emotional state.25,27 Transformer models with self-attention mechanisms (e.g., Wav2Vec 2.0) are better suited to capture such long-range dependencies without omitting silent intervals.24,28

The authors also used a fixed 5-s segmentation window for feature extraction, which may be too rigid to capture the inherently non-linear and rapidly fluctuating emotional content of crisis speech.28,29 I argued:

The use of fixed 5-s segmentation windows may prevent the model from capturing the dynamic fluctuations typical in crisis speech patterns.

More flexible temporal modeling approaches, such as variable-length sequences handled by transformers with multi-head attention, or techniques such as sliding windows and dynamic time warping, could better capture these rapid emotional transitions.24-26

Regarding feature extraction, the research team extracted 178 paralinguistic features but did not clearly explain their prioritization or integration within the model. Feature interactions in speech emotion recognition are complex, non-linear, and context-dependent. I highlighted that:

These technical issues have direct implications for suicide risk classification and cannot be dismissed as merely academic.13-15

Advanced feature selection techniques, such as SHAP-based approaches or quantum-behaved particle swarm optimization (QPSO), have shown promise in refining discriminative feature sets to improve performance and interpretability, suggesting avenues for methodological improvement.17,23,25

Regarding model architecture, while Bidirectional Long Short-Term Memory (Bi-LSTM) networks capture some temporal dependencies, they have inherent limitations in modeling long-range context.24,26 Transformer-based architectures outperform Bi-LSTMs by leveraging multi-head attention and enabling more interpretable focus on critical speech segments.24,26 I emphasized:

“The limitations of Bi-LSTM architectures in capturing long-range emotional dependencies” diminish the model’s ability to detect subtle emotional variations over extended speech segments.27

Recent pre-trained transformer speech models (e.g., HuBERT) further demonstrate robustness and efficiency in real-world noisy environments, making them preferable for this application.28

Finally, the study’s exclusive reliance on speech features overlooks the benefits of multimodal integration. Combining speech with textual transcriptions or physiological data has been shown to improve emotion detection accuracy and model robustness, especially in complex, high-stakes environments such as crisis hotlines.26,29,30 Despite the substantive and clinically relevant nature of these critiques, the editorial board rejected the letter as “overly technical” and “lacking clinical relevance.” This dismissal highlights a systemic editorial issue wherein rigorous methodological critique of AI models is marginalized, risking the publication of flawed models with potential real-world harms.19-21,30,31

5. Editorial gatekeeping and conflicts of interest

The methodological shortcomings outlined in both case studies are not anomalous oversights but symptomatic of deeper structural failures in editorial practices governing AI in mental health research. A recurring theme in both rejections was that the letters were “overly technical” or

