Artificial Intelligence in Health SDoH in clinical narratives
Figure 7. Adjusted odds ratios for the probability of mentioning spiritual beliefs based on clinical case type, journal specialty, journal’s geographic region, and author’s geographic region. The figure was plotted with Matplotlib.
creating oversimplified narratives. Furthermore, these biases risk being reproduced by large language models trained on medical literature, especially models trained with self-supervised methods.

4.3. Technological opportunities

Despite the low prevalence of SDoH mentions in clinical case reports, NER models run through Spark NLP offer a potential path toward broad-scale analysis of clinical records for SDoH mentions. Notably, this method can be used on standard computing hardware, providing access to advanced data analytics.[25] Our research indicated that NER models are more efficient than larger models (e.g., GPT), especially for specific tasks such as clinical entity detection. This technology can be used not only to review clinical case reports but also to search EHRs for SDoH,[37,38] thereby improving research scalability. In addition, such high-level computational analyses can be performed on regular laptops with central processing units (CPUs). Recent studies have successfully designed NER models to extract SDoH from clinical narratives.[27] However, the primary objective of our research was not merely to validate these NER models but to analyze the factors associated with the likelihood of mentioning specific SDoH when describing a clinical case.

4.4. Limitations

Our investigation had several limitations that warrant consideration. First, our dataset included only published clinical case reports, which might not reflect the full spectrum of clinical situations or health-care settings. This could lead to a skewed representation of certain regions, affecting our understanding of cultural influences on SDoH mentions.

Second, our analysis might understate SDoH mentions for two main reasons. Our focus was limited to abstracts, specifically to sentences outlining primary patient characteristics. In addition, the NER model could produce false negatives, as evidenced by recall values below 100%. Given the low frequency of SDoH mentions in the PubMed corpus, fully evaluating the NER model’s recall was challenging. However, our external validation showed satisfactory recall. We inferred that the false negatives were likely spread evenly across the model’s attributes and would therefore not substantially affect the results of our logistic regression analysis.

In our analysis, most of the odds ratios (ORs) for the SDoH factors indicated negative associations (OR < 1). This finding suggested that specific SDoH mentions were rare in the literature and, when present, were often linked to particular characteristics such as diagnoses, specialties, and cultures. Because such mentions were concentrated in a few categories, most of the analyzed features had OR < 1. The large sample size of our study further increased the statistical power to detect significant effects, even for minor associations, under the stringent threshold of P < 0.0001. The predominance of ORs below 1 could also be due to overadjustment. Overadjustment occurs when a model includes too many variables or inappropriate variables,
Volume 1 Issue 2 (2024) 128 doi: 10.36922/aih.2737

