Artificial Intelligence in Health SDoH in clinical narratives
To analyze the link between article features and SDoH mentions, we conducted six logistic regression analyses using the Python package statsmodels 0.14.0 to estimate the adjusted odds ratio (AOR) for each article feature. We also employed a stepwise additive method [35], in which features that improved the likelihood of the model were incorporated sequentially, with a P-value threshold of 0.001 for the likelihood ratio test.

3. Results

3.1. Study population and data inclusion

We analyzed a comprehensive dataset comprising 463,546 clinical case reports indexed in Medline from 1975 through 2022. The distribution of the articles across four key attributes (author’s geographic region, journal’s geographic region, journal specialty, and clinical diagnosis) is displayed in Table 1.

3.2. Recall and precision of identifying mentions of the social determinants of health

In our corpus analysis, the SDoH identification precisions were 99.3% (95% confidence interval [CI]: 99.2 – 99.4%) for race/ethnicity, 90.2% (95% CI: 88.8 – 91.4%) for marital status, 90.8% (95% CI: 86.9 – 93.6%) for population group, 97.4% (95% CI: 95.6 – 98.4%) for sexual orientation, 100% (95% CI: 94.6 – 100%) for housing, and 98.4% (95% CI: 91.7 – 99.7%) for spiritual beliefs.

During external validation, the precision results were 97.4% (95% CI: 86.5 – 99.5%) for race/ethnicity, 100% (95% CI: 92.3 – 100%) for marital status, 88.9% (95% CI: 56.5 – 98.0%) for population group, 93.8% (95% CI: 71.7 – 98.9%) for sexual orientation, 98.6% (95% CI: 92.3 – 99.7%) for housing, and 83.0% (95% CI: 70.8 – 90.8%) for spiritual beliefs.

The recalls in the external validation were 90.2% (95% CI: 77.5 – 96.1%) for race/ethnicity, 97.9% (95% CI: 88.9 – 99.6%) for marital status, 88.9% (95% CI: 56.5 – 98.0%) for population group, 100% (95% CI: 79.6 – 100%) for sexual orientation, 85.2% (95% CI: 75.9 – 91.37%) for housing, and 83.0% (95% CI: 70.8 – 90.8%) for spiritual beliefs.

When we compared the recall and precision of the JSL SDoH-NER model with those of zero-shot learning (i.e., GPT-3.5 and GPT-4), JSL and GPT-4 displayed comparable results. Some differences were nevertheless evident: JSL outperformed GPT-4 in precision for marital status (P = 0.005; GPT-4 scored 82.9%; 95% CI: 67.3 – 91.9%) and housing (P < 0.001; GPT-4 scored 82.9%; 95% CI: 67.3 – 91.9%). The results of this comparison are detailed in Figures S1 and S2.

3.3. Prevalence of social determinants of health mentions

Of the case reports examined, 20,420 (4.4%) included references to at least one SDoH category. A breakdown revealed that 17,765 case reports mentioned race/ethnicity, followed by 1,991 that discussed marital status, 524 sexual orientation, 284 immigrant status, 63 spiritual beliefs, and 60 homelessness. The means and confidence intervals of the mention rates over the study period are summarized in Table 2.

The analysis of the proportion of clinical cases reporting SDoH over the study period indicated a statistically significant association between publication year and mentions of race/ethnicity (P < 0.001), sexual orientation (P < 0.001), and homelessness (P < 0.001). Notably, there was a peak of sexual orientation mentions from 1980 to 1995, and we hypothesized that this could be related to the rise of acquired immunodeficiency syndrome (AIDS) cases, as depicted in Figure S3. There was also a prominent increase in race/ethnicity mentions between 2011 and 2013 (Figure S4) and a less evident but statistically significant increase in homelessness mentions since 1990.

3.4. Factors associated with reporting social determinants of health

3.4.1. Race/ethnicity

Significant associations were observed between the author’s geographic origin and the frequency of race/ethnicity mentions. Authors from sub-Saharan Africa were most likely to discuss race/ethnicity (AOR: 4.47; 95% CI: 3.96 – 5.04), followed by the Caribbean (AOR: 3.31; 95% CI: 2.24 – 4.89), Southeast Asia (AOR: 2.89; 95% CI: 2.58 – 3.25), East Asia (AOR: 2.00; 95% CI: 1.90 – 2.09), and North America (AOR: 1.77; 95% CI: 1.68 – 1.86). Conversely, authors from the Indian subcontinent (AOR: 0.69; 95% CI: 0.62 – 0.76) and the Middle East (AOR: 0.77; 95% CI: 0.70 – 0.84) were less inclined to mention race/ethnicity in their case reports.

The journal’s geographic region was also independently associated with race/ethnicity mentions. Journals originating from Australia-Oceania (AOR: 1.34; 95% CI: 1.17 – 1.53) and Western Europe (AOR: 1.30; 95% CI: 1.18 – 1.43) were slightly more prone to include race/ethnicity. In contrast, journals from East Asia (AOR: 0.48; 95% CI: 0.43 – 0.54), Eastern Europe (AOR: 0.54; 95% CI: 0.45 – 0.64), and South America (AOR: 0.55; 95% CI: 0.43 – 0.69) had far fewer race/ethnicity mentions than expected.
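The AORs and intervals reported in this section are the kind of quantity obtained by exponentiating logistic regression coefficients. The following minimal sketch illustrates that conversion with the Python standard library only; it does not call statsmodels and is not the authors' code. The function names are hypothetical, the 95% interval is assumed to be a Wald interval, and the likelihood ratio helper covers only the simplified case of one added parameter (a categorical feature with several levels would need more degrees of freedom).

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def odds_ratio_with_ci(beta, se, z=Z_95):
    """Turn a logit coefficient and its standard error into (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_reported_ci(lower, upper, z=Z_95):
    """Back out the log-scale standard error implied by a reported Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def lr_test_p_value(loglik_reduced, loglik_full):
    """Likelihood ratio test p-value for ONE added parameter (chi-square, 1 df)."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# Reported values for authors from sub-Saharan Africa: AOR 4.47 (95% CI 3.96 - 5.04).
beta = math.log(4.47)
se = se_from_reported_ci(3.96, 5.04)
aor, lower, upper = odds_ratio_with_ci(beta, se)
print(f"AOR {aor:.2f} (95% CI {lower:.2f} - {upper:.2f})")
```

Under the Wald assumption, the round trip recovers the reported interval to two decimals, which is consistent with an interval that is symmetric on the log-odds scale. In a stepwise procedure of the kind described in the methods, a candidate feature would be kept only when `lr_test_p_value` falls below the 0.001 threshold.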
Volume 1 Issue 2 (2024) 120 doi: 10.36922/aih.2737
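The precision and recall values in Section 3.2 are binomial proportions with 95% confidence intervals. The text shown here does not state which interval method was used, so the sketch below is only an illustration that assumes a Wilson score interval. The function name and the example counts are hypothetical; the denominators are not reported in this section.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts: 15 correct out of 15 (an observed proportion of 100%).
low, high = wilson_interval(15, 15)
print(f"{low:.1%} to {high:.1%}")
```

Unlike the simple normal approximation, the Wilson interval stays informative when the observed proportion is 100%, which is why it is a common choice for results such as "100% (95% CI: 79.6 – 100%)". With the hypothetical 15 of 15 above, the lower bound comes out near 79.6%, which coincides with that reported interval but does not confirm the authors' method or sample size.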

