
Artificial Intelligence in Health                                              SDoH in clinical narratives
































Figure 7. Adjusted odds ratios for the probability of mentioning spiritual beliefs based on clinical case type, journal specialty, journal’s geographic region, and author’s geographic region. The figure was plotted with Matplotlib.

creating oversimplified narratives. Furthermore, these biases risk being duplicated in training large language models, especially those using self-supervised methods with medical literature as data.

4.3. Technological opportunities

Despite the low prevalence of SDoH mentions in clinical case reports, using NER models through Spark NLP offers a potential path for broad-scale analysis of SDoH mentions in clinical records. Notably, this method can be used on standard computing hardware, providing access to advanced data analytics.²⁵ Our research indicated that NER models are more efficient than larger models (e.g., GPT), especially for specific tasks like clinical entity detection. This technology can be used not only for reviewing clinical case reports but also for analyzing EHRs in search of SDoH mentions,³⁷,³⁸ thereby enhancing research scalability. In addition, high-level computational analysis could be performed with regular laptops and central processing units (CPUs). Recent studies successfully designed NER models to extract SDoH from clinical narratives.²⁷ However, the primary objective of our research was not merely to validate these NER models but to analyze the factors associated with the likelihood of mentioning specific SDoH when describing a clinical case.

4.4. Limitations

Our investigation had several limitations that warrant consideration. First, our dataset only included published clinical case reports, which might not reflect the full spectrum of clinical situations or health-care settings. This could lead to a skewed representation of certain regions, affecting our understanding of cultural influences on SDoH mentions.

Second, our analysis might understate SDoH mentions for two main reasons: our focus was limited to abstracts, specifically sentences outlining primary patient characteristics; and the NER model used had a potential for false negatives, as evidenced by recall values below 100%. Given the low number of SDoH mentions in the PubMed corpus, fully evaluating the NER model’s recall was challenging. However, our external validation revealed satisfactory recall metrics, and we inferred that the false negatives were likely evenly spread across the model’s attributes, subsequently preventing significant impacts on the results from our logistic regression analysis.

In our analysis, we observed that most of the odds ratios (ORs) for the SDoH factors indicated negative associations (OR < 1). This finding suggested that specific SDoH mentions within the literature were rare and, when present, were often linked to particular characteristics such as diagnoses, specialties, and cultures. Consequently, this led to OR < 1 for most of the analyzed features. The substantial sample size of our study further increased the model’s power to detect statistically significant effects, even for minor associations, under the stringent significance threshold of P < 0.0001. The prevalence of ORs below 1 could also be due to overadjustment. Overadjustment occurs when a model includes too many variables or inappropriate variables,


            Volume 1 Issue 2 (2024)                        128                               doi: 10.36922/aih.2737