
Artificial Intelligence in Health                                    ADRD caregiver experiences on Reddit



            classification method for texts.17 In this study, we used     3. Results
            BERTopic, which builds on sentence-transformers
            embeddings, to extract topics from the embedded documents.
            Compared to previous methods                                  3.1. BERTopic modeling output
            such as Latent Dirichlet Allocation (LDA) modeling,18         A total of 1151 comments were collected from 15 Reddit
            BERTopic incorporates the semantic context of words           posts from our search results.19 Using BERTopic topic
            and further refines the method by considering the             modeling and manual topic refinement, we categorized the
            varying word semantic distance distributions.19 As with       comments into six topics and provide example comments
            other topic models, its output includes a topic               for each topic in Table 1. Topic 0 was identified as “sharing
            assignment for each comment, as well as the top words of   caregiver  stories,” topic 1 as “appreciation of online
            each topic. The top words help us interpret the topics of the   community,” topic 2 as “concerns of abuse of ADRD family
            comments, while topic assignment lets us see how popular   member,” topic 3 as “financial struggles of caregivers,” topic
            each topic is, and it can also be used in the subsequent   4 as “early symptoms of ADRD of family member,” and topic
            sentiment analysis.                                5 as “symptoms of ADRD.” As seen in Table 1, the topic
                                                               having the greatest proportion of discussions was topic 0
              Another difference between BERTopic and LDA                 (n = 926), followed by topic 1 (n = 126), topic 2 (n = 33),
            modeling is that BERTopic determines the number of            topic 3 (n = 31), topic 4 (n = 22), and topic 5 (n = 13).
            topics from the text, while LDA relies on a user-defined
            number of topics.20,21 Using BERTopic, we generated           3.2. VADER (sentiment analysis) results
            an intertopic distance map to determine the distance   We used VADER to analyze the sentiment of the comments
            (difference) between the topics. An intertopic distance map   under each topic. Figure 1 describes the average VADER
            represents each topic as a circle on a Cartesian plane, whose   sentiment score of the retrieved posts’ texts for each topic. In
            coordinates represent semantic distance. If the circles do not   Figure 1, the x-axis corresponds to the VADER compound
            overlap, the topics are considered well separated. If they    score that ranges from −1 to 1, where x<−0.05 represents
            do overlap, the topic model will be refitted with a suitably   negative sentiment, −0.05<x<0.05 represents neutral
            smaller number of topics, and the intertopic distance map     sentiment, and x>0.05 represents positive sentiment. As
            will be plotted again to see if the topics are well separated.   described by the histogram bars in  Figure  1, topic 3 is
            The “step size” of each refitting can vary depending on       skewed to the right, indicating more positive sentiment,
            prior knowledge of the dataset. For example, if no more       while topics 1 and 3 are skewed to the left, indicating
            than 20 topics are expected in the text but the BERTopic      more negative sentiment. Figure 2 provides a direct
            model identifies more than 100, the “step size” can be        comparison of comment sentiment proportions. Topic 0
            5 to 10 fewer topics per refitting until the topics           had relatively equal proportions of positive and negative
            separate or the number of topics is reduced to 20.            sentiment, whereas topic 5 had the largest proportion of
            After that, the “step size” can be one fewer topic for each   neutral sentiment and topic 3 had the highest proportion
            refitting.                                                    of positive posts.
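
The refitting schedule described in Section 2.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names, the `is_separated` callback, and the default step of 5 are hypothetical. In practice the callback would refit the topic model with the given number of topics and report whether the circles on the intertopic distance map no longer overlap, a judgment the text describes as a visual check.

```python
from typing import Callable, List, Tuple


def next_topic_count(current: int, expected_max: int = 20, coarse_step: int = 5) -> int:
    """Return the number of topics to try at the next refit (hypothetical helper)."""
    if current > expected_max:
        # Coarse phase: drop several topics at once, but never go below expected_max.
        return max(expected_max, current - coarse_step)
    # Fine phase: remove one topic per refit.
    return current - 1


def refit_until_separated(
    initial_count: int,
    is_separated: Callable[[int], bool],
    expected_max: int = 20,
    coarse_step: int = 5,
) -> Tuple[int, List[int]]:
    """Shrink the topic count until is_separated(count) is True.

    is_separated is an assumed stand-in for "refit the model with this many
    topics and check the intertopic distance map for overlapping circles".
    Returns the final count and every count that was tried.
    """
    count = initial_count
    tried = [count]
    while count > 1 and not is_separated(count):
        count = next_topic_count(count, expected_max, coarse_step)
        tried.append(count)
    return count, tried


if __name__ == "__main__":
    # Toy stand-in: pretend the topics separate once 17 or fewer remain.
    final, tried = refit_until_separated(104, lambda n: n <= 17, coarse_step=10)
    print(final, tried)
```

With the toy callback above, the count falls in steps of 10 until it reaches 20 and then in steps of 1, which mirrors the two-phase schedule in the text.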

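
The compound-score cutoffs and the per-topic sentiment proportions reported in Section 3.2 can be illustrated with a small Python sketch. It assumes the compound scores have already been computed (for example with the `vaderSentiment` package) and uses made-up scores, not data from the study. The function names are hypothetical. The `normalize` helper follows the formula used in the open-source VADER implementation, x / sqrt(x² + 15); that detail is not stated in the text and is included only to show how raw scores are mapped into the range −1 to 1.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

LABELS = ("negative", "neutral", "positive")


def normalize(raw_score: float, alpha: float = 15.0) -> float:
    """Map an unbounded raw valence sum into (-1, 1), as in VADER's reference code."""
    return raw_score / math.sqrt(raw_score * raw_score + alpha)


def label_sentiment(compound: float, cutoff: float = 0.05) -> str:
    """Apply the rule of thumb: < -0.05 negative, > 0.05 positive, otherwise neutral."""
    if compound < -cutoff:
        return "negative"
    if compound > cutoff:
        return "positive"
    return "neutral"


def sentiment_proportions(
    scored: Iterable[Tuple[int, float]],
) -> Dict[int, Dict[str, float]]:
    """Given (topic, compound score) pairs, return label proportions per topic."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for topic, compound in scored:
        counts[topic][label_sentiment(compound)] += 1
    proportions = {}
    for topic, counter in counts.items():
        total = sum(counter.values())
        proportions[topic] = {label: counter[label] / total for label in LABELS}
    return proportions


if __name__ == "__main__":
    # Made-up (topic, compound) pairs for illustration only.
    toy = [(0, -0.50), (0, 0.70), (1, 0.00), (1, 0.90)]
    print(sentiment_proportions(toy))
```

Scores exactly at ±0.05 are treated as neutral here, following the "−0.05 to 0.05" wording in Section 2.3.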
            2.3. Sentiment analysis                              The top words in each topic are displayed in Table 2.
            To understand the sentiment that a comment carries, we   Topic 0 was the largest topic of posts and manually
            performed sentiment analysis, which quantifies positive   labeled as “shared stories by caregivers.” This topic
            and negative sentiment. We adopted the most widely   included stories that ADRD caregivers shared with other
            used sentiment analysis tool, Valence Aware Dictionary        ADRD caregiving users on Reddit. Comments included
            and sEntiment Reasoner (VADER), for our purpose in this       personalized experiences of their family member having
            study.20 VADER is a rule-based model that summarizes          ADRD symptoms, describing in detail specific cases.
            lexical, grammatical, and syntactical features of text and    Top keywords included specific family members, such as
            quantifies the tone of sentiment into scores.20 Compound      “mom” and “dad.” As shown in Table 1 and Figure 1, 44.8%
            VADER scores are normalized from the raw VADER     of the posts were negative and 53.2% of posts had a positive
            scores and span from −1 to 1, with a negative score   sentiment.
            representing negative sentiment, and vice versa. We   Topic 1 was manually labeled as “appreciation of online
            followed the rule of thumb in VADER sentiment analysis        community.” This topic included comments in which
            and classified comments with compound VADER scores            caregivers shared gratitude and thanks with other Reddit
            <−0.05 as negative, those with scores from −0.05 to 0.05      users, showcasing the benefit of these online communities.
            as neutral, and those with scores >0.05 as                    The top five keywords in Topic 1 were “thank,” “sorry,”
            positive.                                                     “much,” “go,” and “share.” As shown in Table 1 and Figure 1,



            Volume 1 Issue 3 (2024)                        129                               doi: 10.36922/aih.3075