
Artificial Intelligence in Health                                  Organizational culture’s impact on burnout



To further assess model performance, we calculated cross-validation results, the area under the receiver operating characteristic (AUC-ROC) curves, recall, precision, and F1 scores. The models were tuned using 10-fold cross-validation, and performance was evaluated as a function of the number of features randomly selected at each split. Model 1's cross-validation results showed 58% accuracy and a Cohen's kappa of 34%, with two features randomly selected at each split in the decision tree. As the number of features considered at each split increased, performance worsened. At six features, accuracy and Cohen's kappa decreased to 55% and 32%, respectively. At 11 features, the accuracy dropped further to 52%, with a Cohen's kappa of 25%, as shown in Table 1. Notably, with two randomly selected features, the standard deviation of accuracy was small (19%), suggesting consistent performance across folds. The standard deviation of Cohen's kappa was also low (0.33%), indicating consistent agreement between observed and predicted scores. Although accuracy was highest with two randomly selected features, the low Cohen's kappa suggests only fair agreement between observed and predicted scores.

Similar to Model 1, Model 2 showed the best accuracy and Cohen's kappa with two randomly selected features. At two features, the model achieved 47% accuracy and a kappa of 19%. With four features, accuracy decreased to 40%, with a kappa of 11%. At 11 features, the model showed 45% accuracy and a kappa of 17%. However, the standard deviations for both accuracy and kappa across all feature selections (2, 4, and 11) were low, indicating consistent model performance across folds and across observed and predicted values. Despite these findings, Model 1 outperformed Model 2 in terms of accuracy and kappa.

Model 1's AUC-ROC curve revealed an AUC of 0.57, indicating some predictive power, with the model correctly identifying positive and negative cases of score 5 approximately 57% of the time. However, the model demonstrated limited discriminatory power, as the AUC is relatively low, suggesting poor differentiation between classes. Given that the dataset involved a Likert scale, with many participants selecting 5 for question C30, the dataset may be imbalanced, which could explain the low AUC. The ROC curve's position above the diagonal line indicates performance slightly better than random, though the curve was not close to the top-left corner of the graph, which would indicate high sensitivity. The recall, precision, and F1 score for predicting scores of 5 were 0.5, 0.38, and 0.43, respectively, reflecting somewhat adequate performance but also indicating room for improvement in correctly identifying true positives and true negatives.

As shown in Figure 2, Model 2's AUC-ROC curve revealed an AUC of 0.6, suggesting relatively weak discriminatory power. The model correctly identified responses of 5 for the question regarding employees' perceptions of their organization's culture approximately 60% of the time, as illustrated in Figure 3. Model 2 showed weak discriminatory power, though its AUC, as shown in Figure 4, was slightly higher than that of Model 1, indicating somewhat better classification performance. The ROC curve, positioned above the diagonal line, reflects performance better than random, but it was still not close to the top-left corner, which would indicate high sensitivity and correct identification of positive perceptions of OC. The curve's position farther to the right suggests a higher false-positive rate, indicating that more scores that are not 5 are classified as 5. In addition, the recall of 0.6 for scores of 5 demonstrates that the model correctly identifies 60% of the actual scores of 5, while the precision of 0.43 means that 43% of the instances classified as 5 are truly 5. The F1 score for Model 2's ability to correctly predict scores of 5 is 0.5, indicating somewhat adequate performance with room for improvement.

Figure 2. Area under (AUC) the receiver operating characteristic (ROC) curve for Model 1

Figure 3. Multidimensional scaling plot for Model 2
Abbreviation: Dim: Dimension.
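
The reported F1 scores can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The following is a minimal Python sketch for illustration and is not the authors' analysis code; the function names `f1_score` and `cohens_kappa` are hypothetical helpers, and the 2x2 confusion matrix is invented solely to show how Cohen's kappa is computed (it is not study data).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def cohens_kappa(confusion: list[list[int]]) -> float:
    """Cohen's kappa from a square confusion matrix.

    Rows are observed classes and columns are predicted classes.
    Assumes the expected (chance) agreement is below 1.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of row and column marginals per class.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / total**2
    return (observed - expected) / (1 - expected)


# Precision and recall for score 5, as reported in the text.
model_1_f1 = f1_score(precision=0.38, recall=0.5)
model_2_f1 = f1_score(precision=0.43, recall=0.6)
print(round(model_1_f1, 2), round(model_2_f1, 2))

# Hypothetical confusion matrix for illustration only (not study data).
example = [[40, 10], [20, 30]]
print(round(cohens_kappa(example), 2))
```

With the reported inputs, the rounded F1 values agree with the 0.43 and 0.5 stated for Models 1 and 2. In the invented example, 70% raw agreement corresponds to a kappa of only 0.4 once chance agreement is removed, which illustrates why accuracy and kappa can diverge as they do in the cross-validation results.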


            Volume 2 Issue 3 (2025)                         85                               doi: 10.36922/aih.5127