
Artificial Intelligence in Health                                  Bone suppression utility for chest diagnosis



derive the final labels for the regression models described in subsection 2.2.3., we averaged the total scores from both radiologists and normalized this value to a floating-point number between 0 and 1 by dividing by 18. These normalized scores were used as the labels for the regression models. The mean and standard deviation (SD) of the scores across 192 images were 0.380 and 0.260, respectively. Figure 2 provides an overview of the severity assessment process.

2.2.2. Data pre-processing

First, the radiographs were cropped and resized to center on the lung area, following the process described in subsection 2.1.2. Next, the images were transformed into bone-suppressed images using the AI-based bone suppression model developed in subsection 2.1. Both the standard radiographs and the bone-suppressed images were independently transformed to a resolution of 512 × 512 pixels with 8-bit contrast. Subsequently, each type of image was randomly split into training and test data in an approximately 80:20 ratio, ensuring that all images from the same patient were grouped together in the same split. Five-fold cross-validation was applied to each dataset separately, and, to enhance robustness, this process was repeated 3 times using different random seeds.

2.2.3. Regression models and training settings

We employed several CNN architectures from different generations – DenseNet [47], ResNet18 [48], ResNet50 [49], and RegNetY-120 [50] – all pre-trained on ImageNet. To adapt these models for the regression task, we modified their final fully connected layers to have a single output that predicts a continuous value corresponding to the normalized Brixia score. These models were trained using the MSE loss function for up to 25 epochs in each cross-validation fold. The training was conducted on an NVIDIA GeForce RTX 4070 with a Windows 11 operating system, utilizing Python 3.8.18 and PyTorch 2.2.1.

Based on our initial experiments, which indicated that the Stochastic Gradient Descent (SGD) optimizer consistently outperformed the Adam optimizer, we adopted SGD with a learning rate of 0.001 and a momentum of 0.9 for all models. In addition, a learning rate scheduler (StepLR) was applied to reduce the learning rate by a factor of 0.1 every 5 epochs.

2.3. Performance evaluation

2.3.1. Relationship between truths and predictions

We compared the performance of the severity assessment models between the standard chest radiograph dataset and the bone-suppressed image dataset by computing the mean ± SD of the mean absolute errors (MAEs) and Pearson correlation coefficients (PCCs) across all folds and random seeds. The PCC quantifies the linear relationship between two variables [51], as expressed by the following:

r = \frac{\sum_{i=0}^{n-1} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=0}^{n-1} (x_i - \bar{x})^2 \, \sum_{i=0}^{n-1} (y_i - \bar{y})^2}}    (III)

where r is the PCC; x_i and y_i denote the individual sample points; and \bar{x} and \bar{y} are the means of x_i and y_i, respectively.

These metrics were calculated using the “mean_absolute_error” function from the Python “sklearn.metrics” library and the “pearsonr” function from the Python “scipy.stats” module.

Statistical significance tests were conducted using a two-tailed Student’s t-test to compare the average MAEs
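The label-construction step above (averaging the two radiologists’ total Brixia scores and dividing by the maximum possible total of 18) can be sketched as follows; the score values here are illustrative, not the study’s data:

```python
import numpy as np

# Hypothetical per-image Brixia totals (0-18) from two radiologists;
# the real scores are not reproduced here.
scores_rad1 = np.array([6, 12, 3, 18])
scores_rad2 = np.array([8, 10, 3, 16])

# Average the two readers' totals, then divide by the maximum possible
# Brixia total (6 zones x 3 points = 18) to map labels into [0, 1].
labels = (scores_rad1 + scores_rad2) / 2.0 / 18.0
assert labels.min() >= 0.0 and labels.max() <= 1.0
```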

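The patient-grouped 80:20 split and the repeated five-fold cross-validation of subsection 2.2.2 might be sketched as below. The patient IDs, feature shapes, and the use of scikit-learn’s GroupShuffleSplit are assumptions, since the paper does not name its splitting tools:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_images = 192                                  # dataset size from the text
patients = rng.integers(0, 60, size=n_images)   # hypothetical patient IDs
y = rng.random(n_images)                        # stand-in normalized scores
X = rng.normal(size=(n_images, 8))              # stand-in features

# ~80:20 train/test split that keeps all images of a patient on one side
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(gss.split(X, y, groups=patients))
assert set(patients[train_idx]).isdisjoint(patients[test_idx])

# Five-fold cross-validation on the training portion, repeated with
# 3 different seeds; folds are formed over patients so that no patient
# straddles a fold boundary.
for seed in range(3):
    pats = np.unique(patients[train_idx])
    np.random.default_rng(seed).shuffle(pats)
    for fold_pats in np.array_split(pats, 5):
        va = train_idx[np.isin(patients[train_idx], fold_pats)]
        tr = np.setdiff1d(train_idx, va)
        # a model would be trained on `tr` and validated on `va` here
```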


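The evaluation metrics could be computed as sketched below, using the sklearn and scipy functions named in the text. The per-fold numbers are invented for illustration, and the manual expression mirrors Eq. (III); whether the paper’s t-test was paired is not stated, so an unpaired Student’s test is shown:

```python
import numpy as np
from scipy.stats import pearsonr, ttest_ind
from sklearn.metrics import mean_absolute_error

# Hypothetical truths and predictions for one fold (not the paper's data)
y_true = np.array([0.10, 0.35, 0.60, 0.80, 0.25])
y_pred = np.array([0.15, 0.30, 0.55, 0.85, 0.20])

mae = mean_absolute_error(y_true, y_pred)
r, p = pearsonr(y_true, y_pred)

# Eq. (III) written out agrees with scipy's pearsonr
dx, dy = y_true - y_true.mean(), y_pred - y_pred.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
assert abs(r - r_manual) < 1e-9

# Two-tailed t-test comparing per-fold MAEs of the two datasets
maes_standard = [0.071, 0.068, 0.074, 0.070, 0.069]
maes_bone_sup = [0.064, 0.061, 0.066, 0.063, 0.062]
t_stat, p_val = ttest_ind(maes_standard, maes_bone_sup)
```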











            Figure 2. Flowchart of coronavirus disease 2019 severity assessment. The Brixia scoring system assigns an integer value from 0 to 3 to each of the six lung
            zones (A1 to F1). The total scores from two radiologists were averaged and then normalized to generate the final label scores.


            Volume 2 Issue 3 (2025)                         99                               doi: 10.36922/aih.5608