
Artificial Intelligence in Health                                 Interpretability of deep models for COVID-19



            of original audio, as described by Casanova et al.⁹ The voting summed the predicted probabilities for each class. Windowing also served as a simple data augmentation technique, in addition to the approaches presented in Section 3.2.

            3.4. Dynamic preprocessing
            The audio was preprocessed anew at each training step, ensuring a richer variety of augmented data. To keep the model consistent, the same preprocessing was applied during the validation and test phases. The following operations were carried out:
            (i)   Noise injection
            (ii)  Windowing
            (iii) Spectrogram extraction
            (iv)  Spec-augment application (only for training)
            (v)   Mix-up application (only for training)
            (vi)  Training step/test step.

              Operations (iv) and (v) were applied only to PANN-based experiments and only during training, while the other operations were common to all experiments. For operation (iii), we used different parameters for spectrogram extraction in our experiments. Table 1 presents the two settings used across the experiments presented in Sections 3.5 and 3.6: Set 1 was used for SpiraNet and matched the parameters from Casanova et al.,⁹ and Set 2 was used for CNN14 and needed to be consistent with the parameters used in pre-training. Two parameters were common to all spectrogram-based experiments: the number of fast Fourier transform³⁰ components (1200) and the spectrogram format (log-Mel).

            3.5. Experiments to find the best inputs
            Here, we describe three experiments aiming to estimate the accuracy of SpiraNet⁹ with respect to three different input configurations. These experiments investigated the role of different information types (spectrogram, F0, F0-STD, age, and sex) in the model's decision process. Spectrograms are matrices, while F0 is a vector, and the remaining data are scalars. We converted all these data into matrices to facilitate visual analysis using Grad-CAM, described in the subsequent sections. The representation is shown in Figure 1. The input, in its full form, has 401 × 120 pixels, where the spectrogram occupies the top region (401 × 80). Age, F0-STD, and sex occupy 20 lines, while age and sex use 133 columns each and F0-STD uses 135 columns. Age is represented by shades of gray, as it is a scalar value, and F0-STD is similarly represented. Sex is a binary value, with zero for males and one for females. F0 is represented in a “bar code” style, where each value in the original vector is repeated across an entire column in the generated matrix.

              Using the scheme presented in Figure 1, the first three proposed experiments are:
            •   Experiment 1: Uses only the spectrogram (401 × 80 pixels) as input
            •   Experiment 2: Uses F0, F0-STD, age, and sex (401 × 40 pixels) as input
            •   Experiment 3: Uses all input data present in Figure 1, including the spectrograms, F0, F0-STD, age, and sex (401 × 120 pixels).

              All three experiments are based on the SpiraNet model and use the configurations from Set 1 of Table 1. Moreover, the general hyperparameters for all the experiments (including Experiments 4 and 5 in Section 3.6), based on Casanova et al.,⁹ are as follows: binary cross-entropy loss and the Adam optimizer.³¹ Given that the focus is on studying the model’s decision process rather than performance, the batch size is set to one, early stopping and a learning rate scheduler are not used, and the number of epochs is set to 1000 for all experiments. Despite these settings, CNN14 achieves accuracies close to the best models reported in the literature. We used a fixed learning rate of 0.001 and a weight decay of 0.01.

            3.6. Experiments over the training process
            We performed three additional experiments to analyze classification models with respect to potential changes during training, pre-training, and post-processing. The three experiments are described as follows:
            •   Experiment 4: The goal of this experiment is to determine how the accuracy of a classification model changes when using large-scale pre-trained models. To achieve this, it focuses on pre-training, exploring the use of transfer learning through a PANN model (CNN14). This experiment was configured using Set 2 from Table 1.
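Operation (iii) of the preprocessing pipeline, with the Set 1 parameters from Table 1 (1200 FFT components, hence 601 frequency bins, and 80 Mel bands), can be sketched as a generic log-Mel extraction. This is not the authors' implementation: the sampling rate of 16 kHz is an assumption, and the hop (160) and window (400) values are interpreted here as sample counts rather than milliseconds.

```python
import numpy as np

def log_mel_spectrogram(y, sr=16000, n_fft=1200, hop=160, win=400, n_mels=80):
    """Generic log-Mel sketch using the Set 1 parameters of Table 1.

    Assumption: hop (160) and window (400) are treated as samples at an
    assumed 16 kHz rate (10 ms and 25 ms), not as the "ms" the table
    heading suggests.
    """
    # Frame the signal and apply a Hann window.
    n_frames = 1 + (len(y) - win) // hop
    frames = np.stack([y[i * hop:i * hop + win] for i in range(n_frames)])
    frames = frames * np.hanning(win)
    # 1200-point FFT -> 601 frequency bins, matching Table 1 (Set 1).
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2   # (n_frames, 601)

    # Triangular Mel filterbank with 80 bands.
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fb[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    # Log compression with a small floor for numerical stability.
    return np.log(power @ fb.T + 1e-10)                  # (n_frames, 80)
```

For Set 2, the 513 frequency bins reported in Table 1 correspond to a 1024-point FFT, which is why the CNN14 experiments use a separate setting.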

            Table 1. Settings used in the experiments
            Set   Hop size (ms)   Number of frequency bins   Number of Mel bands   Window length (ms)
            1         160                 601                        80                  400
            2         320                 513                        64                1,024
            Abbreviation: ms: Milliseconds.

            Figure 1. Input representation. Notes: F0: Fundamental frequency; F0-STD: Fundamental frequency standard deviation.
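The input layout of Figure 1, as described in Section 3.5, can be sketched as follows. Rows are taken as feature bins and columns as the 401 time frames (the paper writes dimensions as width × height), and the ordering of the age/F0-STD/sex blocks inside the 20-row scalar band is an assumption; only the 133/135/133 column widths are reported.

```python
import numpy as np

def build_input(spec, f0, age, f0_std, sex, n_frames=401):
    """Assemble the composite 120 x 401 input described in Section 3.5.

    spec   : (80, n_frames) log-Mel spectrogram (top region)
    f0     : (n_frames,) F0 contour, repeated down 20 rows in a
             "bar code" style
    age    : scalar, drawn as a constant gray block (133 columns)
    f0_std : scalar, drawn as a constant gray block (135 columns)
    sex    : 0 for males, 1 for females (133 columns)

    The block ordering within the scalar band is an assumption.
    """
    assert spec.shape == (80, n_frames)
    image = np.zeros((120, n_frames), dtype=np.float32)
    image[:80, :] = spec                      # spectrogram region (80 rows)
    image[80:100, :] = np.tile(f0, (20, 1))   # F0 "bar code" (20 rows)
    scalars = image[100:120]                  # scalar band (20 rows, a view)
    scalars[:, :133] = age                    # age block (133 columns)
    scalars[:, 133:268] = f0_std              # F0-STD block (135 columns)
    scalars[:, 268:] = sex                    # sex block (133 columns)
    return image
```

Encoding the vector and scalars as constant image blocks lets the same 2D convolutional front-end, and later Grad-CAM, treat all information types uniformly.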


            Volume 1 Issue 3 (2024)                        118                               doi: 10.36922/aih.2992