Regarding the training process, we found that noise insertion is important, consistent with previous findings;20 therefore, we used it in all experiments. Other augmentations, such as Mix-up and SpecAugment, did not lead to improvements in the model; on the contrary, accuracy decreased. Transfer learning, on the other hand, proved to be important in this domain, as CNN14 achieved superior results compared to all other models and is comparable to the current state of the art in the literature for this task.
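To make the augmentation concrete, the following is a minimal sketch of noise insertion, assuming the ward noise is available as a NumPy array and mixing it in at a random signal-to-noise ratio; the function name, SNR range, and mixing scheme are illustrative assumptions, not necessarily the exact pipeline used in the experiments.

```python
import numpy as np

def insert_noise(speech: np.ndarray, noise: np.ndarray,
                 snr_db_range=(5.0, 20.0), rng=None) -> np.ndarray:
    """Mix background noise into speech at a random SNR (in dB)."""
    rng = rng or np.random.default_rng()
    # Tile the noise if it is shorter than the speech, then crop to length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    snr_db = rng.uniform(*snr_db_range)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against silent noise
    # Scale so that 10 * log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```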
Furthermore, with respect to the training process, we noted in preliminary experiments some variance in the aspects a model can focus on during inference. The structure of pauses, syntactic boundaries, and pretonic syllables, among other factors, may be more or less evidenced by the models after training. This result is expected because artificial neural networks are high-variance, low-bias classifiers with randomized parameter initialization. We observed that transfer learning appears to reduce this variance.
Regarding the qualitative analysis, our first case study indicated that detailed evaluation would be better performed in the spectrograms-only scenario, which allowed for audio resynthesis, improving the process. As a result of this analysis, we can formulate the following hypotheses to explain the obtained variance and understand the data aspects that may play a role in model learning:
(i) H1: Pauses are important clues for detecting COVID-19, since patients tend to make more pauses for breathing than the control group.
(ii) H2: As the air starts decreasing in the lungs, the speaker may begin to lose breath, or the signal energy may begin to decrease. Thus, energy over time can be an important clue (see the sketch after this list).
(iii) H3: An interplay between syntax and prosody is expected to emerge as a boundary marked phonetically by high energy in vowel formants.
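H2 can be made concrete with a short-time energy contour: if breath support decays over an utterance, the contour should trend downward. A minimal sketch, with illustrative frame and hop sizes:

```python
import numpy as np

def short_time_energy(signal: np.ndarray,
                      frame_len: int = 1024, hop: int = 512) -> np.ndarray:
    """Per-frame energy; a downward trend over the utterance is the
    cue hypothesized in H2."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.array([np.sum(signal[i * hop : i * hop + frame_len] ** 2)
                     for i in range(n_frames)])
```

On such a contour, a simple downward-trend check is the sign of the slope returned by np.polyfit(np.arange(len(e)), e, 1).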
The first hypothesis confirms that deep models use the discrepancy in the structure of pauses between patients and controls, as observed by Fernandes-Svartman et al.18 The second and third hypotheses describe newly observed discrepancies, which were brought to light by the deep learning models.
Our work also confirms the hypothesis from previous works9,12 that the addition of hospital ward noise, alongside suitable preprocessing steps, prevents the models from making biased decisions in the COVID-19 detection task. Through Grad-CAM analysis, we confirm that deep models focus on the voice (or silent pauses) rather than on environmental noise.
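The Grad-CAM computation follows the standard formulation: channel weights are the spatially averaged gradients of the class score with respect to a convolutional layer's activations, and the map is the ReLU of the weighted activation sum. Below is a generic PyTorch sketch, assuming the classifier is a CNN such as CNN14, that target_layer is one of its convolutional layers, and that class index 1 denotes COVID-19-positive; these bindings are illustrative assumptions, not the authors' exact code.

```python
import torch.nn.functional as F

def grad_cam(model, spectrogram, target_layer, class_idx=1):
    """Gradient-weighted class activation map for a spectrogram input.

    Assumes `spectrogram` is a (1, H, W) tensor, `target_layer` is a
    conv layer inside `model`, and `class_idx` is the class whose
    evidence we want to localize (assumed: 1 = COVID-19-positive).
    """
    acts, grads = {}, {}
    fwd = target_layer.register_forward_hook(
        lambda mod, inp, out: acts.update(a=out))
    bwd = target_layer.register_full_backward_hook(
        lambda mod, gin, gout: grads.update(g=gout[0]))

    model.zero_grad()
    score = model(spectrogram.unsqueeze(0))[0, class_idx]
    score.backward()
    fwd.remove()
    bwd.remove()

    # Channel weights: global-average-pooled gradients, shape (1, C, 1, 1).
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)
    # Weighted sum over channels, rectified, normalized to [0, 1].
    cam = F.relu((weights * acts["a"]).sum(dim=1))
    cam = cam / (cam.max() + 1e-8)
    return cam.squeeze(0).detach()
```

Upsampling the returned map to the spectrogram's time-frequency resolution (e.g., with F.interpolate) yields the overlay heat map used in the qualitative analysis.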
Finally, our best model (CNN14) achieved an accuracy of 94.44%. This number is almost as good as the best models reported in the literature12 and shows that proper use of transfer learning can make log-Mel spectrogram input nearly as efficient as MFCC input.
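For reference, the two front ends compared here can be computed with librosa as follows; the file name and parameter values (sample rate, FFT size, band and coefficient counts) are illustrative, not the paper's exact configuration.

```python
import librosa

# Load one utterance (file name and parameters are illustrative).
y, sr = librosa.load("utterance.wav", sr=16000)

# log-Mel spectrogram: the input representation used with CNN14.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=320, n_mels=64)
log_mel = librosa.power_to_db(mel)                   # shape: (64, n_frames)

# MFCCs: the classical compact alternative (a DCT of the log-Mel bands).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)   # shape: (20, n_frames)
```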
5.2. Limitations and future work

In future works, we plan to investigate other audio-related features, such as autocorrelation, jitter, and shimmer. We also intend to investigate the beginning of a sentence. When a speaker starts to produce a sentence, they have more air in their lungs, which decreases as they speak. Some models may focus more on the audio at the beginning, measuring the signal energy, as the initial energy in the audio may provide hints about pulmonary capacity. In addition, we plan to investigate models of related diseases, such as general cases of respiratory insufficiency. Finally, we aim to investigate the variance in model training, identifying factors that are important for model inference and techniques that reduce variance in the learned models (such as transfer learning).
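To give a rough sense of these candidate features, the sketch below approximates jitter and shimmer at the frame level from a pYIN F0 track and computes a short-lag waveform autocorrelation; the function design, pitch range, and lag size are our assumptions, and clinical tools such as Praat measure jitter and shimmer cycle to cycle rather than frame to frame.

```python
import numpy as np
import librosa

def voice_perturbation_features(y: np.ndarray, sr: int = 16000):
    """Frame-level approximations of jitter (period perturbation) and
    shimmer (amplitude perturbation), plus a short-lag autocorrelation.
    Illustrative only: clinical tools measure jitter/shimmer per cycle.
    """
    f0, voiced, _ = librosa.pyin(y, sr=sr,
                                 fmin=librosa.note_to_hz("C2"),
                                 fmax=librosa.note_to_hz("C6"))

    periods = 1.0 / f0[voiced]            # seconds per cycle, voiced frames
    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)

    rms = librosa.feature.rms(y=y)[0]
    n = min(len(rms), len(voiced))
    amp = rms[:n][voiced[:n]]             # amplitude in voiced frames
    shimmer = np.mean(np.abs(np.diff(amp))) / np.mean(amp)

    autocorr = librosa.autocorrelate(y, max_size=sr // 50)  # lags up to 20 ms
    return jitter, shimmer, autocorr
```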
6. Conclusion

This work presents a method for interpretability analysis of audio classification for COVID-19 detection based on CNNs, focusing on explainable AI. We investigated the importance of different features in the training process and generated heat maps to understand the model's reasoning for its predictions.

Regarding the input data, our results show that spectrograms are a suitable representation for COVID-19 detection. F0 appears to be almost as efficient as spectrograms, and the combination of these two inputs led to a small increase in model performance. Grad-CAM analysis indicates that F0 is a more important feature than F0-STD, sex, and age. Moreover, Grad-CAM and audio resynthesis helped us formulate hypotheses about the factors that determine the model's classification process and confirm that the deep models used do not rely on environmental noise for decision-making. Our best model (CNN14) achieved 94.44% accuracy, on par with the best models in the literature.12

Acknowledgments

We gratefully acknowledge the support of NVIDIA Corporation with the donation of a GPU used in part of the experiments presented in this research.

Funding

This work was supported by FAPESP grants 2022/16374-6 (MMG), 2020/06443-5 (SPIRA), and 2023/00488-5

