2. Related work

In the literature, COVID-19 detection has been studied using different types of input features for classification. From the perspective of feature analysis, these inputs can be roughly grouped into two categories: white-box or black-box, based on their ease of interpretation.

An example of an approach using mostly white-box features is the work of Bartl-Pokorny et al.15 The authors used 88 features extracted from audios containing vowels to measure how COVID-19 patients differ from the control group. They found that F0-STD commonly varies between these two groups. In our work, we also used F0 and F0-STD, and included sex and age as inputs for our deep models to detect COVID-19. Sex and age were included following the findings of previous works,16-18 which identified that these factors influence F0 and F0-STD in COVID-19 patients. It was found that women and elderly subjects present more differences in these two parameters, as their voices become higher-pitched and less stable. Moreover, the study by Fernandes-Svartman et al.18 demonstrated that the structure of pauses in speech undergoes significant changes between controls and hospitalized COVID-19 patients, even proposing a white-box model that achieves above 87% accuracy using solely the speech pause distribution.
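
For illustration, F0 and F0-STD can be estimated from a recording with the pYIN algorithm as implemented in librosa. The sketch below is ours, not code from Bartl-Pokorny et al. or from our pipeline; the file name, sampling rate, and frequency bounds are placeholder assumptions.

import numpy as np
import librosa

# Hypothetical input file; 16 kHz is an assumed sampling rate.
audio, sr = librosa.load("speaker.wav", sr=16000)

# Frame-wise fundamental frequency; unvoiced frames are returned as NaN.
f0, voiced_flag, voiced_prob = librosa.pyin(
    audio,
    fmin=librosa.note_to_hz("C2"),  # ~65 Hz lower bound (assumed)
    fmax=librosa.note_to_hz("C6"),  # ~1047 Hz upper bound (assumed)
    sr=sr,
)

voiced_f0 = f0[~np.isnan(f0)]  # keep voiced frames only
print(f"F0 = {voiced_f0.mean():.1f} Hz, F0-STD = {voiced_f0.std():.1f} Hz")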

Regarding black-box features, Schuller et al.19 proposed a challenge for COVID-19 detection from both speech and cough audios using the Cambridge COVID-19 Sound database.6,7 They performed baseline experiments and identified thousands of features that can be used for general audio processing and, in particular, for COVID-19 detection in audio. Zheng et al.11 presented another example of black-box features, where MFCCs proved to be useful features for COVID-19 detection while consuming few computational resources.9 More robust approaches use spectrograms, transfer learning, and data augmentation for the task.20 Recently, transformer-based architectures with MFCCs as input were used alongside transfer learning in the study by Gauy and Finger,12 achieving accuracy above 95%. CNN-based PANNs (e.g., CNN14), which use spectrograms as input, were also used in a study by Gauy et al.,21 achieving accuracy comparable to that of transformer-based architectures. Based on these results, we investigated the use of spectrograms for COVID-19 detection. In addition, transfer learning and data augmentation were employed in our study.
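
As a minimal sketch of how these two kinds of black-box input are typically computed with librosa (the frame parameters and coefficient counts below are illustrative assumptions, not the settings of the cited studies):

import librosa

audio, sr = librosa.load("speaker.wav", sr=16000)  # hypothetical file

# MFCCs: compact cepstral features of the kind used by Zheng et al.
mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13,
                            n_fft=1024, hop_length=256)

# Log-mel spectrogram: the 2D time-frequency "image" consumed by CNNs
# such as CNN14.
mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=64,
                                     n_fft=1024, hop_length=256)
log_mel = librosa.power_to_db(mel)

print(mfcc.shape, log_mel.shape)  # (13, frames) and (64, frames)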

Related work either uses white-box features (e.g., sex and age) to better understand the effect of COVID-19 on patients' audio, or black-box features (e.g., spectrograms) alongside deep learning for higher accuracy in COVID-19 detection tasks. In this work, we proposed an interpretability analysis of the decisions made by the deep learning models found in the literature, aiming for a better understanding of their results. To achieve this, we proposed the use of the Grad-CAM algorithm,10 analyzing its heat maps and synthesizing audios based on those heat maps. These operations provide valuable insights into how deep models make their decisions. A similar approach to ours can be found in the study by Sobahi et al.,22 where the authors used Grad-CAM to visualize the results generated by their proposed model for COVID-19 detection through cough sounds. Grad-CAM allowed them to identify which regions of the input were most relevant to the model's decision-making process. In addition, in a slightly different domain, previous works have used Grad-CAM to analyze COVID-19 detection models based on chest X-ray images.23,24
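
To make the procedure concrete, below is a minimal PyTorch sketch of Grad-CAM for a spectrogram CNN. The generic model with a single COVID-19 logit, and its final convolutional layer passed in as last_conv, are placeholder assumptions; this illustrates the algorithm, not the exact implementation used in our experiments.

import torch
import torch.nn.functional as F

def grad_cam(model, last_conv, spectrogram):
    """Grad-CAM heat map for one (1, 1, mels, frames) spectrogram."""
    activations, gradients = {}, {}

    def fwd_hook(module, inputs, output):
        activations["a"] = output          # feature maps of the last conv layer

    def bwd_hook(module, grad_input, grad_output):
        gradients["g"] = grad_output[0]    # gradients flowing into those maps

    h1 = last_conv.register_forward_hook(fwd_hook)
    h2 = last_conv.register_full_backward_hook(bwd_hook)

    model.zero_grad()
    score = model(spectrogram).squeeze()   # scalar logit for the COVID-19 class
    score.backward()
    h1.remove()
    h2.remove()

    a, g = activations["a"], gradients["g"]      # both (1, C, h, w)
    weights = g.mean(dim=(2, 3), keepdim=True)   # channel weights: pooled grads
    cam = F.relu((weights * a).sum(dim=1))       # (1, h, w) weighted combination
    cam = F.interpolate(cam.unsqueeze(1), size=spectrogram.shape[-2:],
                        mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze()  # (mels, frames), scaled to [0, 1]

Because each column of the resulting heat map corresponds to a spectrogram frame, its time axis can be mapped back to sample positions, which is what makes it possible to synthesize audio from the highlighted regions.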

3. Methods

We used the SPIRA dataset from a previous study,9 which contains spoken utterances from 432 speakers, including both patient and control group members. Audios were collected in COVID-19 wards where patients were hospitalized due to respiratory insufficiency, conventionally defined as a blood oxygen saturation level below 92%. Control group members were recorded using an application over the Internet. We used the same division into training, validation, and test sets as the previous study,9 maintaining a balance by age and sex. Specifically, the dataset was divided into 292 training audios, 32 validation audios, and 108 test audios. The dataset includes recordings of patients and control group members speaking an utterance with no pre-defined pauses. The utterance is simple enough for most to understand but complex enough to present several polysyllabic words with primary and secondary stress syllables. The specific utterance was “o amor ao próximo ajuda a enfrentar o coronavírus com a força que a gente precisa” (“love for your neighbor helps face the coronavirus with the strength we need”). The dataset used is available at https://github.com/SPIRA-COVID19/SPIRA-ACL2021, and the code for each model and experiment can be found at https://github.com/danpeixoto/covid19-interpretability-analysis.
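
For illustration only, a split balanced in this way could be sketched as a stratified partition; the metadata file, column names, and age bins below are assumptions, and the actual 292/32/108 partition was inherited from the previous study rather than recomputed.

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical metadata table: one row per audio with label, sex, and age.
meta = pd.read_csv("metadata.csv")

# Stratify on label, sex, and a coarse age bin so every split keeps the
# balance; assumes each stratum has enough members to be divided.
meta["age_bin"] = pd.cut(meta["age"], bins=[0, 40, 60, 120], labels=False)
strata = meta[["label", "sex", "age_bin"]].astype(str).agg("-".join, axis=1)

train, rest = train_test_split(meta, test_size=140,
                               stratify=strata, random_state=0)
valid, test = train_test_split(rest, test_size=108,
                               stratify=strata.loc[rest.index], random_state=0)
print(len(train), len(valid), len(test))  # 292, 32, 108 for 432 audios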
            COVID-19 detection tasks. In this work, we proposed   overfitting issues.  By combining all the  aforementioned
            an interpretability analysis of the decisions made by the   techniques, we performed six experiments, described in

