Artificial Intelligence in Health: Interpretability of deep models for COVID-19
2. Related work

In the literature, COVID-19 detection has been studied using different types of input features for classification. 10 From the perspective of feature analysis, these inputs can be roughly grouped into two categories: white-box or black-box, based on their ease of interpretation.

An example of an approach using mostly white-box features is the work of Bartl-Pokorny et al. 15 The authors used 88 features extracted from audios containing vowels to measure how COVID-19 patients differ from the control group. They found that F0-STD commonly varies between these two groups. In our work, we also used F0 and F0-STD, and included sex and age as inputs for our deep models to detect COVID-19. Sex and age were included following the findings of previous works, 16-18 which identified that these factors influence F0 and F0-STD in COVID-19 patients. It was found that women and elderly subjects present larger differences in these two parameters, as their voices become higher-pitched and less stable. Moreover, the study by Fernandes-Svartman et al. 18 demonstrated that the structure of pauses in speech undergoes significant changes between controls and hospitalized COVID-19 patients, even proposing a white-box model that achieves above 87% accuracy using solely the speech pause distribution. 9

Regarding black-box features, Schuller et al. 19 proposed a challenge for COVID-19 detection from both speech and cough audios using the Cambridge COVID-19 Sound database. 6,7 They performed baseline experiments and identified thousands of features that can be used for general audio processing and, in particular, for COVID-19 detection in audio. Zheng et al. 11 presented another example of black-box features, where MFCCs proved useful for COVID-19 detection while consuming few computational resources. More robust approaches use spectrograms, transfer learning, and data augmentation for the task. 9 Recently, transformer-based architectures with MFCCs as input were used alongside transfer learning in the study by Gauy and Finger, 12 achieving accuracy above 95%. CNN-based PANNs (e.g., CNN14), which use spectrograms as input, were also used in a study by Gauy et al., 21 achieving accuracy comparable to transformer-based architectures. Based on these results, we investigated the use of spectrograms for COVID-19 detection. In addition, transfer learning and data augmentation were employed in the study. 20

Related work either uses white-box features (e.g., sex and age) to better understand the effect of COVID-19 on patients' audio, or black-box features (e.g., spectrograms) alongside deep learning for higher accuracy in COVID-19 detection tasks. 9 In this work, we proposed an interpretability analysis of the decisions made by the deep learning models found in the literature, aiming for a better understanding of their results. To achieve this, we proposed the use of the Grad-CAM algorithm, analyzing its heat maps and synthesizing audios based on those heat maps. These operations provide valuable insights into how deep models make their decisions. A similar approach to ours can be found in the study by Sobahi et al., 22 where the authors used Grad-CAM to visualize the results generated by their proposed model for COVID-19 detection through cough sounds. Grad-CAM allowed them to identify which regions of the input were most relevant to the model's decision-making process. In addition, in a slightly different domain, previous works have used Grad-CAM to analyze COVID-19 detection models based on chest X-ray images. 23,24

3. Methods

We used the SPIRA dataset from a previous study, 9 which contains spoken utterances from 432 speakers, including both patient and control group members. Audios were collected in COVID-19 wards where patients were hospitalized due to respiratory insufficiency, conventionally defined as a blood oxygen saturation level below 92%. Control group members were recorded using an application over the Internet. We used the same division into training, validation, and test sets as the previous study, maintaining a balance by age and sex. Specifically, the dataset was divided into 292 training audios, 32 validation audios, and 108 test audios. The dataset includes recordings of patients and control group members speaking an utterance with no pre-defined pauses. The utterance is simple enough for most speakers to understand, but complex enough to contain several polysyllabic words with syllables carrying primary and secondary stress. The specific utterance was "o amor ao próximo ajuda a enfrentar o coronavírus com a força que a gente precisa" ("love for your neighbor helps face the coronavirus with the strength we need"). The dataset used is available at https://github.com/SPIRA-COVID19/SPIRA-ACL2021, and the codes for each model and experiment can be found at https://github.com/danpeixoto/covid19-interpretability-analysis.

In this work, inspired by previous approaches, 12,13 we employed transfer learning methods from pre-trained models (described in Section 3.1). Following established methods, 20 we explored data augmentation techniques (described in Section 3.2). Similar to previous studies, 9,12 we utilized audio splitting based on windowing (described in Section 3.3). During training, we performed preprocessing steps (described in Section 3.4) on the audios, termed dynamic preprocessing, to tackle overfitting issues. By combining all the aforementioned techniques, we performed six experiments, described in
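The sex- and age-balanced division into 292 training, 32 validation, and 108 test audios mentioned in the Methods can be approximated by a simple stratified partition. The sketch below is hypothetical, not the authors' code: the decade-wide age bands and the fixed random seed are our own assumptions, and the fractions are derived from the reported counts (292/432, 32/432, 108/432).

```python
import random
from collections import defaultdict

def stratified_split(records, fractions=(0.676, 0.074, 0.25), seed=0):
    """Partition records into train/val/test sets, keeping each
    (sex, age band) stratum in roughly the given proportions.
    `records` are dicts with at least 'sex' and 'age' keys."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        # Decade-wide age bands are an arbitrary choice for this sketch.
        strata[(rec["sex"], rec["age"] // 10)].append(rec)
    train, val, test = [], [], []
    for group in strata.values():
        rng.shuffle(group)
        n_train = round(fractions[0] * len(group))
        n_val = round(fractions[1] * len(group))
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test
```

Because each stratum is split with the same fractions, the three sets inherit the sex and age distribution of the whole dataset up to rounding.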
Volume 1 Issue 3 (2024) 116 doi: 10.36922/aih.2992
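The audio splitting based on windowing described in the Methods amounts to slicing each waveform into fixed-length, possibly overlapping windows. A minimal sketch follows; the 4-second window and 1-second hop are illustrative values we chose for the example, not parameters taken from the paper.

```python
import numpy as np

def split_windows(wave, sr, win_s=4.0, hop_s=1.0):
    """Slice a 1-D waveform into fixed-length windows of win_s
    seconds, advancing hop_s seconds between window starts.
    Windows shorter than win_s (short inputs) are zero-padded."""
    win = int(win_s * sr)
    hop = int(hop_s * sr)
    out = []
    for start in range(0, max(len(wave) - win, 0) + 1, hop):
        w = wave[start:start + win]
        if len(w) < win:  # zero-pad an input shorter than one window
            w = np.pad(w, (0, win - len(w)))
        out.append(w)
    return np.stack(out)  # shape: (n_windows, win)
```

Each row of the returned array can then be fed to the model independently, which multiplies the effective number of training examples per recording.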

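Grad-CAM, as applied in the interpretability analysis above, weights each convolutional feature map by the spatial average of the class-score gradient and keeps only the positively contributing regions. A numpy sketch of that combination step, assuming the feature maps and their gradients have already been extracted from a model by some other means:

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Combine feature maps A_k of shape (K, H, W) and the gradients
    of the class score w.r.t. those maps (same shape) into a heat map:
    alpha_k = global average pool of the gradients (one weight per map),
    heat map = ReLU(sum_k alpha_k * A_k), rescaled to [0, 1]."""
    alphas = gradients.mean(axis=(1, 2))              # (K,)
    cam = np.tensordot(alphas, feature_maps, axes=1)  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                        # ReLU keeps positive evidence
    if cam.max() > 0:
        cam /= cam.max()                              # normalize for display
    return cam
```

For a spectrogram input, the resulting (H, W) map can be upsampled to the spectrogram's resolution and overlaid on it, highlighting which time-frequency regions drove the model's decision.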
