Artificial Intelligence in Health
ISM: A new multi-view space-learning model
and the fact that NMF runs on the concatenated views, thus tending to ignore the smallest ones. Although GFA and MOFA+ are closely related, MOFA+ fails to recover common factors in the Reuters dataset, while GFA fails on the prokaryotic dataset. MVMDS performs relatively well on all datasets, although in most cases with lower factor sparsity and specificity than ISM or ILSM. Because of its extremely high computational time, MOWGLI could only be run on a fraction of the data for the Reuters and TEA-seq multi-omic single-cell datasets. Its poor performance on these datasets can, therefore, be attributed to the sampling itself.

3.2. Detailed results

3.2.1. UCI digits dataset

PCA, MVMDS, MOFA+, and MOWGLI use a factorization rank of 10, while GFA uses a factorization rank of 9. ISM uses a primary embedding of dimension 9 and a factorization rank of 10. The Karhunen-Loève coefficients contain values of mixed signs, so when the non-negative approaches (ISM, NMF, and MOWGLI) are applied, the corresponding view is split into its positive part and the absolute value of its negative part. The clusterings of the digits are shown in Figure 3. ISM outperforms the other methods, yielding 10 digit-specific clusters. It should be noted that NMF performs slightly better than ISM in terms of the purity index, ARI, NMI, and FMS. However, NMF mixes digits 5 and 3 together, so it recognizes one fewer digit. PCA is far behind all other approaches, recognizing only four digit classes.

Figure 4 uses a treemap chart to show how the views contribute to the individual ISM components. For each component, each view corresponds to a rectangle within a rectangular display, where the size of the rectangle represents the loading of the view. Notably, some components are supported by only a few views, for example, component 1 (two views) and component 8 (three views), while others involve most views, for example, component 5 (six views). As each component is associated with a digit, this emphasizes the specificity and complementarity of the image representations, which depend on the respective digit. Interestingly, for some components, the loadings of the views run counter to the views' respective numbers of attributes. For example, for component 8, the view of 240 pixel averages has the lowest loading, while the view of six morphological features has the highest loading. This shows that, with ISM, the views are balanced regardless of their respective numbers of attributes.

3.2.2. Signature 915 data

Before the analysis, each marker gene was normalized by the mean of its four highest expression values. PCA and MVMDS use a factorization rank of 10, GFA uses a factorization rank of 12, and MOFA+ uses a factorization rank of 13. ISM uses a primary embedding of dimension 16 and a factorization rank of 16. The clusterings of the marker genes are shown in Figure 5. ISM outperforms the other methods, with 14 cell-type-specific clusters and higher metrics. Regarding the positioning of the clusters on the 2D map, MVMDS places classical monocytes (monocyte C) and non-classical and intermediate monocytes (monocyte NC+I) opposite each other, contrary to all other approaches and, more importantly, against biological intuition. ISM and GFA outperform the other methods on this dataset, as they reveal the close proximity of transcriptionally and functionally similar cell types within the major immune cell families. Indeed, three cell types from the myeloid lineage, namely monocytes C, monocytes NC+I, and myeloid dendritic cells (mDC), are grouped together. A similar trend is observed for three cell types from the B cell family: of the eight methods considered, only ISM and GFA revealed the close proximity of naïve B cells, memory B cells, and plasmablasts. The most challenging cell types were in the T cell family, where only ISM was able to identify clusters for three cell types (CD4+ effectors, naïve T cells, and Vδ2+ gamma delta non-conventional T cells [VD+]) and place them in close proximity. VD+ gamma delta non-conventional T cells share some similarities with NK cells in terms of the expression of certain receptors, and only ISM was able to recognize both cell types and place them in close proximity, highlighting their similarity. ISM also captured subtle similarities between two types of dendritic cells, mDC and plasmacytoid dendritic cells (pDC), both of which are antigen-presenting cells.

Figure 6 uses a treemap chart to show the contribution of each of the four patients to the individual ISM components. In this chart, each patient corresponds to a rectangle within a rectangular display, where the size of the rectangle represents the loading of the patient. In contrast to the UCI digits data, most components are supported by three patients (three components) or all four patients (11 components); only two components involve just two patients.

The loadings of the view-mapping matrix are shown in Figure 7 using a treemap chart. Recall that each attribute of this dataset is a combination of a patient and a cell type, for which the expression of 915 marker genes was measured. For each component, each such combination corresponds to a rectangle within a rectangular display, where the size of the rectangle represents the loading of the combination. ISM components 1 and 2 are both associated with the same cell type, pDC, while component 15 is simultaneously associated with CD8-activated, VD2-, and VD2+ cells. In the final clustering, the cluster comprising these three
Volume 1 Issue 3 (2024) 99 doi: 10.36922/aih.3427

