
Artificial Intelligence in Health                                 ISM: A new multi-view space-learning model



and the fact that NMF runs on the concatenated views, thus tending to ignore the smallest ones. Although GFA and MOFA+ are closely related, MOFA+ fails to recover common factors in the Reuters dataset, while GFA fails in the prokaryotic dataset. MVMDS performs relatively well on all datasets, in most cases with lower factor sparsity and specificity than ISM or ILSM. MOWGLI could only be run on a fraction of the data for the Reuters and TEA-seq multi-omic single-cell datasets because of its extremely long computation time. The poor performance observed for MOWGLI on these two datasets can, therefore, be attributed to the sampling itself.

3.2. Detailed results

3.2.1. UCI digits dataset

PCA, MVMDS, MOFA+, and MOWGLI use a factorization rank of 10, while GFA uses a factorization rank of 9. ISM uses a primary embedding of dimension 9 and a factorization rank of 10. The Karhunen-Loève coefficients contain data with mixed signs, so the corresponding view is split into its positive part and the absolute value of its negative part when applying the non-negative approaches ISM, NMF, and MOWGLI. The clusterings of the digits are shown in Figure 3. ISM outperforms the other methods with 10 digit-specific clusters. It should be noted that NMF performs slightly better than ISM in terms of purity index, ARI, NMI, and FMS. However, with NMF, digits 5 and 3 are mixed together, so one fewer digit is recognized. PCA is far behind all other approaches, recognizing only four digit classes.

Figure 4 shows how the views affect the individual ISM components using a treemap chart. For each component, each view corresponds to a rectangle within a rectangular display, where the size of the rectangle represents the loading of the view. Notably, some components are supported by only a few views, for example, component 1 (2 views) and component 8 (3 views), while others involve most views, for example, component 5 (6 views). As each component is associated with a digit, this emphasizes the specificity and complementarity of the image representations, which depend on the respective digit. Interestingly, for some components, the loadings of the views are inversely related to the views' numbers of attributes. For example, for component 8, the view of 240 pixel averages has the lowest loading, while the view of six morphological features has the highest loading. This indicates that, with ISM, the views are balanced regardless of their respective numbers of attributes.

3.2.2. Signature 915 data

Before the analysis, each marker gene was normalized using the mean of its four highest expression values. PCA and MVMDS use a factorization rank of 10, GFA uses a factorization rank of 12, and MOFA+ uses a factorization rank of 13. ISM uses a primary embedding of dimension 16 and a factorization rank of 16. The clusterings of the marker genes are shown in Figure 5. ISM outperforms the other methods with 14 cell type-specific clusters and higher metrics.

Regarding the positioning of the clusters on the 2D map, MVMDS places classical monocytes (monocyte C) and non-classical and intermediate monocytes (monocyte NC+I) opposite each other. This contradicts all other approaches and, more importantly, biological intuition. ISM and GFA outperform the other methods on this dataset, as they reveal close proximity between transcriptionally and functionally similar cell types of the major immune cell families. Indeed, three cell types from the myeloid lineage, namely monocytes C, monocytes NC+I, and myeloid dendritic cells (mDC), are grouped together. A similar trend is observed for three cell types from the B cell family: of the eight methods considered, only ISM and GFA revealed the close proximity of naïve B cells, memory B cells, and plasmablasts. The most challenging cell types were in the T cell family. There, only ISM was able to identify clusters for three cell types (CD4+ effectors, naïve T cells, and Vδ2+ gamma delta non-conventional T cells [VD+]) and place them in close proximity. VD+ gamma delta non-conventional T cells share some similarities with NK cells in terms of the expression of certain receptors, and only ISM was able to recognize both cell types and place them in close proximity, highlighting their similarity. ISM also captured subtle similarities between two types of dendritic cells, mDC and plasmacytoid dendritic cells (pDC), both of which are antigen-presenting cells.

Figure 6 shows the impact of the four patients on the individual ISM components using a treemap chart. In this chart, each patient corresponds to a rectangle within a rectangular display, where the size of the rectangle represents the loading of the patient. In contrast to the UCI digits data, most components are supported by three patients (three components) or four patients (11 components), and only two components involve just two patients.

The loadings of the view-mapping matrix are shown in Figure 7 using a treemap chart. Recall that each attribute of this dataset is a combination of a patient and a cell type, in which the expression of 915 marker genes was measured. For each component, such a combination corresponds to a rectangle within a rectangular display, where the size of the rectangle represents the loading of the combination. ISM components 1 and 2 are both associated with the same cell type, pDC, while component 15 is simultaneously associated with CD8-activated, VD2-, and VD2+ cells. In the final clustering, the cluster comprising these three


            Volume 1 Issue 3 (2024)                         99                               doi: 10.36922/aih.3427