Page 109 - AIH-1-3
P. 109

Artificial Intelligence in Health                                 ISM: A new multi-view space-learning model









































                                   Figure 6. Signature 915 data: Treemap of integrated sources model view weights


            CD8+ T cells and found that it actually represents CD8+   It should be noted that MOWGLI was performed on
            naive T cells. We then verified that the split observed in   only 20% of  the samples  and  20% of the scATAC-seq
            the other UMAP projections was also consistent with this   features  due  to  its  extremely  high  computational  time,
            cell type. In particular, the distance between CD8+ naive   despite using an activated GPU. The poor performance
            T cells and CD8+ T cells is minimal with ISM, consistent   observed can, therefore, be attributed to the sampling itself.
            with biology. In contrast, CD8+ naive T cells are closer to
            CD4+ naive T cells and not to CD8+ T cells in the NMF   3.3. Further insights regarding the model
            UMAP projection, contrary to biological intuition. One   3.3.1. Model’s potential dependency on embedding
            possible reason is that NMF does not take advantage of   dimension and rank
            the complementarity of different views by indiscriminately
            concatenating them.                                In this section, we evaluate how ISM performance might
                                                               be affected by changing the embedding dimension and
              Interestingly, and in contrast to other multi-view   the  rank  in the neighborhood  of  the  chosen  values.
            approaches, ISM allows the direct identification of   First, we examine the relative approximation error for an
            factors and views that are discriminative with respect to   embedding dimension in the neighborhood of the chosen
            a particular cell type (e.g., CD8+ naive T cells). From the   rank to select an optimal value, as described in the analysis
            factor specificities, we found two specific ISM latent factors   workflow. Second, we examine the relative error, number
            with positive factor specificities with respect to CD8+   of found classes, and purity for a rank in the neighborhood
            naive T cells (0.50 and 55, respectively). In all three multi-  of the chosen embedding dimension.
            omic modalities, these factors have close loadings in the
            view-weights matrix Q* (13.09/9.00 in the RNA-seq view,   For the UCI Digits data, where the chosen ISM rank is 10,
            10.82/8.04 in the ATAC-seq view, and 11.45/8.94 in the   an embedding dimension of 9 clearly minimizes the relative
            ADT view, respectively), highlighting the contribution of   error—0.52 versus 0.72 or higher for other dimensions.
            the three modalities to specifically distinguish CD8+ and   The number of classes found and the purity index are
            CD4+ cell subpopulations among naive T cells.      also significantly higher (Table 2, upper part). The relative



            Volume 1 Issue 3 (2024)                        103                               doi: 10.36922/aih.3427
   104   105   106   107   108   109   110   111   112   113   114