
Artificial Intelligence in Health ISM: A new multi-view space-learning model

Figure 4. UCI Digits data: Treemap of integrated sources model view weights

all non-negative approaches yield a high sparsity index (i.e., 0.93, 0.89, 0.91, and 0.98 for ISM, MOWGLI, NMF, and NTF, respectively), as opposed to mixed-sign approaches (i.e., 0.56, 0.30, 0.56, and 0.57 for MVMDS, GFA, MOFA+, and PCA, respectively).

3.2.3. Reuters data
MVMDS and MOWGLI use a 6-factorization rank, while MOFA+ and GFA use a 10-factorization rank. ISM uses a primary embedding of dimension 6 and a 6-factorization rank (equal to the number of known categories). Overall, ISM outperforms the other methods, identifying three out of six categories and achieving higher metrics, followed by MVMDS. However, all performance indices are relatively low, as previously observed by Brbic and Kopriva.²² MOFA+ fails to identify a common structure across the views. Note that MOWGLI was run on only 20% of the samples because of its extremely high computational time, even with a graphics processing unit (GPU) enabled. The poor performance of MOWGLI can therefore be attributed to this subsampling.

3.2.4. Prokaryotic data
MVMDS, MOFA+, and MOWGLI use a 4-factorization rank, while GFA uses a 6-factorization rank. ISM uses a view-mapping primary embedding of dimension 4 and a 4-factorization rank (equal to the number of known categories). Since the provided views contain the principal components explaining 90% of the variance, which take both positive and negative values, each view must be split into its positive part and the absolute value of its negative part before the non-negative approaches ISM, NMF, and MOWGLI can be applied. Overall, ISM outperforms the other methods, identifying three out of four categories (missing only the category with the smallest size) and achieving higher metrics.

3.2.5. TEA-seq multi-omic single-cell data
MVMDS and MOWGLI use a 7-factorization rank, while GFA uses a 15-factorization rank. ISM uses a primary embedding of dimension 7 and a 7-factorization rank (equal to the number of categories). MOFA+ metrics are not presented for this dataset because the corresponding clustering was used to annotate the cells.²¹ We used a UMAP projection because the size of the dataset makes MDS impractical (Figure 8).

ISM outperforms the other methods, identifying six out of seven cell types. It misses only MAIT T cells, which are too close to CD4 effector and memory T cells to be identified as a separate cluster.
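The comparison of non-negative and mixed-sign approaches reports a sparsity index without defining it in this section. As an assumption for illustration only, the sketch below uses Hoyer's sparsity measure and averages it over the columns of a loading matrix; the authors' actual index and aggregation may differ.

```python
import math
from typing import Sequence


def hoyer_sparsity(x: Sequence[float]) -> float:
    """Hoyer's sparsity of a vector: 1.0 for a single non-zero entry,
    0.0 when all entries have the same magnitude (sign is ignored)."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two entries")
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    if l2 == 0.0:
        raise ValueError("sparsity is undefined for an all-zero vector")
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)


def mean_column_sparsity(matrix: Sequence[Sequence[float]]) -> float:
    """Average Hoyer sparsity over the columns of a row-major matrix.
    Assumption: one plausible way to summarise a whole loading matrix."""
    columns = list(zip(*matrix))
    return sum(hoyer_sparsity(col) for col in columns) / len(columns)
```

For example, `hoyer_sparsity([0, 0, 3, 0])` returns 1.0, while `hoyer_sparsity([1, -1, 1, -1])` returns 0.0: loadings spread evenly over many features, whatever their signs, score as dense.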
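For the prokaryotic data, the mixed-sign principal-component views are split into their positive part and the absolute value of their negative part before the non-negative methods are applied. A minimal sketch of that transform follows; placing all positive-part columns before all negative-part columns is an arbitrary layout choice, and the function name is hypothetical.

```python
from typing import List, Sequence


def split_signs(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Map each row x to [max(x, 0) | max(-x, 0)].

    Every original column becomes two non-negative columns: its positive
    part and the absolute value of its negative part."""
    out: List[List[float]] = []
    for row in matrix:
        pos = [v if v > 0 else 0.0 for v in row]   # positive part
        neg = [-v if v < 0 else 0.0 for v in row]  # |negative part|
        out.append(pos + neg)
    return out
```

For instance, `split_signs([[1.5, -2.0], [-0.5, 0.0]])` returns `[[1.5, 0.0, 0.0, 2.0], [0.0, 0.0, 0.5, 0.0]]`. Subtracting the second half of each row from the first recovers the original view, so no information is lost; the cost is doubling the number of features.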
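Across the Reuters, prokaryotic, and TEA-seq comparisons, the factorization rank is set equal to the number of known categories. ISM's own algorithm is not described in this section, so the sketch below illustrates the general idea with plain NMF instead: Lee-Seung multiplicative updates for the Frobenius loss, with each sample assigned to the component carrying its largest loading. The argmax assignment rule is an assumption, not necessarily the rule used in the paper.

```python
import numpy as np


def nmf(X, rank: int, n_iter: int = 500, seed: int = 0, eps: float = 1e-10):
    """Factor a non-negative matrix X (samples x features) as W @ H with
    Lee-Seung multiplicative updates minimising ||X - W H||_F^2."""
    X = np.asarray(X, dtype=float)
    if np.any(X < 0):
        raise ValueError("X must be non-negative")
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps  # strictly positive start
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def assign_clusters(W) -> np.ndarray:
    """Assign each sample to the component with its largest loading."""
    return np.argmax(W, axis=1)
```

With `rank` equal to the number of known categories, `assign_clusters(W)` yields one candidate cluster per category, which can then be compared with the known labels.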
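The results count how many known categories each method "identifies" (e.g., three out of four for the prokaryotic data, six out of seven cell types for TEA-seq), but the criterion is not stated in this section. As a hypothetical criterion for illustration only, the sketch below counts a category as identified when it is the plurality label of at least one predicted cluster.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def count_identified_categories(
    true_labels: Sequence[Hashable],
    cluster_labels: Sequence[Hashable],
) -> int:
    """Count known categories that are the plurality label of at least one
    predicted cluster (hypothetical criterion, for illustration only)."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    members: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for truth, cluster in zip(true_labels, cluster_labels):
        members[cluster].append(truth)
    identified = set()
    for truths in members.values():
        majority, _ = Counter(truths).most_common(1)[0]
        identified.add(majority)
    return len(identified)
```

With true labels `["a", "a", "a", "b", "b", "b", "c"]` and clusters `[0, 0, 0, 1, 1, 1, 1]`, the function returns 2: the smallest category, `c`, is absorbed into the cluster dominated by `b`, which mirrors the kind of miss reported for the prokaryotic data.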
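The choice of UMAP over MDS for the TEA-seq data follows from how classical MDS scales: it starts from a dense n × n matrix of pairwise distances, so memory grows quadratically with the number of cells. The back-of-the-envelope sketch below assumes 8-byte (float64) entries; the cell count used in the example is hypothetical, since the dataset size is not given in this section.

```python
def dense_distance_matrix_gib(n_samples: int, bytes_per_entry: int = 8) -> float:
    """Memory, in GiB, of a dense n x n distance matrix (float64 by default)."""
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    return n_samples * n_samples * bytes_per_entry / 2**30
```

For a hypothetical 50,000 cells, `dense_distance_matrix_gib(50_000)` is about 18.6 GiB for the distance matrix alone, before any eigendecomposition, and doubling the number of cells quadruples it. UMAP instead works from a sparse k-nearest-neighbour graph.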

Volume 1 Issue 3 (2024) 101 doi: 10.36922/aih.3427
