Artificial Intelligence in Health
ORIGINAL RESEARCH ARTICLE
Integrated sources model: A new space-learning
model for heterogeneous multi-view data
reduction, visualization, and clustering
Paul Fogel¹*, Christophe Geissler¹, Franck Augé², Galina Boldina³, and George Luta⁴
1 Data Services, Mazars, Courbevoie, France
2 Translational Precision Medicine, Sanofi, Vitry-sur-Seine, France
3 Precision Medicine and Computational Biology, Sanofi, Vitry-sur-Seine, France
4 Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Medical
Center, Washington, D.C., United States of America
*Corresponding author: Paul Fogel (paul.fogel@mazars.fr)
Citation: Fogel P, Geissler C, Augé F, Boldina G, Luta G. Integrated sources model: A
new space-learning model for heterogeneous multi-view data reduction, visualization,
and clustering. Artif Intell Health. 2024;1(3):89-113. doi: 10.36922/aih.3427
Received: April 16, 2024
Accepted: June 5, 2024
Published Online: July 24, 2024
Copyright: © 2024 Author(s). This is an Open-Access article distributed under the terms
of the Creative Commons Attribution License, permitting distribution and reproduction in
any medium, provided the original work is properly cited.
Publisher’s Note: AccScience Publishing remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
Abstract
In machine learning, multi-view data involve multiple distinct sets of attributes
(“views”) for a common set of observations; when each view has the same attributes
considered in different contexts, the data are said to contain multiple views of
homogeneous format, which can be conceptualized as a tensor. In this article, we
describe a novel approach for integrating multiple views of heterogeneous format
into a common latent space using a workflow that involves non-negative matrix and
tensor factorization (NMF/NTF). This approach, which we refer to as the integrated
sources model (ISM), consists of two main steps: embedding and analysis. In the
embedding step, the views are transformed into matrices with common non-negative
components. In the analysis step, the transformed views are combined into a tensor
and decomposed using NTF. We also present a variant of ISM, the integrated latent
sources model (ILSM), which offers significant advantages over ISM in terms of
computational efficiency and in cases where the views are highly unbalanced with
regard to the number of attributes per view. Notably, ISM can be extended to process
multi-omic and multi-view datasets even in the presence of missing views. We provide
a proof-of-concept analysis using five examples, including the UCI Digits dataset (the
University of California Irvine Pen-Based Recognition of Handwritten Digits), a public
cell-type gene signatures dataset, and a multi-omic single-cell dataset. These examples
demonstrate that, in most cases, multi-view clustering is better achieved with ISM or
its variant ILSM than with other latent space approaches. We also show how the
non-negativity and sparsity of the ISM model components enable straightforward
interpretations, in contrast to other approaches that involve latent factors of mixed
signs. Finally, we present potential applications to single-cell multi-omics and
spatial mapping, including spatial imaging, spatial transcriptomics, and computational
biology, which are currently under evaluation. ISM relies on state-of-the-art
algorithms invoked through a simple workflow implemented in Python.
Keywords: Principal component analysis; Non-negative matrix factorization; Non-negative
tensor factorization; Multi-view clustering; Canonical correlation analysis; Common
principal components; Multidimensional scaling
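The two-step workflow summarized in the abstract (embed every view on common non-negative components, then stack the embedded views into a tensor and decompose it with NTF) can be illustrated with a small NumPy sketch. This is not the authors' ISM implementation. The function names `nmf`, `embed_views`, and `ntf` are hypothetical, plain Lee-Seung multiplicative updates stand in for whatever solvers ISM actually uses, and the embedding rule (one NMF of the concatenated views, then re-fitting each view's scores against its own block of shared loadings) is an assumed reading of "matrices with common non-negative components".

```python
import numpy as np

EPS = 1e-10  # keeps multiplicative-update denominators strictly positive


def nmf(X, k, n_iter=500, seed=0):
    """Lee-Seung NMF with Frobenius loss: X ~= W @ H, with W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + EPS)
        W *= (X @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def embed_views(views, k, n_iter=500, seed=0):
    """Hypothetical embedding step (an assumption, not the published ISM rule).

    One NMF of the concatenated views yields shared loadings H = [H_1 | ... | H_V].
    Each view X_v is then re-fitted with H_v held fixed, giving a non-negative
    n x k score matrix E_v whose k columns refer to the same components in every view.
    """
    W, H = nmf(np.hstack(views), k, n_iter, seed)
    split_points = np.cumsum([Xv.shape[1] for Xv in views])[:-1]
    embeddings = []
    for Xv, Hv in zip(views, np.split(H, split_points, axis=1)):
        Ev = W.copy()  # warm start from the shared scores
        for _ in range(n_iter):
            Ev *= (Xv @ Hv.T) / (Ev @ Hv @ Hv.T + EPS)
        embeddings.append(Ev)
    return embeddings


def ntf(T, r, n_iter=500, seed=0):
    """Non-negative CP decomposition of a 3-way tensor by multiplicative updates:
    T[i, j, l] ~= sum_q A[i, q] * B[j, q] * C[l, q]."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((dim, r)) for dim in T.shape)
    for _ in range(n_iter):
        A *= np.einsum("ijl,jq,lq->iq", T, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + EPS)
        B *= np.einsum("ijl,iq,lq->jq", T, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + EPS)
        C *= np.einsum("ijl,iq,jq->lq", T, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + EPS)
    return A, B, C


# Toy usage: three random non-negative views of 30 observations, deliberately
# unbalanced in their number of attributes (12, 40, and 5).
rng = np.random.default_rng(1)
views = [rng.random((30, 12)), rng.random((30, 40)), rng.random((30, 5))]
E = embed_views(views, k=4)       # embedding step: one 30 x 4 matrix per view
T = np.stack(E, axis=2)           # observations x components x views tensor
A, B, C = ntf(T, r=3)             # analysis step: observation, component, view factors
labels = A.argmax(axis=1)         # heuristic: cluster = dominant NTF component
print(T.shape, A.shape, B.shape, C.shape)  # (30, 4, 3) (30, 3) (4, 3) (3, 3)
```

Multiplicative updates are used here only because they keep every factor non-negative without an explicit projection step. The dominant-component rule for `labels` is a common NMF heuristic and is not necessarily the clustering rule used in the article.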
Volume 1 Issue 3 (2024) 89 doi: 10.36922/aih.3427

