
Artificial Intelligence in Health





ORIGINAL RESEARCH ARTICLE
Integrated sources model: A new space-learning model for heterogeneous multi-view data reduction, visualization, and clustering



Paul Fogel¹*, Christophe Geissler¹, Franck Augé², Galina Boldina³, and George Luta⁴
¹ Data Services, Mazars, Courbevoie, France
² Translational Precision Medicine, Sanofi, Vitry-sur-Seine, France
³ Precision Medicine and Computational Biology, Sanofi, Vitry-sur-Seine, France
⁴ Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Medical Center, Washington, D.C., United States of America




*Corresponding author: Paul Fogel (paul.fogel@mazars.fr)

Citation: Fogel P, Geissler C, Augé F, Boldina G, Luta G. Integrated sources model: A new space-learning model for heterogeneous multi-view data reduction, visualization, and clustering. Artif Intell Health. 2024;1(3):89-113. doi: 10.36922/aih.3427

Received: April 16, 2024
Accepted: June 5, 2024
Published Online: July 24, 2024

Copyright: © 2024 Author(s). This is an Open-Access article distributed under the terms of the Creative Commons Attribution License, permitting distribution and reproduction in any medium, provided the original work is properly cited.

Publisher’s Note: AccScience Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Abstract
In machine learning, multi-view data involve multiple distinct sets of attributes (“views”) for a common set of observations; when each view has the same attributes considered in different contexts, the data are said to contain multiple views of homogeneous format, which can be conceptualized as a tensor. In this article, we describe a novel approach for integrating multiple views of heterogeneous format into a common latent space using a workflow that involves non-negative matrix and tensor factorization (NMF/NTF). This approach, which we refer to as the integrated sources model (ISM), consists of two main steps: embedding and analysis. In the embedding step, the views are transformed into matrices with common non-negative components. In the analysis step, the transformed views are combined into a tensor and decomposed using NTF. We also present a variant of ISM, the integrated latent sources model (ILSM), which offers significant advantages over ISM in terms of computational efficiency and in cases where the views are highly unbalanced with regard to the number of attributes per view. Notably, ISM can be extended to process multi-omic and multi-view datasets even in the presence of missing views. We provide a proof-of-concept analysis using five examples, including the UCI Digits (the University of California Irvine Pen-Based Recognition of Handwritten Digits) dataset, a public cell-type gene signatures dataset, and a multi-omic single-cell dataset. These examples demonstrate that, in most cases, multi-view clustering is better achieved with ISM or its variant ILSM than with other latent space approaches. We also show how the non-negativity and sparsity of the ISM model components enable straightforward interpretations, in contrast to other approaches that involve latent factors of mixed signs. Finally, we present potential applications to single-cell multi-omics and spatial mapping, including spatial imaging, spatial transcriptomics, and computational biology, which are currently under evaluation. ISM relies on state-of-the-art algorithms invoked through a simple workflow implemented in Python.

Keywords: Principal component analysis; Non-negative matrix factorization; Non-negative tensor factorization; Multi-view clustering; Canonical correlation analysis; Common principal components; Multidimensional scaling
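
The two-step idea summarized in the abstract (embed each view on common non-negative components, then stack the embedded views into a tensor and decompose it with NTF) can be illustrated with a minimal NumPy sketch. This is a simplified illustration under stated assumptions, not the authors' ISM implementation: the function names `nmf`, `embed_views`, and `ntf` are hypothetical, the factorizations use plain multiplicative updates, the embedding is obtained by factorizing the concatenated views and refitting per-view scores, and the data are synthetic.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def nmf(X, rank, n_iter=200, seed=0):
    """Plain multiplicative-update NMF: X ~ W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank))
    H = rng.random((rank, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + EPS)
        W *= (X @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def embed_views(views, rank, n_iter=200, seed=0):
    """Embedding step (simplified assumption): factorize the concatenated
    views once, then refit non-negative scores per view on its block of H,
    so every view becomes an (n_obs x rank) matrix on shared components."""
    W, H = nmf(np.hstack(views), rank, n_iter, seed)
    embedded, start = [], 0
    for V in views:
        Hv = H[:, start:start + V.shape[1]]  # loadings of this view's attributes
        start += V.shape[1]
        Wv = W.copy()
        for _ in range(n_iter):  # update scores only; Hv stays fixed
            Wv *= (V @ Hv.T) / (Wv @ Hv @ Hv.T + EPS)
        embedded.append(Wv)
    return np.stack(embedded, axis=2)  # tensor: n_obs x rank x n_views


def ntf(T, rank, n_iter=200, seed=0):
    """Analysis step: non-negative CP decomposition of a 3-way tensor,
    T[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r]."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((dim, rank)) for dim in T.shape)
    for _ in range(n_iter):
        A *= np.einsum("ijk,jr,kr->ir", T, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + EPS)
        B *= np.einsum("ijk,ir,kr->jr", T, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + EPS)
        C *= np.einsum("ijk,ir,jr->kr", T, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + EPS)
    return A, B, C


# Synthetic example: three heterogeneous views (8, 5, and 12 attributes)
# generated from the same non-negative latent structure.
rng = np.random.default_rng(1)
n_obs, k = 30, 3
Z = rng.random((n_obs, k))
views = [Z @ rng.random((k, p)) for p in (8, 5, 12)]

T = embed_views(views, rank=k)  # embedding step
A, B, C = ntf(T, rank=k)        # analysis step
print(T.shape, A.shape, B.shape, C.shape)
```

In this sketch, the rows of `A` give non-negative observation scores that could feed a clustering or visualization step, while `C` indicates how strongly each view contributes to each component. All factors stay non-negative because the multiplicative updates only rescale non-negative starting values.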


            Volume 1 Issue 3 (2024)                         89                               doi: 10.36922/aih.3427