
Tumor Discovery                                                       AI uncovers tumor spatial organization



$$\mathcal{L} = \mathcal{L}_{recon} + \mathcal{L}_{KL}$$

$$\mathcal{L} = -\mathbb{E}_{q(Z \mid X, A)}\left[\log p(A \mid Z)\right] + \mathrm{KL}\left(q(Z \mid X, A) \,\|\, p(Z)\right) \tag{V}$$

The term KL(·) denotes the Kullback-Leibler divergence between two probability distributions. Training the VGAE to minimize this objective function enables the model to learn a probabilistic mapping of spots to a latent space, yielding informative representations of the structure and features of the ST data. The downstream clustering methods partition the latent embeddings to detect spatial domains. The resulting clustering labels are then compared with the ground-truth to assess accuracy and performance.

2.2. GNN

In the VGAE model, the encoder uses two GNN layers to extract features and reduce dimensions. GNNs facilitate the processing of ST data by enabling spots to learn from and communicate with neighboring spots. Each spot aggregates information from its neighbors and then updates its own representation based on this aggregated data. The PyTorch_pyG package²³ in Python offers multiple implementations of GNNs. Existing spatial clustering architectures typically incorporate only a single GNN. In this study, we opt for the simple graph convolution (SGC) ConvNet²⁴ to construct the encoder.

The SGC ConvNet addresses the excess model complexity and redundant computation found in GDL. To remedy these defects, SGC removes the nonlinearities between successive layers and collapses the resulting weight matrices into a single one. This streamlined linear model demonstrated comparable or even superior performance at both theoretical and experimental levels. Notably, the convolution kernel in SGC is redefined as a linear function (Equation VI):

$$\hat{Y}_{SGC} = \mathrm{softmax}\left(S \cdots S S X \Theta^{(1)} \Theta^{(2)} \cdots \Theta^{(K)}\right) = \mathrm{softmax}\left(S^{K} X \Theta\right) \tag{VI}$$

where S is the normalized adjacency matrix, X is the feature matrix, Θ is the weight matrix, and softmax indicates the normalized exponential function.

2.3. ST datasets

Various types of ST data from tumors were used to evaluate the proposed spatial clustering architecture. These datasets were generated using diverse ST technologies, resulting in variations in resolution, spot counts, and gene profiles. Specifically, the human dorsolateral prefrontal cortex (DLPFC) ST data was obtained from the 10× Visium platform, and the spatialLIBD project conducted an extensive study of this data, encompassing spot-level information, layer-level data, and spatial marker genes.²⁵ The project comprises a total of 12 samples, each dissection covering six neuronal layers plus white matter. In the ground-truth annotation (cortical layers one to six plus white matter), eight samples were categorized into seven clusters, while the remaining four were grouped into five. For validation of the GNN algorithm, sample 151673 was chosen as a representative due to its specificity. This sample contains 3,639 spots and 33,538 genes, with provided spot annotations.

The second ST dataset originates from human breast cancer tissue and is available through the 10× Visium dataset repository. This dataset is valuable for the analysis of heterogeneous tumor and immune microenvironments, given its substantial intratumoral and intertumoral variations. To facilitate clustering estimation, the sample is divided into 20 regions using the SEDR²⁶ package, relying on pathological features and gene expression. These annotated regions provide the foundation for clustering evaluation. In total, this dataset encompasses 3,798 spots and 36,601 genes.

The 10× Xenium technology represents a novel approach integrating single-cell, spatial, and in situ analysis of FFPE tissue. Notably, breast cancer tumor datasets associated with this technology were reprocessed and republished on December 6, 2022. Leveraging its non-destructive workflow, Xenium⁹ spatially aligns RNA, protein, and histological data within a unified image. This feature makes it possible to discern cell types and their corresponding gene expression profiles at single-cell resolution. In the breast cancer tumor dataset, 17 distinct cell types have been identified, amounting to 164,079 cells profiled with a 313-plex gene panel. To alleviate computational load, this manuscript employs a segmented version of this data for clustering comparison, comprising 15 cell types, 11,996 cells, and the 313-gene panel.

2.4. Data pre-processing and hyperparameters

In this article, data pre-processing and VGAE training are conducted within a Python virtual environment using the PyTorch_pyG, Squidpy, and Scanpy toolkits. Initially, gene expression profiles undergo normalization and log transformation using Scanpy.²⁷ Users also have the option to select "SCTransform" for gene expression normalization. Three thousand highly variable genes are selected to construct the feature matrix. Subsequently, the scikit-learn toolkit²⁸ employs a nearest-neighbor search technique to calculate the adjacency matrix. The neighbors for each spot are determined using either the k-nearest neighbor or radius-nearest neighbor modes. Specifically, for the 10×
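The two neighbor modes named in Section 2.4 (k-nearest neighbor and radius-nearest neighbor) can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses a brute-force NumPy distance computation as a stand-in for the scikit-learn nearest-neighbor search. The function names (`pairwise_distances`, `knn_adjacency`, `radius_adjacency`), the toy 3 × 3 grid of spot coordinates, and the values `k=2` and `radius=1.0` are illustrative assumptions.

```python
import numpy as np


def pairwise_distances(coords):
    """Euclidean distance between every pair of spots (brute force)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def knn_adjacency(coords, k):
    """Connect each spot to its k closest spots, then symmetrize."""
    dist = pairwise_distances(coords)
    np.fill_diagonal(dist, np.inf)          # a spot is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    n = coords.shape[0]
    adj = np.zeros((n, n), dtype=int)
    rows = np.repeat(np.arange(n), k)
    adj[rows, nearest.ravel()] = 1
    return np.maximum(adj, adj.T)           # keep an edge if either end chose it


def radius_adjacency(coords, radius):
    """Connect every pair of distinct spots no farther apart than `radius`."""
    dist = pairwise_distances(coords)
    adj = (dist <= radius).astype(int)
    np.fill_diagonal(adj, 0)                # drop self-loops
    return adj


# Hypothetical toy data: spots on a 3 x 3 grid with unit spacing.
coords = np.array([[x, y] for x in range(3) for y in range(3)], dtype=float)

adj_knn = knn_adjacency(coords, k=2)
adj_rad = radius_adjacency(coords, radius=1.0)

print(adj_rad.sum(axis=1))  # number of neighbors per spot in radius mode
```

In radius mode the neighbor count varies with local spot density (corner spots of the grid have fewer neighbors than the center spot), whereas the k-nearest mode gives every spot at least k neighbors regardless of spacing.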


            Volume 3 Issue 1 (2024)                         4                          https://doi.org/10.36922/td.2049
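The linear propagation in Equation VI can be sketched in a few lines of NumPy. This is a hedged illustration, not the encoder used in the study. It assumes that S is built with added self-loops and symmetric degree normalization, as in the usual SGC formulation, which the text above does not spell out. The function names, the four-spot path graph, and the random feature and weight matrices are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj):
    """Assumed form of S: D^{-1/2} (A + I) D^{-1/2}, with self-loops added."""
    a_tilde = adj + np.eye(adj.shape[0])
    deg = a_tilde.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(z):
    """Row-wise normalized exponential, shifted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def sgc_forward(adj, x, theta, k):
    """Compute softmax(S^K X Theta) by applying S to the features k times."""
    s = normalized_adjacency(adj)
    h = x
    for _ in range(k):
        h = s @ h                           # one hop of neighbor aggregation
    return softmax(h @ theta)               # single collapsed weight matrix


# Hypothetical toy graph: four spots connected in a path 0-1-2-3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                 # 4 spots, 3 features
theta = rng.normal(size=(3, 2))             # 3 features, 2 output classes

y_hat = sgc_forward(adj, x, theta, k=2)
print(y_hat.shape)
```

Because no nonlinearity sits between hops, the K propagation steps depend only on the graph and the features, so S^K X can be computed once and reused while Θ is trained.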