Cost_total-local-server = O(n·p·T_add)                                        (XXIII)

3.3.3. Central server-side computation
Similar to the local servers, the central server performed aggregation of the gradients received from the n local servers. The computation cost at the central server is represented as:

Cost_total-central-server = O(n·T_add)

3.3.4. Total computation cost
Combining the computation costs at the client, local server, and central server levels, the total computation cost for the proposed framework is represented as:

Cost_total = O(p·(|D_ij|·M + Enc(|G_ij^r|) + Dec(|G_ij^r|)) + n·p·T_add + n·T_add)   (XXIV)
Because n·p·T_add = p·(n·T_add) dominates the central-server term n·T_add, the aggregation costs can be absorbed into the factor of p, giving the simplified form:

Cost_total = O(p·(|D_ij|·M + Enc(|G_ij^r|) + Dec(|G_ij^r|) + n·T_add))                 (XXV)
The hierarchical structure optimized both communication and computation by leveraging local servers to consolidate updates before transmitting them to the central server. This ensured scalability, even in scenarios with large datasets and numerous participants.
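To make the cost model concrete, the short Python sketch below tallies the terms of Equations XXIII to XXV for illustrative parameter values. The reading of the symbols (n local servers, p clients per local server, |D_ij| samples per client, per-sample training cost M, and unit costs for encryption, decryption, and gradient addition) is inferred from the equations above; the numeric values are placeholder assumptions, not measurements from the proposed framework.

# Illustrative tally of the cost terms in Equations XXIII-XXV.
# All numeric values are hypothetical placeholders.

def total_computation_cost(n, p, d_ij, m, enc_cost, dec_cost, t_add):
    """Sum the cost terms of the hierarchical setup.

    n        : number of local servers
    p        : clients attached to each local server (assumed meaning)
    d_ij     : samples held by one client, |D_ij|
    m        : per-sample cost of local training, M
    enc_cost : cost of encrypting one gradient update, Enc(|G_ij^r|)
    dec_cost : cost of decrypting one received update, Dec(|G_ij^r|)
    t_add    : cost of one gradient addition during aggregation
    """
    client_side = p * (d_ij * m + enc_cost + dec_cost)   # client-side term of Eq. XXIV
    local_servers = n * p * t_add                        # Eq. XXIII
    central_server = n * t_add                           # central-server aggregation
    return client_side + local_servers + central_server  # Eq. XXIV

# Hypothetical example: 5 local servers, 10 clients each, 500 samples per client.
print(total_computation_cost(n=5, p=10, d_ij=500, m=1.0,
                             enc_cost=50.0, dec_cost=50.0, t_add=1.0))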
4. Discussion
In this section, the issue of data heterogeneity in FL is addressed, specifically in the context of datasets distributed across different medical centers or countries. FL encounters significant challenges in real-world medical settings owing to the intrinsic heterogeneity among contributing institutions. Data heterogeneity arises when the data distribution varies substantially across clients, leading to non-IID data.58 Heterogeneity may manifest as statistical variations, differences in system capabilities, disparities in model architecture, and additional challenges.59,60 While the proposed model can directly handle IID datasets, its robustness is demonstrated by showing how it manages non-IID datasets. Several techniques are proposed to mitigate the effects of data imbalance and enhance model performance in hierarchical systems designed for medical data management.
According to recent studies, and within the context of hierarchical medical data management, generative adversarial networks (GANs), particularly the newly developed robust diffusion models,61 can effectively achieve uniformity in data availability across medical facilities. Zadeh et al.62 utilized GANs for cross-modality brain image synthesis, including transformations such as CT to positron emission tomography (PET), CT to magnetic resonance imaging (MRI), MRI to PET, and vice versa.
To address statistical heterogeneity, or non-uniform data distribution, which significantly impacts model accuracy, increased communication rounds are often necessary. However, this can introduce bias into the global model, particularly disadvantaging clients with underrepresented data from various institutions. Therefore, aligning the data distributions across medical centers is critical to mitigating the model bias caused by variations in patient populations or data collection techniques.58,63 Class balancing64 should be supported with equal representation of all disease classes or conditions across federated nodes to prevent biased learning outcomes. Additionally, standardization of quality65 is necessary to normalize data collected with varying equipment and protocols, enhancing uniformity and reliability. Moreover, volume balancing66 helps prevent dominant contributions from larger hospitals, ensuring equitable learning from all centers.
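As a rough illustration of how class and volume balancing could be folded into the aggregation step, the sketch below computes per-center aggregation weights from hypothetical sample and class counts. The specific weighting scheme (square-root volume damping blended with a class-entropy term) is an assumption made for illustration only, not the method of the cited balancing techniques.

import numpy as np

# Hypothetical per-center statistics: total samples and per-class counts.
sample_counts = {"center_A": 500, "center_B": 400, "center_C": 250}
class_counts = {
    "center_A": np.array([300, 200]),   # e.g., [diseased, healthy]
    "center_B": np.array([350, 50]),
    "center_C": np.array([125, 125]),
}

def normalized_entropy(counts):
    """1.0 for a perfectly balanced class mix, lower for skewed mixes."""
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(len(counts)))

def aggregation_weights(sample_counts, class_counts, alpha=0.5):
    """Blend volume balance (square-root damping of large centers)
    with class balance (reward centers with a more uniform class mix)."""
    names = list(sample_counts)
    volume = np.array([np.sqrt(sample_counts[c]) for c in names])
    balance = np.array([normalized_entropy(class_counts[c]) for c in names])
    w = alpha * volume / volume.sum() + (1 - alpha) * balance / balance.sum()
    return dict(zip(names, w / w.sum()))

print(aggregation_weights(sample_counts, class_counts))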
To fulfill these requirements, GANs, especially robust diffusion models, offer a promising method for establishing data uniformity across hospitals with varied dataset sizes. By generating synthetic images to supplement existing datasets, GANs enable more balanced training with minimal bias. For example, if three medical centers have 500, 400, and 250 data points, respectively, GANs can generate synthetic images to equalize each dataset to approximately 500 data points. Compared to traditional weighted averaging of model parameters, this approach provides a more balanced solution for hierarchical medical system performance.
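The arithmetic behind this example is a simple top-up to the size of the largest dataset; the sketch below (with placeholder center names) shows only the target counts, not the image generation itself.

# Hypothetical local dataset sizes from the example above.
real_counts = {"center_1": 500, "center_2": 400, "center_3": 250}

# Each center tops up to the size of the largest dataset with synthetic images.
target = max(real_counts.values())
synthetic_needed = {c: target - n for c, n in real_counts.items()}

print(synthetic_needed)   # {'center_1': 0, 'center_2': 100, 'center_3': 250}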
The working principle is based on iterative noise addition and removal, in which the generator network learns the denoising function used to reconstruct the original image.
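For readers unfamiliar with this mechanism, the following PyTorch sketch shows a standard DDPM-style formulation of the forward noising step and the denoising training objective. It is a generic illustration of the principle, not the specific generative model used in the cited works, and the noise-prediction network is a placeholder.

import torch

# Minimal DDPM-style sketch: forward noising and the denoising objective.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal retention

def forward_noise(x0, t):
    """Add Gaussian noise to a clean image batch x0 at timesteps t."""
    noise = torch.randn_like(x0)
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return x_t, noise

def denoising_loss(eps_model, x0):
    """Train the network to predict the added noise (the denoising function)."""
    t = torch.randint(0, T, (x0.shape[0],))
    x_t, noise = forward_noise(x0, t)
    return torch.nn.functional.mse_loss(eps_model(x_t, t), noise)

# Usage sketch with a dummy predictor (a real U-Net would go here).
dummy_eps = lambda x, t: torch.zeros_like(x)
print(denoising_loss(dummy_eps, torch.randn(4, 1, 64, 64)))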
Despite the benefits of data augmentation via GANs, FL in medical imaging still encounters challenges due to the inherent diversity of imaging data. Scans from different sites vary in scanner type, protocol, and patient demographics, making synthetic data approaches more complex.67 Recent FL frameworks, such as distributed synthetic learning, aim to train GANs to produce a single homogeneous dataset of synthetic images for use by all clients,67 yet practical concerns remain. Specifically, the application of differential privacy can hamper performance. For example, Kossen et al.68 reported that enforcing a privacy parameter ε ≈ 7.4 on GAN-produced angiograms lowered a U-Net vessel segmentation's Dice score from 0.84 to 0.75.

In addition, GAN-augmented FL models are susceptible to membership inference attacks (MIAs). MIAs allow attackers to deduce whether a particular data point belongs to the training set. For example, Zhang et al.69 demonstrated class-level and user-level MIAs with GANs, achieving over

