Artificial Intelligence in Health Federated learning health stack against pandemics
$$\mathrm{Cost}_{\text{total-local-server}} = O(n \cdot p \cdot T_{\text{add}}) \tag{XXIII}$$

3.3.3. Central server-side computation

Similar to local servers, the central server performed aggregation of gradients received from n local servers. The computation cost at the central server is represented as:

$$\mathrm{Cost}_{\text{total-central-server}} = O(n \cdot T_{\text{add}})$$

3.3.4. Total computation cost

Combining the computation costs at the client, local server, and central server levels, the total computation cost for the proposed framework is represented as:

$$\mathrm{Cost}_{\text{total}} = O\big(p \cdot (|D_{ij}| \cdot M + \mathrm{Enc}(|G_{r}^{ij}|) + \mathrm{Dec}(|G_{r}^{ij}|)) + n \cdot p \cdot T_{\text{add}} + n \cdot T_{\text{add}}\big) \tag{XXIV}$$

Because the central-server term $n \cdot T_{\text{add}}$ is dominated by $n \cdot p \cdot T_{\text{add}}$ for $p \geq 1$, it can be further simplified to:

$$\mathrm{Cost}_{\text{total}} = O\big(p \cdot (|D_{ij}| \cdot M + \mathrm{Enc}(|G_{r}^{ij}|) + \mathrm{Dec}(|G_{r}^{ij}|) + n \cdot T_{\text{add}})\big) \tag{XXV}$$

The hierarchical structure optimized both communication and computation by leveraging local servers to consolidate updates before transmitting them to the central server. This ensured scalability, even in scenarios with large datasets and numerous participants.

4. Discussion

In this section, the issue of data heterogeneity in FL is addressed, specifically in the context of datasets distributed across different medical centers or countries. FL encounters significant challenges in real-world medical settings owing to the intrinsic heterogeneity among contributing institutions. Data heterogeneity arises when data distribution varies substantially across clients, leading to non-IID data.⁵⁸ Heterogeneity may manifest as statistical variations, differences in system capabilities, disparities in model architecture, and additional challenges.⁵⁹,⁶⁰

While the proposed model can directly handle IID datasets, its robustness is demonstrated by showing how it can manage non-IID datasets. Several techniques are proposed to mitigate data imbalance effects and enhance model performance in hierarchical systems designed for medical data management.

According to recent studies, and within the context of hierarchical medical data management, generative adversarial networks (GANs), and particularly the newly developed robust diffusion models, can effectively achieve uniformity in data availability across medical facilities.⁶¹ Zadeh et al.⁶² utilized GANs for cross-modality brain image synthesis, including transformations such as CT to positron emission tomography (PET), CT to magnetic resonance imaging (MRI), MRI to PET, and vice versa.

To address statistical heterogeneity or non-uniform data distribution, which significantly impacts model accuracy, increased communication rounds are often necessary. However, this can introduce bias in the global model, particularly disadvantaging clients with underrepresented data from various institutions. Therefore, aligning the distributions of data across medical centers is critical to mitigating model bias caused by variations in the population of patients or data collection techniques.⁵⁸,⁶³ Class balancing⁶⁴ should be supported with equal representation of all disease classes or conditions across federated nodes to prevent biased learning outcomes. Additionally, standardization of quality⁶⁵ is necessary to normalize data collected via varying equipment and protocols to enhance uniformity and reliability. Moreover, volume balancing⁶⁶ helps prevent dominant contributions from larger hospitals, ensuring equitable learning from all centers.

To fulfill these requirements, GANs, and especially robust diffusion models, offer a promising method for establishing data uniformity across hospitals with varied dataset sizes. By generating synthetic images to supplement existing datasets, GANs enable more balanced training with minimal bias. For example, if three medical centers have 500, 400, and 250 data points, respectively, GANs can generate synthetic images to equalize each dataset to approximately 500 data points. Compared to traditional weighted averaging of model parameters, this approach provides a more balanced solution for hierarchical medical system performance. The working principle of diffusion models is based on iterative noise addition and removal, where the generator network learns the denoising function to reconstruct the original image.

Despite the benefits of data augmentation via GANs, FL in medical imaging still encounters challenges due to the inherent diversity of imaging data. Scans from different sites vary in scanner type, protocol, and patient demographics, making synthetic data approaches more complex. Recent FL frameworks, such as distributed synthetic learning,⁶⁷ aim to train GANs to produce a single homogeneous dataset of synthetic images for use by all clients, yet practical concerns remain.⁶⁷ Specifically, the application of differential privacy can hamper performance. For example, Kossen et al.⁶⁸ reported that enforcing a privacy parameter ε ≈ 7.4 on GAN-produced angiograms lowered the Dice score of a U-Net vessel segmentation model from 0.84 to 0.75. In addition, GAN-augmented FL models are susceptible to membership inference attacks (MIAs). MIAs allow attackers to deduce whether a particular data point belongs to the training set. For example, Zhang et al.⁶⁹ demonstrated class-level and user-level MIAs with GANs, achieving over
Volume 2 Issue 4 (2025) 86 doi: 10.36922/AIH025080013

