Artificial Intelligence in Health
Segmentation and classification of DR using CNN
corresponding masks, along with calculating the total number of samples. The "__getitem__" method is responsible for reading and pre-processing each sample, where the fundus image undergoes normalization and transposition to align its dimensions appropriately for the subsequent U-Net input.18 Simultaneously, the binary mask is read and expanded to accommodate the model's requirements. Both the pre-processed image and mask are converted to PyTorch tensors before being returned as a tuple.27

This dataset class provides a seamless interface for interacting with the DRIVE dataset, offering a standardized and efficient means of loading and preparing data for training and evaluation. The "__len__" method ensures that the total number of samples can be easily accessed, facilitating iterative processes during model training and validation.28

Overall, the data pre-processing workflow is encapsulated within this dataset class, contributing to the robustness and adaptability of the U-Net architecture for semantic segmentation tasks on retinal fundus images.29

2.2. Data augmentation
The data augmentation process plays a pivotal role in enhancing the robustness and diversity of the dataset used for training machine learning models, particularly in the domain of medical image segmentation.34 In the provided code, a comprehensive data augmentation pipeline is implemented to augment retinal fundus images and their corresponding masks. The primary goal is to introduce variability into the dataset by applying horizontal flips, vertical flips, and rotations to the original images and masks. The augmentation process aims to simulate different orientations and perspectives that may be encountered in real-world scenarios, thereby enriching the dataset and improving the generalization capability of the subsequent U-Net model.9

The "augment_data" function iterates through the training dataset, applying various augmentations to each image-mask pair. The augmentations include horizontal flips, vertical flips, and rotations, with the associated masks adjusted accordingly. The resulting augmented images and masks are resized to a standardized dimension of (512 × 512). The augmented data are then saved in a separate directory structure, with distinct folders for augmented images and masks. Data augmentation is particularly valuable when the dataset size is limited, as it introduces diversity that helps prevent overfitting and improves the model's ability to handle variations in real-world data.

It is important to note that the data augmentation pipeline is designed to be flexible, allowing augmentation to be enabled or disabled through the "augment" parameter. This flexibility caters to different experimental setups, enabling researchers to assess the impact of data augmentation on model performance. The implementation adheres to best practices in data augmentation for medical image analysis, contributing to the overall reliability and generalization of the U-Net model for retinal fundus image segmentation tasks.21

2.3. Network architecture
The neural network architecture presented in the code is a U-Net, a popular architecture widely used in image segmentation tasks. The U-Net consists of an encoder-decoder structure with skip connections, allowing the model to capture both high-level and low-level features effectively. The encoder portion of the network employs convolutional blocks to extract hierarchical features from the input image.30 Specifically, it comprises four encoder blocks, each consisting of two convolutional layers with batch normalization and rectified linear unit activation functions, followed by max-pooling layers for downsampling (Figure 4).10

The bottleneck layer acts as a feature representation for the entire input image, condensing the learned features. It consists of a convolutional block with the same structure as the encoder blocks. The decoder portion of the network utilizes transposed convolutions for up-sampling and concatenates the features from the corresponding encoder block through skip connections.28 This enables the decoder to recover spatial information lost during the down-sampling process. The decoder also incorporates convolutional blocks for feature refinement.11

The classifier at the end of the network is a 1 × 1 convolutional layer, mapping the features to a single-channel output, which is suitable for binary segmentation tasks.24 The entire architecture is designed for semantic segmentation, particularly for tasks where precise delineation of object boundaries is crucial. In summary, this U-Net architecture facilitates robust feature extraction, effective information fusion through skip connections, and accurate segmentation outputs.3

2.4. Training process

2.4.1. Pre-training
In the pre-training phase, the U-Net model is meticulously configured with various hyperparameters to ensure optimal performance. The image dimensions (H [height] × W [width]), batch size, number of epochs, learning rate, and checkpoint path for model saving are carefully set. The dataset is then loaded using custom data loaders, and the training and validation sets are prepared from augmented retinal fundus images along
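To make the preceding description concrete, a minimal sketch of the dataset class described above might look as follows. The class name DriveDataset is hypothetical, and for brevity this sketch takes pre-loaded numpy arrays rather than reading image files from disk as the original code does.

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class DriveDataset(Dataset):
    """Sketch of a DRIVE-style dataset class (hypothetical name).

    The original reads fundus images and masks from disk; here they
    are passed in as pre-loaded numpy arrays to keep the sketch small.
    """

    def __init__(self, images, masks):
        # images: list of H x W x 3 uint8 fundus images
        # masks:  list of H x W uint8 binary vessel masks (0/1)
        self.images = images
        self.masks = masks
        self.n_samples = len(images)

    def __len__(self):
        # Total number of samples, used during training and validation.
        return self.n_samples

    def __getitem__(self, index):
        # Normalize the fundus image to [0, 1] and transpose from
        # H x W x C to C x H x W, the layout expected by the U-Net.
        image = self.images[index].astype(np.float32) / 255.0
        image = np.transpose(image, (2, 0, 1))

        # Expand the binary mask to one channel: H x W -> 1 x H x W.
        mask = self.masks[index].astype(np.float32)
        mask = np.expand_dims(mask, axis=0)

        # Convert both to PyTorch tensors and return them as a tuple.
        return torch.from_numpy(image), torch.from_numpy(mask)
```

A `torch.utils.data.DataLoader` can then iterate this dataset in shuffled batches.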
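The augmentation pipeline of Section 2.2 can be sketched as below. The function augment_pair is a simplified stand-in for the paper's "augment_data": it substitutes a 90-degree np.rot90 rotation for the original arbitrary-angle rotation and uses a minimal nearest-neighbour resize, so that the sketch depends only on numpy.

```python
import numpy as np

def _resize_nearest(arr, size=(512, 512)):
    # Minimal nearest-neighbour resize; a real pipeline would use
    # cv2.resize or an augmentation library instead.
    rows = np.linspace(0, arr.shape[0] - 1, size[0]).round().astype(int)
    cols = np.linspace(0, arr.shape[1] - 1, size[1]).round().astype(int)
    return arr[rows][:, cols]

def augment_pair(image, mask, augment=True, size=(512, 512)):
    """Return a list of (image, mask) variants for one training sample.

    Applies a horizontal flip, a vertical flip, and a rotation, with
    the mask transformed identically to its image. With augment=False
    only the resized original pair is returned, mirroring the paper's
    "augment" switch.
    """
    pairs = [(image, mask)]
    if augment:
        pairs.append((np.fliplr(image), np.fliplr(mask)))  # horizontal flip
        pairs.append((np.flipud(image), np.flipud(mask)))  # vertical flip
        pairs.append((np.rot90(image), np.rot90(mask)))    # rotation (90 deg stand-in)
    # Standardize every variant to the target dimension.
    return [(_resize_nearest(i, size), _resize_nearest(m, size))
            for i, m in pairs]
```

In the original code the resulting variants are written out to separate image and mask folders; here they are simply returned.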
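A U-Net matching the description in Section 2.3 (four encoder blocks of two convolution + batch-norm + ReLU layers with max-pooling, a bottleneck of the same structure, transposed-convolution decoders with skip connections, and a 1 × 1 classifier) might be sketched as follows. The channel widths (64 through 1024) are conventional U-Net choices, not values stated in this section.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    # Two 3x3 convolutions, each with batch normalization and ReLU.
    def __init__(self, in_c, out_c):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_c, out_c, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_c), nn.ReLU(inplace=True),
            nn.Conv2d(out_c, out_c, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_c), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class EncoderBlock(nn.Module):
    # Convolutional block followed by 2x2 max-pooling for downsampling.
    def __init__(self, in_c, out_c):
        super().__init__()
        self.conv = ConvBlock(in_c, out_c)
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skip = self.conv(x)            # kept for the skip connection
        return skip, self.pool(skip)

class DecoderBlock(nn.Module):
    # Transposed convolution for up-sampling, concatenation with the
    # matching encoder features, then a convolutional block.
    def __init__(self, in_c, out_c):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_c, out_c, kernel_size=2, stride=2)
        self.conv = ConvBlock(out_c * 2, out_c)

    def forward(self, x, skip):
        x = self.up(x)
        x = torch.cat([x, skip], dim=1)
        return self.conv(x)

class UNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Four encoder blocks extracting hierarchical features.
        self.e1 = EncoderBlock(3, 64)
        self.e2 = EncoderBlock(64, 128)
        self.e3 = EncoderBlock(128, 256)
        self.e4 = EncoderBlock(256, 512)
        # Bottleneck: same structure as the encoder conv blocks.
        self.b = ConvBlock(512, 1024)
        # Four decoder blocks restoring spatial resolution.
        self.d1 = DecoderBlock(1024, 512)
        self.d2 = DecoderBlock(512, 256)
        self.d3 = DecoderBlock(256, 128)
        self.d4 = DecoderBlock(128, 64)
        # 1x1 convolution mapping features to a single-channel output.
        self.classifier = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, x):
        s1, x = self.e1(x)
        s2, x = self.e2(x)
        s3, x = self.e3(x)
        s4, x = self.e4(x)
        x = self.b(x)
        x = self.d1(x, s4)
        x = self.d2(x, s3)
        x = self.d3(x, s2)
        x = self.d4(x, s1)
        return self.classifier(x)  # logits for binary segmentation
```

Because each encoder halves the spatial size and each decoder doubles it, the output mask has the same height and width as the input image.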
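The pre-training configuration of Section 2.4.1 amounts to fixing the image dimensions, batch size, epoch count, learning rate, and checkpoint path, then building train and validation loaders. A sketch with placeholder values follows; the specific numbers and the checkpoint path are hypothetical, as this excerpt does not state them, and in-memory random tensors stand in for the augmented DRIVE samples.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters (placeholder values; the actual settings are not
# given in this excerpt).
H, W = 512, 512            # image height and width
batch_size = 2
num_epochs = 50
lr = 1e-4
checkpoint_path = "files/checkpoint.pth"  # hypothetical path

# Stand-in data; in the original code these come from the custom
# dataset built over the augmented fundus images and masks.
images = torch.randn(8, 3, H, W)
masks = torch.randint(0, 2, (8, 1, H, W)).float()
split = 6  # first 6 samples for training, last 2 for validation

train_loader = DataLoader(TensorDataset(images[:split], masks[:split]),
                          batch_size=batch_size, shuffle=True)
valid_loader = DataLoader(TensorDataset(images[split:], masks[split:]),
                          batch_size=batch_size, shuffle=False)
```

Each training iteration then draws (image, mask) batches of shape (batch_size, 3, H, W) and (batch_size, 1, H, W) from these loaders.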
Volume 1 Issue 4 (2024) 35 doi:10.36922/aih.2783

