
Artificial Intelligence in Health                             Segmentation and classification of DR using CNN



corresponding masks, along with calculating the total number of samples. The "__getitem__" method is responsible for reading and pre-processing each sample: the fundus image undergoes normalization and transposition to align its dimensions appropriately for the subsequent U-Net input.18 Simultaneously, the binary mask is read and expanded to accommodate the model's requirements. Both the pre-processed image and mask are converted to PyTorch tensors before being returned as a tuple.27

This dataset class provides a seamless interface for interacting with the DRIVE dataset, offering a standardized and efficient means of loading and preparing data for training and evaluation. The "__len__" method ensures that the total number of samples can be easily accessed, facilitating iterative processes during model training and validation.28

Overall, the data pre-processing workflow is encapsulated within this dataset class, contributing to the robustness and adaptability of the U-Net architecture for semantic segmentation tasks on retinal fundus images.29

2.2. Data augmentation
The data augmentation process plays a pivotal role in enhancing the robustness and diversity of the dataset used for training machine learning models, particularly in the domain of medical image segmentation. In the provided code, a comprehensive data augmentation pipeline is implemented to augment retinal fundus images and their corresponding masks. The primary goal is to introduce variability into the dataset by applying horizontal flips, vertical flips, and rotations to the original images and masks. The augmentation process aims to simulate different orientations and perspectives that may be encountered in real-world scenarios, thereby enriching the dataset and improving the generalization capability of the subsequent U-Net model.9

The "augment_data" function iterates through the training dataset, applying various augmentations to each image-mask pair. The augmentations include horizontal flips, vertical flips, and rotations, with the associated masks adjusted accordingly. The resulting augmented images and masks are resized to a standardized dimension of 512 × 512. The augmented data are then saved in a separate directory structure, with distinct folders for augmented images and masks. Data augmentation is particularly valuable when the dataset size is limited, as it introduces diversity that helps prevent overfitting and improves the model's ability to handle variations in real-world data.

It is important to note that the data augmentation pipeline is designed to be flexible, allowing augmentation to be enabled or disabled through the "augment" parameter. This flexibility caters to different experimental setups, enabling researchers to assess the impact of data augmentation on model performance. The implementation adheres to best practices in data augmentation for medical image analysis, contributing to the overall reliability and generalization of the U-Net model for retinal fundus image segmentation tasks.21

2.3. Network architecture
The neural network architecture presented in the code is a U-Net, a popular architecture widely used in image segmentation tasks. The U-Net architecture consists of an encoder-decoder structure with skip connections, allowing the model to capture both high-level and low-level features effectively. The encoder portion of the network employs convolutional blocks to extract hierarchical features from the input image. Specifically, it comprises four encoder blocks,30 each consisting of two convolutional layers with batch normalization and rectified linear unit (ReLU) activation functions, followed by max-pooling layers for downsampling (Figure 4).10

The bottleneck layer acts as a feature representation for the entire input image, condensing the learned features.34 It consists of a convolutional block with the same structure as the encoder blocks. The decoder portion of the network uses transposed convolutions for up-sampling and concatenates the features from the corresponding encoder block through skip connections.28 This enables the decoder to recover spatial information lost during the down-sampling process. The decoder also incorporates convolutional blocks for feature refinement.11

The classifier at the end of the network is a 1 × 1 convolutional layer, mapping the features to a single-channel output, which is suitable for binary segmentation tasks.24 The entire architecture is designed for semantic segmentation, particularly for tasks where precise delineation of object boundaries is crucial. In summary, this U-Net architecture facilitates robust feature extraction, effective information fusion through skip connections, and accurate segmentation outputs.3

2.4. Training process
2.4.1. Pre-training
In the pre-training phase, the U-Net model is meticulously configured with various hyperparameters to ensure optimal performance. The image dimensions (H [height] × W [width]), batch size, number of epochs, learning rate, and checkpoint path for model saving are carefully set. The dataset is then loaded using custom data loaders, and the training and validation sets are prepared from augmented retinal fundus images along
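Since the paper's code is not reproduced in this excerpt, the dataset class of Section 2.1 can be sketched roughly as follows. This is a minimal, in-memory illustration: the class name and the convention of passing arrays directly (rather than reading files from disk with cv2) are assumptions, and the final conversion to PyTorch tensors is indicated only in a comment.

```python
import numpy as np

class DriveDataset:
    """Minimal sketch of a DRIVE-style dataset class. For illustration,
    images and masks are passed in memory instead of being read from disk."""

    def __init__(self, images, masks):
        self.images = images  # list of HxWx3 uint8 fundus images
        self.masks = masks    # list of HxW uint8 binary vessel masks
        self.n_samples = len(images)

    def __len__(self):
        # exposes the total sample count for iteration during training/validation
        return self.n_samples

    def __getitem__(self, index):
        image = self.images[index].astype(np.float32) / 255.0  # normalize to [0, 1]
        image = np.transpose(image, (2, 0, 1))                 # HWC -> CHW for the U-Net
        mask = self.masks[index].astype(np.float32) / 255.0
        mask = np.expand_dims(mask, axis=0)                    # add a channel dim: 1xHxW
        # the paper's code would wrap these with torch.from_numpy(...) here
        return image, mask
```

The `__len__`/`__getitem__` pair is the standard map-style dataset protocol, which is what lets a PyTorch `DataLoader` batch and iterate over the samples.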
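The augmentation step of Section 2.2 (horizontal flip, vertical flip, rotation, then resizing to 512 × 512) can be sketched as below. The function name `augment_pair`, the use of `np.fliplr`/`np.flipud`/`np.rot90` as stand-ins for the augmentation library calls, and the nearest-neighbour resize in place of `cv2.resize` are all assumptions; saving to separate image/mask folders is omitted.

```python
import numpy as np

def resize_nearest(arr, size=(512, 512)):
    """Nearest-neighbour resize via index mapping (a stand-in for cv2.resize)."""
    h, w = size
    ys = np.arange(h) * arr.shape[0] // h  # source row for each output row
    xs = np.arange(w) * arr.shape[1] // w  # source column for each output column
    return arr[ys][:, xs]

def augment_pair(image, mask):
    """Return the original pair plus flipped/rotated variants, all resized.
    The same transform is applied to image and mask so they stay aligned."""
    variants = [
        (image, mask),                        # original
        (np.fliplr(image), np.fliplr(mask)),  # horizontal flip
        (np.flipud(image), np.flipud(mask)),  # vertical flip
        (np.rot90(image), np.rot90(mask)),    # 90-degree rotation as one example angle
    ]
    return [(resize_nearest(i), resize_nearest(m)) for i, m in variants]
```

Applying each geometric transform identically to the image and its mask is the key design point: a flipped vessel image with an unflipped mask would be a mislabelled training sample.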
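The architecture of Section 2.3 (four encoder blocks of two convolutions with batch normalization and ReLU, max-pooling, a bottleneck, a transposed-convolution decoder with skip connections, and a 1 × 1 classifier) can be sketched in PyTorch as follows. The channel widths (64 up to 1024) follow the original U-Net convention and are an assumption, not a detail given in this excerpt.

```python
import torch
from torch import nn

class ConvBlock(nn.Module):
    """Two 3x3 convolutions, each followed by batch normalization and ReLU."""
    def __init__(self, in_c, out_c):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_c, out_c, 3, padding=1), nn.BatchNorm2d(out_c), nn.ReLU(inplace=True),
            nn.Conv2d(out_c, out_c, 3, padding=1), nn.BatchNorm2d(out_c), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)

class UNet(nn.Module):
    def __init__(self):
        super().__init__()
        chs = [64, 128, 256, 512]                       # assumed encoder widths
        self.encoders = nn.ModuleList()
        in_c = 3
        for c in chs:                                   # four encoder blocks
            self.encoders.append(ConvBlock(in_c, c))
            in_c = c
        self.pool = nn.MaxPool2d(2)                     # downsampling after each block
        self.bottleneck = ConvBlock(512, 1024)          # same structure as encoder blocks
        self.ups = nn.ModuleList(
            [nn.ConvTranspose2d(c * 2, c, 2, stride=2) for c in reversed(chs)])
        self.decoders = nn.ModuleList(
            [ConvBlock(c * 2, c) for c in reversed(chs)])
        self.classifier = nn.Conv2d(64, 1, kernel_size=1)  # 1x1 conv -> 1-channel map

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x = enc(x)
            skips.append(x)                             # saved for the skip connections
            x = self.pool(x)
        x = self.bottleneck(x)
        for up, dec, skip in zip(self.ups, self.decoders, reversed(skips)):
            x = up(x)                                   # transposed conv: 2x upsampling
            x = dec(torch.cat([x, skip], dim=1))        # fuse encoder features
        return self.classifier(x)                       # logits for binary segmentation
```

With four pooling stages, the input height and width must be divisible by 16; the concatenation along the channel dimension is what lets the decoder recover spatial detail lost during downsampling.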
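The hyperparameter setup of Section 2.4.1 might look like the sketch below. Every concrete value (batch size, epochs, learning rate, checkpoint path), the choice of Adam and binary cross-entropy with logits, and the random stand-in tensors are assumptions for illustration; a 1 × 1 convolution stands in for the full U-Net so the snippet is self-contained.

```python
import math
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical hyperparameter values; the paper sets these but does not list
# them in this excerpt.
H, W = 512, 512                            # image height and width
batch_size = 2
num_epochs = 1                             # kept at 1 for this sketch
lr = 1e-4
checkpoint_path = "checkpoint.pth"         # destination for torch.save(...)

# Random tensors stand in for the augmented fundus images and vessel masks.
images = torch.randn(4, 3, H, W)
masks = torch.randint(0, 2, (4, 1, H, W)).float()
train_loader = DataLoader(TensorDataset(images, masks),
                          batch_size=batch_size, shuffle=True)

model = nn.Conv2d(3, 1, kernel_size=1)     # placeholder for the U-Net of Section 2.3
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
loss_fn = nn.BCEWithLogitsLoss()           # a common choice for binary segmentation

for epoch in range(num_epochs):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)        # compare logits against the binary mask
        loss.backward()
        optimizer.step()
# in the full pipeline, the best weights would be saved to checkpoint_path here,
# e.g. torch.save(model.state_dict(), checkpoint_path)
```

Validation would follow the same loop with `model.eval()` and `torch.no_grad()`, tracking the validation loss to decide when to write the checkpoint.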


            Volume 1 Issue 4 (2024)                         35                               doi:10.36922/aih.2783