
Artificial Intelligence in Health                                        CNN model for leukemia diagnosis



            2.2.1. C-NMC dataset

            The C-NMC (Classification of Normal versus Malignant Cells) dataset is a crucial resource for
            developing and evaluating DL models for leukemia classification. This section outlines the
            specific methodology employed to leverage the C-NMC dataset for multilevel image
            classification using DL techniques.22

            2.2.2. Dataset description

            The C-NMC dataset comprises a comprehensive collection of blood smear images, annotated with
            labels indicating various types of leukemia, including ALL and AML. The dataset provides a
            robust foundation for training and testing DL models aimed at automating leukemia diagnosis.

            As shown in Figure 3, the C-NMC dataset contains a total of 10,000 images, evenly divided
            between healthy and malignant cells. Each image has associated metadata, including patient
            ID, sample ID, age, gender, diagnosis, and slide details. In addition, the dataset is split
            into a training set with 8,000 images and a testing set with 2,000 images.
            •   Total images: 10,000
            •   Healthy cells: 5,000
            •   Malignant cells: 5,000
            •   Metadata entries: 10,000 (one for each image)
            •   Training set: 8,000 images
            •   Testing set: 2,000 images

            These counts help researchers and practitioners gauge the dataset’s size, diversity, and
            suitability for training and testing ML models for the classification of bone marrow cells.
            They also indicate the dataset’s comprehensiveness and its potential for developing robust
            and accurate algorithms for medical image analysis.

            3. Proposed methodology

            3.1. CNN model architecture

            This paper presents a multilevel image classification method using DL for leukemia datasets.
            The proposed model, a customized CNN with multiple convolutional and dense layers optimized
            with a Tversky loss function, achieves high accuracy and robustness, demonstrating its
            potential for aiding in the early diagnosis of leukemia. The customized CNN is designed to
            handle imbalanced datasets (i.e., different proportions of healthy vs. malignant cells).

            The proposed model utilizes a CNN specifically designed and optimized for multilevel
            classification tasks on the C-NMC leukemia dataset. The architecture consists of several
            convolutional layers that extract hierarchical features from the input images, followed by
            pooling layers that reduce dimensionality. Batch normalization is applied to enhance the
            model’s stability, while rectified linear unit (ReLU) activation functions are employed to
            introduce non-linearity.

            After the convolutional layers, the network includes fully connected (dense) layers that
            perform the classification. The architecture is optimized using the Tversky loss function,
            which weights false positives and false negatives separately and is therefore particularly
            effective for handling imbalanced datasets like those found in medical image analysis. The
            final output layer uses softmax activation for multiclass classification, differentiating
            between various subtypes of leukemia.

            For leukemia classification, the proposed DL architecture illustrated in Figure 4 can be
            adapted to identify and classify leukemia subtypes based on specific input data, such as
            microscopic blood smear images or























            Figure 3. The C-NMC dataset description includes the following details: (A) dataset composition by image type, including healthy and malignant cells, and
            (B) training and testing data with associated metadata details.
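
The Tversky loss named in Section 3.1 can be illustrated with a minimal sketch. The Tversky index is TP / (TP + alpha*FP + beta*FN), and the loss is one minus that index. The function name `tversky_loss`, the default weights alpha = 0.3 and beta = 0.7, and the smoothing term `eps` are assumptions chosen for illustration; the paper does not state the values it used, and this is not the authors' implementation.

```python
def tversky_loss(y_true, y_pred, alpha=0.3, beta=0.7, eps=1e-7):
    """Tversky loss for one class (hypothetical helper, not from the paper).

    y_true: iterable of 0/1 ground-truth labels.
    y_pred: iterable of predicted probabilities in [0, 1].
    alpha weights false positives, beta weights false negatives
    (both defaults are assumed values for illustration).
    """
    pairs = list(zip(y_true, y_pred))
    # Soft counts: they reduce to ordinary counts when predictions are 0 or 1.
    tp = sum(t * p for t, p in pairs)
    fp = sum((1 - t) * p for t, p in pairs)
    fn = sum(t * (1 - p) for t, p in pairs)
    tversky_index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - tversky_index


# One missed malignant cell versus one false alarm on the same labels.
labels = [1, 1, 0, 0]
loss_missed = tversky_loss(labels, [1, 0, 0, 0])       # one false negative
loss_false_alarm = tversky_loss(labels, [1, 1, 1, 0])  # one false positive
print(loss_missed > loss_false_alarm)
```

With beta larger than alpha, a missed positive costs more than a false alarm, which is one way such a loss can favor the minority class. Setting alpha = beta = 0.5 reduces the Tversky index to the Dice coefficient.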

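The 8,000/2,000 split described in Section 2.2.2 can be reproduced on paper with a short sketch. It assumes a stratified 80/20 split (each class divided in the same proportion); the text does not say how the split was drawn, and `stratified_split`, the seed, and the label strings are hypothetical names used only for illustration.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), splitting each class separately.

    Hypothetical helper: the stratification and the seed are assumptions,
    not details taken from the paper.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Class counts as reported in Section 2.2.2: 5,000 healthy and 5,000 malignant.
labels = ["healthy"] * 5000 + ["malignant"] * 5000
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 8000 2000
```

Splitting per class keeps the healthy-to-malignant ratio identical in the training and testing sets, so both retain the 50/50 composition reported for the full dataset.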

            Volume 2 Issue 3 (2025)                         68                               doi: 10.36922/aih.4710