
Figure 7. Performance comparison of different deep learning algorithms. (A) Accuracy comparison across CNN, LSTM, and Transformer models. (B) Precision comparison of the same models. (C) Recall comparison of the same models.
Abbreviations: CNN: Convolutional neural network; LSTM: Long short-term memory; RNN: Recurrent neural network.

Figure 8. The trends of training and validation losses of CNN + Tversky loss on the C-NMC dataset over epochs.

…line, and the validation loss, depicted by the orange line, both show a consistent downward trend, indicating that the model is effectively learning and generalizing from the data.

Initially, the losses are relatively high, around 0.7, but both decrease steadily with training, converging toward approximately 0.1 by the 30th epoch. This suggests that the model’s performance improves with more training epochs, and there is no significant overfitting, as evidenced by the parallel decrease in both training and validation losses.
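The Tversky loss named in the Figure 8 caption generalizes the Dice loss by weighting false negatives and false positives asymmetrically, a choice often made to counter the class imbalance typical of leukemia image sets. The following is a minimal, framework-agnostic sketch; the alpha/beta values and the function name are illustrative assumptions, since the exact settings are not reproduced here:

import numpy as np

def tversky_loss(y_true, y_pred, alpha=0.7, beta=0.3, eps=1e-7):
    # Tversky loss = 1 - Tversky index. Here alpha weights false
    # negatives and beta weights false positives (conventions vary
    # between papers); alpha = beta = 0.5 recovers the Dice loss.
    # The 0.7/0.3 split is an assumed, not reported, setting.
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    tp = np.sum(y_true * y_pred)          # soft true positives
    fn = np.sum(y_true * (1.0 - y_pred))  # soft false negatives
    fp = np.sum((1.0 - y_true) * y_pred)  # soft false positives
    return 1.0 - (tp + eps) / (tp + alpha * fn + beta * fp + eps)

# Toy example: predicted probabilities for four cells,
# binary leukemic-vs-normal labels.
print(tversky_loss([1, 0, 1, 1], [0.9, 0.2, 0.7, 0.4]))  # ≈ 0.28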
Figure 9 shows the training and validation accuracy of a CNN using the Tversky loss function on the C-NMC dataset over 30 epochs. The training accuracy, represented by the yellow line, and the validation accuracy, depicted by the orange line, both demonstrate a substantial increase initially, indicating rapid learning.

Training accuracy starts around 70% and rises to approximately 97%, while validation accuracy starts at the same point but peaks at around 92%. The graph indicates that while the training accuracy continues to improve slightly after epoch 15, the validation accuracy plateaus, suggesting that the model reaches its generalization capacity around this point. The consistent gap between training and validation accuracy suggests some level of overfitting, though the model still generalizes relatively well to unseen data.

However, the gap between training and validation loss that emerges toward the later epochs indicates potential overfitting: the model performs very well on the training data (very low training loss) but less well on the validation data, suggesting it may have learned the training data too specifically rather than generalizing to new, unseen data.

While the model is effectively minimizing training loss, it is important to address the gap between training and validation loss to ensure better generalization. Techniques such as regularization, dropout, or early stopping could be considered to mitigate overfitting and improve validation performance, as sketched below.
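As a concrete illustration of those mitigations, the Keras-style sketch below combines a dropout layer with early stopping on the validation loss; the architecture, dropout rate, and patience value are illustrative assumptions rather than the configuration used in this study:

import tensorflow as tf
from tensorflow.keras import layers

# Assumed toy CNN, not the study's exact model.
model = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(128, 128, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                    # randomly zeroes units during training
    layers.Dense(1, activation="sigmoid"),  # leukemic vs. normal cell
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop once validation loss stops improving and restore the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=30, callbacks=[early_stop])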
4.3. Challenges and future directions
Despite the promising results, several challenges remain in the application of DL to leukemia classification. One major challenge is the variability in image quality and staining techniques across different datasets, which can affect model performance.

Figure 10 compares the performance of different DL optimizers (Adagrad, Adam, RMSprop, SGD) in terms of accuracy, precision, and recall. The Adam optimizer demonstrates the highest performance across all three metrics, followed closely by RMSprop, while Adagrad and SGD exhibit similar performance, slightly lower than both Adam and RMSprop.
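A comparison like Figure 10 can be reproduced by retraining an identical model under each optimizer and logging accuracy, precision, and recall. In the sketch below, build_model is a hypothetical placeholder for the CNN above, and the training call is left commented out since the C-NMC loading pipeline is not shown:

import tensorflow as tf

def build_model():
    # Hypothetical stand-in; substitute the actual CNN.
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(128, 128, 3)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

results = {}
for name in ["Adagrad", "Adam", "RMSprop", "SGD"]:
    model = build_model()  # fresh weights for each optimizer
    model.compile(
        optimizer=name.lower(),  # Keras accepts optimizer names as strings
        loss="binary_crossentropy",
        metrics=["accuracy", tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
    )
    # history = model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=30)
    # results[name] = {k: v[-1] for k, v in history.history.items()}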


