Artificial Intelligence in Health Improved liver tumor segmentation with dense networks
Figure 1. Overview of our proposed liver tumor segmentation pipeline. The model takes three consecutive slices as input to predict the middle one. In the
training phase, training samples with dimensions of 224 × 224 × 3 are cropped randomly from raw CT volumes for data augmentation, and then fed into
the model. In the testing phase, the trained model processes a 3D CT volume by taking three adjacent slices in their original size as input and sliding along
the z-axis of the CT volume at a step size of 1. The segmentation of the entire 3D volume is completed in this way. Figure created by the authors.
Abbreviations: CT: computed tomography; FCN: fully convolutional network; n: total number of images captured in a CT scan.
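The sliding-window inference described in the caption can be sketched as follows. The helper below builds the three-adjacent-slice inputs along the z-axis at a step size of 1; replicating the first and last slice at the volume borders is our assumption, since the caption does not state how boundary slices are handled.

```python
def slice_triplets(volume):
    """Yield (previous, current, next) slice triplets along the z-axis,
    step size 1. Each triplet is the 3-slice input used to predict the
    middle slice. Border slices are replicated (an assumption)."""
    n = len(volume)
    for i in range(n):
        lo = max(i - 1, 0)          # clamp at the first slice
        hi = min(i + 1, n - 1)      # clamp at the last slice
        yield (volume[lo], volume[i], volume[hi])

# Toy "volume" of 5 slices, labeled by index, to show the windowing.
vol = [0, 1, 2, 3, 4]
triplets = list(slice_triplets(vol))
print(triplets[0])    # (0, 0, 1): first slice replicated at the border
print(triplets[2])    # (1, 2, 3)
print(len(triplets))  # 5: one prediction per slice, so the whole volume is covered
```

In a real pipeline each element of `vol` would be a full-resolution 2D CT slice rather than an integer; the indexing logic is unchanged.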
Figure 2. The CT image pre-processing module. Abbreviations: CT: computed tomography; Conv: convolutional layer.

2.2. I²-DenseFCN segmentation network
The proposed I²-DenseFCN segmentation network is built on the classical encoder-decoder architecture, as illustrated in Figure 3. The encoder is derived from the DenseNet classification network.³² The decoder introduces dense connections among upsampling blocks of different levels. UNet-like long skip connections are established between the encoder and decoder.

2.2.1. Encoder
The representation ability of features largely affects the performance of semantic image segmentation.³³ As deeper networks generally provide stronger feature representation capabilities, the network structures for feature extraction continue to deepen. However, deep neural networks become more difficult to train when their depth is attained by simply stacking layers. Mitigating this issue, ResNets²⁰ introduce residual connections to facilitate the training of very deep neural networks. Recently, a novel connectivity pattern, called dense connections, was designed in DenseNets³² to further improve the information flow between layers, yielding state-of-the-art classification results across several datasets. Considering the superior feature extraction ability of DenseNets, we employ the DenseNet-161 classification network³² as the encoder, excluding the global average pooling, fully connected, and softmax layers. There are four dense blocks in the DenseNet-161 network, and the dense connections in each block are referred to as intra-block dense connections. In this design, direct connections are introduced from every layer within a dense block to all subsequent layers. Each layer therefore receives the concatenated feature maps of all preceding layers and produces k feature maps through a composite function comprising three consecutive operations: batch normalization, a rectified linear unit, and a 3 × 3 convolution. k is referred to as the growth rate, indicating the amount by which each module's output increases relative to its input. Transition down blocks, consisting of a 1 × 1 convolution followed by a 2 × 2 pooling operation, are introduced between dense blocks to reduce the spatial dimensionality of the feature maps.³²

2.2.2. Decoder
Considering the existence of multiscale tumors – one of the primary challenges in liver tumor segmentation
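As a check on the encoder description in Section 2.2.1, here is a minimal sketch of the channel and spatial-size bookkeeping through DenseNet-161's four dense blocks. The growth rate k = 48, the block depths 6/12/36/24, the 96 stem feature maps, and the 0.5 channel compression in the transition layers are the published DenseNet-161 classification settings; treating them (and the 56 × 56 starting resolution implied by 224 × 224 crops) as the exact configuration of this encoder is our assumption.

```python
def dense_block_out(c_in, num_layers, k):
    """Output channels of a dense block: each layer receives the
    concatenated feature maps of all preceding layers and adds k new
    ones (k = growth rate), so width grows linearly with depth."""
    return c_in + num_layers * k

def transition_down(c, h, w, compression=0.5):
    """Transition down block: a 1 x 1 convolution (assumed to halve the
    channel count, as in DenseNet-BC) followed by 2 x 2 pooling, which
    halves each spatial dimension."""
    return int(c * compression), h // 2, w // 2

# 96 stem feature maps at 56 x 56 (assuming 224 x 224 input crops).
c, h, w = 96, 56, 56
for i, num_layers in enumerate((6, 12, 36, 24)):
    c = dense_block_out(c, num_layers, k=48)
    if i < 3:  # no transition after the final dense block
        c, h, w = transition_down(c, h, w)
print(c, h, w)  # 2208 7 7
```

The final count of 2208 feature maps matches the published width of DenseNet-161 before its global average pooling, which is the point where this encoder hands off to the decoder.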
Volume 2 Issue 2 (2025) 63 doi: 10.36922/aih.5001

