Figure 2. Model architecture for RNN and semi-RNN
Abbreviations: BMI: Body mass index; GRU: Gated recurrent units; RNN: Supervised recurrent neural network; Semi-RNN: Semi-supervised recurrent neural network.

carried out 10 times, and the resulting performance characteristics on the testing sets were averaged over the 10 splits to examine each method.
Due to the irregularity in the number and timing of clinic visits across patients, and the incomplete availability of certain predictors at each visit, we employed imputation techniques to address missing data. For the LR and RF models, we first calculated summary statistics for the longitudinal predictors. We then applied the missForest algorithm,¹⁸ which efficiently handles multivariate data comprising both continuous and categorical variables without relying on distributional assumptions or requiring extensive parameter tuning. For the RNN and semi-RNN models, we first imputed missing entries of longitudinal predictors at time 0 with the mean of non-missing entries at enrollment in the training set. We then filled in the remaining missing entries with the latest non-missing value before the respective time points. Once missing data were imputed, we standardized all the data using the mean and standard deviation of the corresponding training set.
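The two imputation steps for the RNN inputs map naturally onto pandas operations. The following is a minimal sketch, not the authors' code: the long-format layout and the column names patient_id and visit_time are hypothetical, and the missForest step for the LR/RF pipeline is omitted since missForest is a separate package.

```python
import pandas as pd

# Assumed (hypothetical) long format: one row per clinic visit, with columns
# "patient_id", "visit_time" (0 = enrollment), and numeric longitudinal predictors.

def impute_longitudinal(df, predictors, baseline_means=None):
    df = df.sort_values(["patient_id", "visit_time"]).copy()

    # Step 1: replace missing time-0 entries with the mean of the non-missing
    # enrollment values; the means must come from the training set only, so
    # they are computed once and reused for the validation and test sets.
    if baseline_means is None:
        baseline_means = df.loc[df["visit_time"] == 0, predictors].mean()
    base = df["visit_time"] == 0
    df.loc[base, predictors] = df.loc[base, predictors].fillna(baseline_means)

    # Step 2: fill the remaining gaps with the latest non-missing value before
    # each time point (last observation carried forward, within each patient).
    df[predictors] = df.groupby("patient_id")[predictors].ffill()
    return df, baseline_means

def standardize(df, predictors, mean, std):
    # Standardize using the training set's mean and standard deviation.
    out = df.copy()
    out[predictors] = (out[predictors] - mean) / std
    return out
```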

We tuned hyperparameters for each method as follows: For the two conventional machine learning methods, we combined the training and validation sets to conduct 10-fold cross-validation and selected the optimal hyperparameters as those achieving the highest area under the receiver operating characteristic curve (AuROC).¹⁹ We adopted penalized LR given the collinearity among the longitudinal summary statistics and tuned the regularization strength for the LR model. For the RF model, we searched for the optimal number of trees and the number of features used at each split. Both models were implemented using the Scikit-learn library in Python version 3.9.7.²⁰
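Both searches map directly onto Scikit-learn's cross-validation utilities. This is a hedged sketch: the candidate grids below are our own placeholders, as the text does not list the values searched.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Penalized LR: tune the regularization strength (C is the inverse penalty weight).
lr_search = GridSearchCV(
    LogisticRegression(penalty="l2", max_iter=5000),
    param_grid={"C": [0.001, 0.01, 0.1, 1.0, 10.0]},  # placeholder grid
    scoring="roc_auc",  # select by highest AuROC
    cv=10,              # 10-fold cross-validation
)

# RF: tune the number of trees and the number of features used at each split.
rf_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300, 500],        # placeholder values
                "max_features": ["sqrt", 0.3, 0.5]},
    scoring="roc_auc",
    cv=10,
)

# X_dev, y_dev: pooled training + validation sets of per-patient summary statistics.
# lr_search.fit(X_dev, y_dev); lr_search.best_params_ returns the selected values.
```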
For the RNN and semi-RNN models, given the computational cost of training deep learning models, we selected the optimal hyperparameters as those achieving the highest AuROC on the hold-out validation set. We randomly generated 100 combinations of hyperparameters, searching for the optimal hidden sizes of the model structures within the range [16, 128], batch sizes from the set {64, 128, 256, 512}, dropout rates in the range [0.2, 0.5], learning rates of the Adam optimizer within the range [0.0001, 0.01], and weights for semi-supervised learning within the range [0.001, 1], where applicable. We implemented both RNN models using PyTorch 1.12.1²¹ on a high-performance computing cluster.
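The random search over these ranges can be sketched as below. The sampling distributions (uniform for integers and choices, log-uniform for the learning rate and semi-supervised weight) are our assumption; the text states only the ranges.

```python
import random

random.seed(0)  # for reproducibility of the sampled configurations

def sample_config():
    return {
        "hidden_size": random.randint(16, 128),            # hidden sizes in [16, 128]
        "batch_size": random.choice([64, 128, 256, 512]),  # batch size set
        "dropout": random.uniform(0.2, 0.5),               # dropout rate in [0.2, 0.5]
        "learning_rate": 10 ** random.uniform(-4, -2),     # Adam LR in [1e-4, 1e-2]
        "semi_weight": 10 ** random.uniform(-3, 0),        # semi-RNN only, in [1e-3, 1]
    }

configs = [sample_config() for _ in range(100)]
# Each configuration is trained once; the model achieving the highest AuROC
# on the hold-out validation set is kept for testing.
```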
We evaluated the models' ability to distinguish whether a patient developed cirrhosis within 1 year by measuring their performance characteristics using AuROC and the area under the precision-recall curve (AuPRC).¹⁹ To compare the overall accuracy of the models, we used the Brier score,²² where a score of 0 indicates perfect accuracy. Furthermore, we compared the performance characteristics of the conventional models and the RNN models using a paired-sample t-test. Two-sided p-values were reported to assess statistical significance.
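All three performance characteristics and the paired comparison across the 10 splits are available in standard libraries. A brief sketch follows; AuPRC is computed here as average precision, Scikit-learn's usual estimate of the area under the precision-recall curve.

```python
from scipy.stats import ttest_rel
from sklearn.metrics import average_precision_score, brier_score_loss, roc_auc_score

def performance(y_true, y_prob):
    # y_prob: predicted probability of developing cirrhosis within 1 year.
    return {
        "AuROC": roc_auc_score(y_true, y_prob),
        "AuPRC": average_precision_score(y_true, y_prob),
        "Brier": brier_score_loss(y_true, y_prob),  # 0 indicates perfect accuracy
    }

# Paired-sample t-test across the 10 random splits (two-sided by default),
# e.g., comparing per-split AuROC of the RF model against the RNN model:
# t_stat, p_value = ttest_rel(rf_aurocs, rnn_aurocs)
```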
Given that machine learning algorithms typically require large datasets to achieve optimal performance, we evaluated the robustness of the four models by analyzing their performance when only a limited amount of labeled data are available. To achieve this, we reduced the amount of labeled training and validation data to 50%, 20%, 10%, and 5% of the original sets and repeated the training, validation, and testing procedures as above. We plotted the changes in the three performance characteristics for each model at each percentage level.
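A sketch of the label-reduction experiment follows. Stratified sampling, which preserves the outcome prevalence in each subset, is our assumption; the text specifies only the percentages.

```python
import numpy as np

def subsample_labeled(X, y, fraction, rng):
    # Keep a random subset of the labeled set at the given fraction,
    # stratified by outcome so that prevalence is preserved (an assumption).
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        n_keep = max(1, int(round(len(idx) * fraction)))
        keep.append(rng.choice(idx, size=n_keep, replace=False))
    keep = np.concatenate(keep)
    return X[keep], y[keep]

rng = np.random.default_rng(0)
# for frac in (0.5, 0.2, 0.1, 0.05):
#     X_sub, y_sub = subsample_labeled(X_train, y_train, frac, rng)
#     ... repeat training, validation, and testing as in the full-data pipeline ...
```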

