Artificial Intelligence in Health Cirrhosis prediction in hepatitis C
9 Center for Clinical Management Research, VA Ann Arbor Healthcare System, Ann Arbor, Michigan, United States of America
10 Department of Statistics, College of Literature, Science, and the Arts, University of Michigan, Ann Arbor, Michigan, United States of America
11 Center for Global Health and Equity, University of Michigan, Ann Arbor, Michigan, United States of America
12 Department of Learning Health Sciences, Michigan Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
1. Introduction

Chronic hepatitis C (CHC) is a leading cause of cirrhosis, liver transplantation, and hepatocellular carcinoma. The development of cirrhosis in patients with CHC is highly variable and non-linear.[1] Many cofactors influence the risk of cirrhosis in patients with CHC, including patient characteristics (e.g., alcohol use, obesity, and age), viral factors (e.g., genotype), successful antiviral treatment, and others.[2] Reliable models to predict cirrhosis risk are needed to facilitate population-level screening and guide treatment decision-making. Machine learning methods to predict cirrhosis development offer greater flexibility than traditional predictive methods because they can accommodate large numbers of predictor variables and are able to handle data with complex inter-variable relationships and irregular collection intervals.

Traditional methods for predicting cirrhosis are subject to several limitations. Liver biopsy, which is considered the gold standard for diagnosis, is invasive and poorly scalable for large populations. Non-invasive markers of liver disease, such as the AST-to-platelet ratio index (APRI), the fibrosis-4 index (FIB-4), transient elastography (TE), and others, offer only single snapshots in time and do not account for longitudinal changes. Fibrosis assessment is not performed across large populations at consistent intervals or in all patients. Moreover, few previous models examining patients with CHC evaluate the risk of continued liver disease progression after antiviral therapy (e.g., due to comorbid liver disease that persists after CHC eradication). In addition, the variable rate of fibrosis progression over time complicates the development of reliable risk prediction models using conventional methods.

Machine learning is a form of artificial intelligence that uses computer algorithms to identify patterns in large datasets and allows computers to assimilate new information without being explicitly programmed.[3] It has demonstrated success in many real-world healthcare applications, including computer-aided interpretation of liver imaging, prediction of hepatocellular carcinoma risk, and prediction of cirrhosis development.[4-6] For example, conventional machine learning models such as logistic regression (LR) and random forest (RF) have been shown to effectively predict disease progression in CHC by incorporating both baseline predictors and summary statistics of longitudinal predictors, with RF being able to capture non-linear trends that LR represents only to a limited extent.[7] Recent advances in deep learning, a subtype of machine learning, have seen the emergence of the deep recurrent neural network (RNN) as a powerful tool to process sequential data collected at various times.[8] The structure of RNN has shown superior performance for applications such as machine translation, and it can be applied flexibly in both supervised learning and semi-supervised learning. In supervised learning, we use training data with known outcomes (“labeled data”) to learn an algorithm that can make accurate predictions for new unseen data. In contrast, semi-supervised learning uses both labeled data and data with missing outcomes (“unlabeled data”), where the unlabeled data can help identify relevant patterns. Semi-supervised learning can help improve prediction performance, especially when labeled data are scarce. Both supervised RNN and semi-supervised RNN (semi-RNN) offer advantages over conventional methods such as LR, because they can handle varying time durations and irregular time gaps between two consecutive visits, and they can automatically learn predictive patterns from raw data, rather than requiring pre-specified feature extraction.

Machine learning methods have previously demonstrated superior performance compared to linear Cox proportional hazards models in predicting the risk of cirrhosis in patients with CHC, as defined based on APRI score thresholds.[4] However, it remains unknown whether machine learning methods perform well in predicting progression to cirrhosis, as defined by TE, a far more sensitive modality for assessing liver fibrosis. Although TE has become more widely used in the last decade, it is still used in only a minority of patients with CHC. Therefore, we hypothesized that outcomes from patients who underwent TE could be used to train a model to accurately predict the 1-year risk of developing cirrhosis in CHC patients, as defined by TE.

The Veterans Health Administration (VHA) serves the largest single cohort of CHC patients in the United States. Our analysis aimed to evaluate the predictive performance of deep learning methods and compare it to conventional models. Moreover, we aimed to assess whether semi-RNN can obtain better performance than a supervised RNN when the number of patients with TE outcomes is limited.
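
As an illustration of the contrast drawn above between single-snapshot indices and longitudinal inputs, the following minimal Python sketch computes APRI and FIB-4 for each visit using their commonly published formulas and attaches the number of days elapsed since the previous visit, which is one simple way to expose irregular time gaps to a sequence model. It is not the pipeline used in this study. The AST upper limit of normal (40 IU/L), the field names, the helper `visit_sequence`, and the laboratory values are all assumptions made for illustration.

```python
import math
from datetime import date

# Assumed AST upper limit of normal in IU/L; the actual value is laboratory-specific.
AST_UPPER_LIMIT_NORMAL = 40.0


def apri(ast, platelets, ast_uln=AST_UPPER_LIMIT_NORMAL):
    """AST-to-platelet ratio index: (AST / ULN) * 100 / platelets (10^9/L)."""
    return (ast / ast_uln) * 100.0 / platelets


def fib4(age, ast, alt, platelets):
    """Fibrosis-4 index: (age * AST) / (platelets (10^9/L) * sqrt(ALT))."""
    return (age * ast) / (platelets * math.sqrt(alt))


def visit_sequence(visits):
    """Hypothetical helper: order visits by date and return one feature row
    per visit, including the gap in days since the previous visit."""
    ordered = sorted(visits, key=lambda v: v["date"])
    rows = []
    previous_date = None
    for v in ordered:
        gap_days = 0 if previous_date is None else (v["date"] - previous_date).days
        rows.append({
            "gap_days": gap_days,
            "apri": apri(v["ast"], v["platelets"]),
            "fib4": fib4(v["age"], v["ast"], v["alt"], v["platelets"]),
        })
        previous_date = v["date"]
    return rows


# Made-up laboratory values for a single hypothetical patient.
visits = [
    {"date": date(2020, 1, 10), "age": 60, "ast": 40.0, "alt": 25.0, "platelets": 200.0},
    {"date": date(2020, 9, 1), "age": 60, "ast": 80.0, "alt": 64.0, "platelets": 100.0},
]
rows = visit_sequence(visits)
for row in rows:
    print(row)
```

Each row on its own is a snapshot of the kind described above; the ordered list of rows, with its explicit `gap_days` feature, is the sort of variable-length input that a recurrent model can consume directly.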
Volume 2 Issue 2 (2025) 88 doi: 10.36922/aih.4671

