Artificial Intelligence in Health Cirrhosis prediction in hepatitis C
2. Data and methods

2.1. Data source

The national VHA system is the largest integrated healthcare system in the United States. It includes 172 medical centers and 1,069 outpatient sites of care, serving 9 million enrollees. All data were obtained from the VA Corporate Data Warehouse, [9] which is a comprehensive repository of data from the VA’s universal electronic medical record system including laboratory data, biometric data, diagnoses, and pharmacy data. [10]

All study procedures were approved by the VA Ann Arbor Institutional Review Board. All procedures conform to the ethical guidelines of the 1975 Declaration of Helsinki. A waiver of informed patient consent was obtained before project initiation.

2.2. Study population

We identified 182,747 VHA users with a history of positive HCV RNA tests seen in the VHA at least once between January 2000 and January 2016. Patients were followed from the date of the first APRI (enrollment) to their last visit recorded in the VA system through January 2019. To ensure that patients did not have cirrhosis at enrollment, we included only patients with APRI results <2.0 (72% negative predictive value for cirrhosis in CHC) at enrollment. Because antiviral treatment outcome is a key predictor of cirrhosis development, [11] we excluded an additional 13,430 patients who received antiviral treatment regimens but lacked RNA tests in VHA electronic records to document whether sustained virologic response (SVR) was achieved. After these exclusions, the cohort contained 169,317 patients, among whom 10,575 patients had undergone TE after enrollment. Finally, since we aimed to develop longitudinal models predicting the development of cirrhosis over a 1-year period, we excluded TE results for 297 patients who had less than 1 year of available follow-up time between enrollment and their last available TE. This resulted in a final analytic cohort of 10,278 patients with valid TE results (the “labeled cohort”) for 1-year prediction and a cohort of 159,039 patients without TE results (the “unlabeled cohort”).

2.3. Progression to cirrhosis defined by TE

TE was introduced into the VHA system in 2013 for the non-invasive assessment of fibrosis and can be considered a reliable measure for cirrhosis outcome. Our primary outcome, the development of cirrhosis, was defined based on liver stiffness >12.5 kPa on TE measured at least once in the VHA data. The earliest date of liver stiffness >12.5 kPa on available TEs was defined as the date of cirrhosis.

2.4. Predictor variables

Predictor variables for cirrhosis development were selected based on our previous research and biological plausibility. We employed both baseline and longitudinal variables for our analysis. The baseline predictors consisted of age at enrollment, gender, race, and HCV genotype. The longitudinal predictors, which may be assessed multiple times, included achievement of SVR, body mass index, and 24 laboratory blood tests. The achievement of SVR was defined as a serum HCV RNA viral load below the lower limit of detection measured at least 12 weeks after the end of HCV treatment. We identified all antiviral treatment regimens received, including both interferon-based and direct-acting antiviral-based therapies. The blood tests used in this study included total bilirubin, aspartate aminotransferase (AST), alanine aminotransferase (ALT), alpha-fetoprotein (AFP), alkaline phosphatase (ALP), albumin, AST:ALT ratio, FIB-4, APRI, blood urea nitrogen, creatinine, glucose, international normalized ratio (INR), hemoglobin, leukocyte count, platelet count (PLT), sodium, potassium, chloride, and total protein. FIB-4 and APRI scores were defined using published formulae [12] to assess the degree of liver fibrosis. In addition, the laboratory values of AST, ALT, AFP, and ALP, which were measured through standardized blood tests, were divided by the corresponding upper limits of normal to account for differences in reference ranges across laboratories.

2.5. Cohort building

Labeled patients were followed from enrollment (time 0) to the date of the last available TE or the date of diagnosis of cirrhosis through TE, if applicable. Unlabeled patients (i.e., those without TE outcomes) were followed from enrollment to the last visit documented in the VHA records (Figure 1). The training cohort was created by randomly selecting visit dates from the patient’s follow-up records. This approach simulates the scenario in which we aim to predict the risk of cirrhosis within a year of a clinical visit based on a patient’s medical history.

2.5.1. Labeled cohort for supervised learning

All patients with known cirrhosis outcomes by TE were included in this cohort (Figure 1). The models used baseline predictors as well as the entire trajectory of the longitudinal predictors from enrollment to their sampled visit time t. The outcome measured whether the patient developed cirrhosis within 1 year, starting from time t.

2.5.1.1. Cases

There were 2,247 patients in the labeled cohort who developed cirrhosis during follow-up according to their
Volume 2 Issue 2 (2025) 89 doi: 10.36922/aih.4671
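As an illustration of the score definitions and the upper-limit-of-normal scaling described in Section 2.4, the sketch below uses the widely cited standard forms of APRI and FIB-4. It is an assumption that these match the published formulae referenced in the text. The function names and the example upper limit of normal (40 U/L) are hypothetical and serve only to make the example concrete.

```python
import math


def uln_ratio(value, upper_limit_normal):
    """Scale a laboratory value by its laboratory-specific upper limit of normal."""
    return value / upper_limit_normal


def apri(ast, ast_uln, platelets):
    """AST-to-platelet ratio index (standard form, assumed here).

    ast and ast_uln in U/L; platelets in 10^9/L.
    """
    return uln_ratio(ast, ast_uln) * 100.0 / platelets


def fib4(age_years, ast, alt, platelets):
    """FIB-4 index (standard form, assumed here).

    ast and alt in U/L; platelets in 10^9/L.
    """
    return (age_years * ast) / (platelets * math.sqrt(alt))


def eligible_at_enrollment(ast, ast_uln, platelets, cutoff=2.0):
    """Enrollment criterion from Section 2.2: APRI below 2.0."""
    return apri(ast, ast_uln, platelets) < cutoff


# Hypothetical patient: AST 80 U/L (ULN 40), ALT 64 U/L, platelets 200, age 50.
print(apri(80, 40, 200))                    # 1.0
print(fib4(50, 80, 64, 200))                # 2.5
print(eligible_at_enrollment(80, 40, 200))  # True
```

Dividing by the upper limit of normal before any further computation keeps values from laboratories with different reference ranges on a comparable scale, which is the stated motivation in Section 2.4.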

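The outcome definition in Section 2.3 and the visit sampling in Sections 2.5 and 2.5.1 can be sketched as follows. This is a minimal illustration, not the authors' implementation. The data layout (lists of dates and of (date, kPa) pairs), the function names, the 365-day horizon, and the treatment of the window boundaries are all assumptions, and the source does not specify how visits on or after the cirrhosis date are handled.

```python
import random
from datetime import date, timedelta

CIRRHOSIS_KPA = 12.5  # liver stiffness threshold stated in Section 2.3


def cirrhosis_date(te_results):
    """Return the earliest TE date with stiffness > 12.5 kPa, or None.

    te_results: iterable of (date, stiffness_kpa) pairs (assumed layout).
    """
    positive = [d for d, kpa in te_results if kpa > CIRRHOSIS_KPA]
    return min(positive) if positive else None


def sample_visit(visit_dates, rng):
    """Pick one visit date at random from a patient's follow-up records."""
    return rng.choice(sorted(visit_dates))


def cirrhosis_within_one_year(visit, te_results, horizon_days=365):
    """True if the cirrhosis date falls within the horizon after the visit.

    Assumption: the window is (visit, visit + horizon_days].
    """
    onset = cirrhosis_date(te_results)
    if onset is None:
        return False
    return visit < onset <= visit + timedelta(days=horizon_days)


# Hypothetical patient records.
te = [(date(2014, 1, 10), 8.0), (date(2015, 3, 1), 14.2), (date(2016, 1, 1), 20.0)]
visits = [date(2013, 1, 1), date(2014, 6, 1)]

t = sample_visit(visits, random.Random(0))
print(t, cirrhosis_within_one_year(t, te))
```

Only predictor values recorded between enrollment and the sampled visit t would be passed to the model, so that the label always refers to a period after the information used for prediction.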

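The cohort counts reported in Section 2.2 can be cross-checked with simple arithmetic. The variable names below are hypothetical; the numbers are taken directly from the text.

```python
# Counts as reported in Section 2.2.
identified = 182_747             # VHA users with positive HCV RNA tests
treated_without_rna = 13_430     # treated, but no RNA test to document SVR
with_te = 10_575                 # underwent TE after enrollment
short_follow_up = 297            # less than 1 year between enrollment and last TE

after_exclusion = identified - treated_without_rna
labeled = with_te - short_follow_up
unlabeled = after_exclusion - labeled

print(after_exclusion, labeled, unlabeled)  # 169317 10278 159039
```

The result shows that the 297 patients whose TE results were excluded are counted in the unlabeled cohort, since 169,317 minus 10,278 gives the reported 159,039.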