Design+ ML for predicting Alzheimer’s progression
modeling, evaluation, and deployment. Figure 1 illustrates a graphic representation of these CRISP-DM phases.

Table 1. Dataset summary

Variable                                              Description
Demographics
  Age                                                 55–96 years old
  Gender                                              Categorized as “Female” or “Male”
Medical history
  Psychiatric (MH_PSYCH)                              Binary features
  Neurologic (MH_NEURL)
  Cardiovascular (MH_CARD)
  Hepatic (MH_HEPAT)
  Musculoskeletal (MH_MUSCL)
  Endocrine–metabolic (MH_ENDO)
  Gastrointestinal (MH_GAST)
  Renal–genitourinary (MH_RENA)
  Smoking (MH_SMOK)
  Malignancy (MH_MALI)
ApoE genotype
  Two-allele genotype                                 Each individual carries two ApoE alleles, and each allele can be E2, E3, or E4
Neuropsychological assessments
  Clinical dementia rating (CDGLOBAL)                 Total number of story units recalled immediately; scores ranged from 0 to 25
  Mini-mental state exam (MMSCORE)                    Total number of story units recalled after a delay; scores ranged from 0 to 25
  Logical memory immediate recall (LIMMTOTAL)         -
  Logical memory delayed recall (LDELTOTAL)
Blood analysis
  Thyroid stimulating hormone (AXT117)
  Vitamin B12 (BAT126)
  Red blood cell count (HMT3)
  White blood cell count (HMT7)
  Platelet count (HMT13)
  Hemoglobin (HMT40)
  Mean corpuscular hemoglobin (HMT100)
  Mean corpuscular hemoglobin concentration (HMT102)
  Urea nitrogen (RCT6)
  Serum glucose (RCT11)
  Cholesterol (high performance; RCT120)
  Creatinine (rate blanked; RCT329)
Diagnosis
  Diagnostic results                                  Categorized into healthy control, mild cognitive impairment, and Alzheimer’s disease

Abbreviation: ApoE: Apolipoprotein E.

4.1. Business understanding

The business understanding phase involves defining business objectives, assessing the current context, establishing data mining goals, and formulating a project plan. As outlined in the introduction, a background study was conducted, and the research objectives were clearly defined. The success criteria for this study involved benchmarking classifier performance against the AD classification model presented by Rahman and Prasad⁶ and comparing the best diagnosis classifier with the one identified in their study.

This comparison focused on four key metrics critical for evaluating classifier performance: (i) accuracy, indicating the proportion of correctly predicted instances relative to the total number of instances in the dataset; (ii) precision, a measure of prediction reliability, reflecting the ratio of true positive predictions to all positive predictions; (iii) recall, also referred to as sensitivity, measuring the classifier’s ability to identify actual positive cases; and (iv) F1-score, the harmonic mean of recall and precision, which balances the trade-off between these two metrics.⁸

A comprehensive project plan was formulated based on available resources, requirements, and risk assessments. The plan encompassed tasks across each CRISP-DM phase, including the selection of appropriate tools, methodologies, and risk mitigation strategies. The primary tools utilized were Google Colab and Python, with tasks involving data preparation, cleaning, and analysis. Python libraries, particularly functionalities

Figure 1. Phases of the cross-industry process for data mining
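As an illustration of the four metrics defined in Section 4.1, the following minimal Python sketch computes accuracy, precision, recall, and F1-score from paired true and predicted labels, treating one diagnostic class as positive. The function name classification_metrics, the toy labels, and the convention of returning 0.0 when a denominator is zero are assumptions made for this illustration; they are not taken from the study's code.

```python
from typing import Dict, Hashable, Sequence


def classification_metrics(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable,
) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 with one class treated as positive.

    Hypothetical helper for illustration only. Ratios with a zero
    denominator are reported as 0.0 (an assumed convention).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    # Confusion counts for the chosen positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Accuracy counts every correct prediction, whatever the class.
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy labels (invented for this example): healthy control (HC),
    # mild cognitive impairment (MCI), and Alzheimer's disease (AD).
    y_true = ["AD", "AD", "MCI", "HC", "AD", "HC"]
    y_pred = ["AD", "MCI", "MCI", "HC", "AD", "AD"]
    print(classification_metrics(y_true, y_pred, positive="AD"))
```

For a three-class diagnosis task, one possible approach is to call the function once per class in a one-vs-rest fashion and average the results; whether the study aggregated its metrics this way is not stated in this section.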
Volume 2 Issue 3 (2025) 3 doi: 10.36922/DP025270031

