Artificial Intelligence in Health ML models for heartbeat classification

• Evaluate the scaling of the ML models with varying dataset sizes, emphasizing the manner in which preprocessing methods and algorithm adaptations help maintain efficiency and effectiveness across different data volumes.

The organization of this paper, illustrated in Figure 1, is as follows: Section 1 introduces the topic, explaining ECG classification using various models. Section 2 outlines the data and methods used, covering all aspects from dataset collection to signal classification. Section 3 focuses on interpreting the results, and Section 4 discusses the proposed approach.

2. Data and methods

2.1. ECG datasets

Previous studies have achieved promising results in classifying heartbeat segments based on arrhythmia classes using the MIT-BIH Arrhythmia Database.¹⁴⁻¹⁶ However, class imbalance has remained a notable issue in electronic health (eHealth), where abnormal samples are far fewer than normal ones. This imbalance can bias the model toward the dominant class, leading to poor or average classification of the minority class, which negatively impacts classification accuracy and other performance metrics.¹⁷ Instead of the MIT-BIH Arrhythmia Database, which is widely used for ECG classification, this study uses a dataset provided by the University of Chinese Academy of Sciences, which is available on request. This dataset includes four categories: normal (N), supraventricular (S), ventricular ectopic (V), and fusion (F), as indicated in Table 1.

The adopted dataset consists of ECG recordings from 205 heartbeat signals. To protect the personal information of the patients, appropriate measures are taken to ensure fairness in the evaluation process, which uses a training set comprising 80,000 samples for model construction and validation and a test set comprising 20,000 samples. The primary task is to predict the ECG heartbeat signal category of the dataset signals, which are provided by a platform that records ECG data by capturing only one column of the heartbeat signal sequence. Each sample within this sequence is sampled at the same frequency and is of equal length to ensure consistency across the dataset. Annotations in this dataset are used to create four different beat categories, and this categorization follows the standards set by the Association for the Advancement of Medical Instrumentation EC57.¹⁸ Table 1 summarizes the mapping of beat annotations to each category.

2.2. Data preprocessing

This section details the N, S, V, and F categories used in this study (Table 1). The training and test set distributions are illustrated in Figures 2 and 3, respectively, which depict the class imbalance phenomenon. When training the different ML models, class weights are assigned to address this class imbalance. Figure 4 presents a normal and an abnormal heartbeat, with the x-axis denoting the time frame ranging from 0.0 ms to 1.6 ms and the y-axis representing the normalized amplitudes of heartbeat signals. The methodology section further describes the associated methods.
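The text does not state how the class weights are computed. As a minimal sketch, one common choice is the inverse-frequency ("balanced") heuristic, in which the weight for class c is n_samples / (n_classes × count_c). The function name `balanced_class_weights` and the label counts below are hypothetical and serve only to illustrate the idea; they are not taken from the dataset used in this study.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)}.

    Rare classes receive larger weights, so errors on them
    contribute more to a weighted training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Hypothetical, deliberately imbalanced label list (NOT the study's data).
labels = ["N"] * 80 + ["S"] * 10 + ["V"] * 8 + ["F"] * 2
weights = balanced_class_weights(labels)

for beat_class in ("N", "S", "V", "F"):
    print(beat_class, weights[beat_class])
```

With weights of this form, the dominant N class is down-weighted and the rare F class is up-weighted, which counteracts the bias toward the majority class described in Section 2.1. Many ML libraries accept such a dictionary (or an equivalent per-sample weight vector) at training time.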

Figure 1. Organization of the paper

Volume 1 Issue 4 (2024) 63 doi: 10.36922/aih.3543
