class in set N. If y_i and ŷ_i do not match, then it removes d_i from D. SMOTE-ENN repeats the process until a desired proportion of each class is fulfilled. The resultant reduced dataset after the procedure is T (Algorithm 2).

Algorithm 1: SMOTE
Data: S: Original dataset
Result: D: Modified dataset
/* θ_k = user-defined k nearest neighbors to consider (by default 3) */
k ← θ_k
/* iterate the algorithm until the desired class balance in dataset S is achieved */
while notBalanced(S) do
    /* get a random sample from the minority class in set S */
    s_r ← random(S)
    /* get s_r's k nearest neighbors in S */
    N ← nearestNeighbors(s_r)
    /* get a random neighbor in set N */
    n_r ← random(N)
    /* create a sample at a random location along the line segment connecting s_r and n_r in the feature space */
    s ← createSample(s_r, n_r)
    /* add the minority class sample to set S */
    S ← S ∪ {s}
end
D ← S
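As a concrete (if simplified) rendering of Algorithm 1, the sketch below implements one oversampling pass in Python with NumPy and scikit-learn. The helper names mirror the pseudocode but are otherwise our own, and the outer loop that checks class balance is omitted:

import numpy as np
from sklearn.neighbors import NearestNeighbors

def create_sample(s_r, n_r, rng):
    """Pick a random point on the line segment connecting s_r and n_r."""
    gap = rng.uniform(0.0, 1.0)
    return s_r + gap * (n_r - s_r)

def smote_pass(X_minority, k=3, rng=None):
    """One pass of Algorithm 1: one synthetic point per minority sample."""
    rng = rng or np.random.default_rng(0)
    # k + 1 neighbors because each sample is returned as its own neighbor
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    _, idx = nn.kneighbors(X_minority)
    synthetic = [
        create_sample(X_minority[i], X_minority[rng.choice(neigh[1:])], rng)
        for i, neigh in enumerate(idx)
    ]
    return np.vstack(synthetic)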
Algorithm 2: Edited Nearest Neighbor
Data: D: Modified dataset after SMOTE
Result: T: Reduced set
/* θ_k = user-defined k nearest neighbors to consider (by default 3) */
k ← θ_k
while notProportionate(D) do
    /* iterate through all samples in D */
    for d_i ∈ D do
        /* get d_i's k nearest neighbors in D */
        N ← nearestNeighbors(d_i, D, k)
        /* get d_i's class */
        y_i ← getClass(d_i)
        /* get the majority class in the set N */
        ŷ_i ← getMajorityClass(N)
        /* if y_i and ŷ_i do not match, remove sample d_i from D */
        if y_i ≠ ŷ_i then
            D ← D \ {d_i}
        end
    end
end
T ← D
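In practice, the SMOTE + ENN combination of Algorithms 1 and 2 need not be hand-written; the imbalanced-learn package (used here purely for illustration, as the paper does not state its implementation) bundles both steps. A minimal sketch on toy data:

from imblearn.combine import SMOTEENN
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import EditedNearestNeighbours
from sklearn.datasets import make_classification

# toy imbalanced dataset standing in for the porosity data
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

resampler = SMOTEENN(
    smote=SMOTE(k_neighbors=3),                  # θ_k = 3, as in Algorithm 1
    enn=EditedNearestNeighbours(n_neighbors=3),  # θ_k = 3, as in Algorithm 2
)
X_resampled, y_resampled = resampler.fit_resample(X, y)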
2.3.2. SMOTE for regression

SMOTE is a sampling strategy to handle datasets with imbalanced class distribution in classification tasks. SMOTE for Regression (SMOTER)37 is an adaptation of SMOTE for regression tasks where the objective is to predict rare extreme values, which are more critical herein. Like SMOTE, SMOTER oversamples the observations with rare extreme values and undersamples the remaining normal observations, balancing the distribution of the values. A user-defined threshold determines the distinction between rare and normal values. The algorithm also accepts parameters that govern the percentages of desired under- and over-sampling and the number of nearest neighbors (k-NNs) to consider when generating the new synthetic samples.
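Since the exact SMOTER implementation used is not specified here, the following is a minimal sketch of the oversampling half of the idea under our own simplifications: rarity is decided by a threshold on the target, and the synthetic target is linearly interpolated with the same coefficient as the features (the original SMOTER instead uses a distance-weighted average of the two seed targets):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def smoter_oversample(X, y, threshold, k=3, n_new=100, rng=None):
    """Synthesize n_new (x, y) pairs from rare (extreme-target) observations."""
    rng = rng or np.random.default_rng(0)
    rare = y >= threshold                    # user-defined rare/normal split
    X_rare, y_rare = X[rare], y[rare]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_rare)
    _, idx = nn.kneighbors(X_rare)
    X_syn, y_syn = [], []
    for _ in range(n_new):
        i = rng.integers(len(X_rare))        # random rare seed
        j = rng.choice(idx[i][1:])           # one of its k rare neighbors
        gap = rng.uniform()
        X_syn.append(X_rare[i] + gap * (X_rare[j] - X_rare[i]))
        y_syn.append(y_rare[i] + gap * (y_rare[j] - y_rare[i]))
    return np.vstack(X_syn), np.array(y_syn)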
Traditional ML algorithms, such as DTs and Naïve Bayes, work on datasets with a feature-vector representation (i.e., each observation is described by a single vector of features). A time series is a series of data points taken sequentially in time. When it comes to features described by time series, models like DTs fail to work. These models treat each data point in the series independently and hence miss the opportunity of capturing the series's sequential information.

ML algorithms working with time series can take two different approaches. Either special ML time-series models are employed (like k-NN with Dynamic Time Warping, Time-Series-Forest, and BOSS), or the time series itself is transformed into a set of features that describe the sequence, on which the traditional ML models are then optimized.
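The first of these approaches is available off the shelf in time-series toolkits. As a brief illustration only (the sktime library and the toy data below are our assumption, not part of this study), a distance-based time-series classifier can be fitted directly on raw series:

import numpy as np
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier

# toy data: 20 univariate series of length 50, shape (n_series, n_channels, n_timepoints)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 1, 50))
y = np.repeat([0, 1], 10)

clf = KNeighborsTimeSeriesClassifier(n_neighbors=3, distance="dtw")
clf.fit(X, y)
print(clf.predict(X[:2]))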
Deep Learning (DL) does exhibit high performance for such tasks and provides a flexible framework to model sequential data; however, in some contexts, the lack of big data currently available in AM restricts DL's application. Hence, this paper investigates applying traditional ML algorithms to time-series data without losing sequential information. For this purpose, we explored using summary tools capable of extracting several features characterizing the time series. The summary vectors can then be employed in a standard ML algorithm for classification and regression tasks.
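To make the second approach concrete, the sketch below shows the typical TS-Fresh feature-extraction workflow; the data frame and column names are illustrative placeholders, not the study's data:

import numpy as np
import pandas as pd
from tsfresh import extract_features, select_features
from tsfresh.utilities.dataframe_functions import impute

# toy long-format data: one row per time step, tagged with a series id
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "id": np.repeat(np.arange(10), 50),      # 10 series, 50 steps each
    "time": np.tile(np.arange(50), 10),
    "sensor": rng.normal(size=500),          # illustrative signal column
})
y = pd.Series(np.repeat([0, 1], 5), index=np.arange(10))  # one label per series

# one summary feature vector per series id
features = extract_features(df, column_id="id", column_sort="time")
impute(features)                              # clean NaN/inf features in place
X = select_features(features, y)              # keep statistically relevant ones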
In the following subsections, we describe the summary tool (TS-Fresh package) and the ML models employed in this study.
2.3.3. Time series feature extraction on the basis of scalable hypothesis tests

The wide availability of cheaper sensors allowed their usage in various domains such as manufacturing, health, and