
International Journal of AI for Materials and Design
Metal AM porosity prediction using ML



class in set N, ŷ_i, do not match, then it removes d_i from D. ENN repeats the process until a desired proportion of each class is fulfilled. The resultant reduced dataset after the procedure is T (Algorithm 2).

Algorithm 1: SMOTE
Data: S: Original dataset
Result: D: Modified dataset
  /* θ_k = user-defined number of nearest neighbors to consider (default 3) */
  k ← θ_k
  /* iterate until the desired class balance in dataset S is achieved */
  while notBalanced(S) do
    /* draw a random sample from the minority class in S */
    s_r ← random(S)
    /* get s_r's k nearest neighbors in S */
    N ← nearestNeighbors(s_r)
    /* pick a random neighbor from the set N */
    n_r ← random(N)
    /* create a sample at a random location along the line segment connecting s_r and n_r in the feature space */
    s ← createSample(s_r, n_r)
    /* add the new minority-class sample to S */
    S ← S ∪ {s}
  end
  D ← S
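For concreteness, the interpolation step of Algorithm 1 can be sketched in Python as follows (function and parameter names here are illustrative, not part of the original algorithm):

```python
import numpy as np

def smote(minority: np.ndarray, n_new: int, k: int = 3, seed: int = 0) -> np.ndarray:
    """Generate n_new synthetic samples, each placed at a random point
    on the segment joining a drawn minority sample to one of its
    k nearest minority-class neighbors."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))        # random minority sample s_r
        s_r = minority[i]
        d = np.linalg.norm(minority - s_r, axis=1)
        d[i] = np.inf                          # exclude s_r itself
        neighbors = np.argsort(d)[:k]          # indices of k nearest neighbors
        n_r = minority[rng.choice(neighbors)]  # random neighbor n_r
        lam = rng.random()                     # random position on the segment
        synthetic.append(s_r + lam * (n_r - s_r))
    return np.vstack(synthetic)
```

Because each new sample is a convex combination of two existing minority samples, the synthetic points stay inside the convex hull of the minority class.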
2.3.2. SMOTE for regression
SMOTE is a sampling strategy to handle datasets with an imbalanced class distribution in classification tasks. SMOTE for Regression (SMOTER)37 is an adaptation of SMOTE for regression tasks where the objective is to predict rare extreme values, which are more critical herein. Like SMOTE, SMOTER oversamples the observations with rare extreme values and undersamples the remaining normal observations, balancing the distribution of the values. A user-defined threshold value determines the distinction between rare and normal values. The algorithm also accepts parameters that govern the percentages of desired under- and over-sampling and the k-NNs to consider when generating the new synthetic samples.
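A simplified, SMOTER-style oversampling step for regression might be sketched as follows. The threshold rule and the linear interpolation of the target are assumptions of this sketch, and the undersampling of normal observations is omitted for brevity:

```python
import numpy as np

def smoter(X, y, thresh, n_new, k=3, seed=0):
    """SMOTER-style oversampling: interpolate new (x, y) pairs between
    rare samples (target above thresh) and their rare neighbors."""
    rng = np.random.default_rng(seed)
    rare = np.where(y > thresh)[0]          # indices of rare extreme values
    X_new, y_new = [], []
    for _ in range(n_new):
        i = rng.choice(rare)                # random rare sample
        d = np.linalg.norm(X[rare] - X[i], axis=1)
        d[rare == i] = np.inf               # exclude the sample itself
        nn = rare[np.argsort(d)[:k]]        # k nearest rare neighbors
        j = rng.choice(nn)
        lam = rng.random()
        X_new.append(X[i] + lam * (X[j] - X[i]))
        y_new.append(y[i] + lam * (y[j] - y[i]))  # interpolate the target too
    return np.vstack([X, X_new]), np.concatenate([y, y_new])
```

The key difference from classification SMOTE is that the continuous target of each synthetic sample must itself be synthesized, here by interpolating between the two parents' targets.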
Algorithm 2: Edited Nearest Neighbor
Data: D: Modified dataset after SMOTE
Result: T: Reduced set
  /* θ_k = user-defined number of nearest neighbors to consider (default 3) */
  k ← θ_k
  while notProportionate(D) do
    /* iterate through all samples in D */
    for d_i ∈ D do
      /* get d_i's k nearest neighbors in D */
      N ← nearestNeighbors(d_i, D, k)
      /* get d_i's class */
      y_i ← getClass(d_i)
      /* get the majority class in the set N */
      ŷ_i ← getMajorityClass(N)
      /* if y_i and ŷ_i do not match, remove d_i from D */
      if y_i ≠ ŷ_i then
        D ← D \ {d_i}
      end
    end
  end
  T ← D
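A single editing pass of Algorithm 2 can be sketched in Python as follows (helper and variable names are illustrative):

```python
import numpy as np
from collections import Counter

def enn_pass(X: np.ndarray, y: np.ndarray, k: int = 3):
    """One ENN pass: drop every sample whose class disagrees with the
    majority class among its k nearest neighbors."""
    keep = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the sample itself
        nn = np.argsort(d)[:k]             # k nearest neighbors
        majority = Counter(y[nn]).most_common(1)[0][0]
        if y[i] == majority:               # keep only agreeing samples
            keep.append(i)
    return X[keep], y[keep]
```

Repeating this pass until the desired class proportions are reached mirrors the outer while-loop of Algorithm 2.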
Traditional ML algorithms, such as DTs and Naïve Bayes, work on datasets with a feature-vector representation (i.e., a single vector of features describing each observation). A time series is a series of data points taken sequentially in time. When it comes to features described by time series, models like DTs fail to work: they treat each data point in the series independently and hence miss the opportunity of capturing the series's sequential information. ML algorithms working with time series can take two different approaches. Either special ML time-series models are employed (like k-NN with Dynamic Time Warping, Time-Series-Forest, and BOSS), or the time series itself is transformed into a set of features that describe the sequence, on which traditional ML models are then optimized. Deep Learning (DL) does exhibit high performance for such tasks and provides a flexible framework to model sequential data; however, in some contexts, the lack of big data currently available in AM restricts DL's application. Hence, this paper investigates applying traditional ML algorithms to time-series data without losing sequential information. For this purpose, we explored summary tools capable of extracting several features characterizing the time series. The summary vectors can then be employed in a standard ML algorithm for classification and regression tasks.
In the following subsections, we describe the summary tool (the TS-Fresh package) and the ML models employed in this study.
2.3.3. Time series feature extraction on the basis of scalable hypothesis tests
The wide availability of cheaper sensors allowed their usage in various domains such as manufacturing, health, and
Volume 1 Issue 3 (2024)    39    doi: 10.36922/ijamd.4812