Materials Science in Additive Manufacturing Data imputation strategies of PBF Ti64

Machine learning in 3D printing is growing rapidly and has been used for process and design optimization, anomaly detection, and related tasks [18]. The predictive performance of a machine learning model relies heavily on the dataset used to train it. Given the large body of literature investigating the effects of process parameters on the different properties of SLM Ti6Al4V, there is potential in collating the data and applying machine learning to determine the process-structure-properties relationship. However, because each property or parameter has typically been studied in isolation, the collated SLM Ti6Al4V dataset contains missing values, and the quantity of complete data is insufficient for machine learning; imputation is therefore required to bolster the data volume. Hence, the data from the literature are considered incomplete, and imputation of the missing data is required as a pre-processing step before subsequent analysis can be carried out.

Researchers have used various techniques to impute missing data in manufacturing processes. For instance, Steiner et al. aimed to develop real-time predictive models of two key strength properties in a wood composite manufacturing process, using real-time process data and destructive test data collected from a wood composite manufacturer [19]. However, sensor malfunction and data "send/retrieval" problems led to null fields in the company's data warehouse, which resulted in information loss. To overcome this challenge, two missing data imputation methods, the expectation-maximization (EM) algorithm and multiple imputation (MI) using Markov Chain Monte Carlo (MCMC) simulation, were used to impute the missing data. Predictive models based on the imputed datasets generated more precise prediction results than models based on non-imputed datasets. In addition, Bayesian Additive Regression Tree (BART) produced the most precise prediction results among four predictive modeling methods. In another work, Wang et al. discussed the importance of data mining in intelligent manufacturing and introduced an energy monitoring platform for small- and medium-sized enterprises that records energy consumption data at various levels of granularity [20]. However, incomplete data can lead to an inaccurate portrayal of the system, so Wang et al. proposed a novel orthogonal-least-square-based autoencoder to generate new samples for the imputation of missing values. Based on experimental results using real industrial datasets, the proposed approach significantly outperformed alternative methods for missing ratios greater than 0.05.

There are many data imputation strategies, from simple statistical methods such as mean imputation and regression imputation to more complex methods such as hot-deck imputation, which imputes the missing data by realistic scores that preserve the variable distribution [21]. Some widely used imputation methods include: imputing using zero, mean, median, or mode; imputing using a randomly selected value; and imputing using a model [22]. The simpler of these techniques often impute a single, constant value for each variable without capturing or reflecting the relationship among the variables. This will likely result in an incorrect process-properties relationship.

Model-based imputation methods can be categorized into two types: those that make predictions for the missing values based on similar data points, and those that attempt to construct a global model to infer the missing data. The former includes algorithms such as k-nearest neighbors (kNN), while the latter encompasses deep learning neural networks.

The present study focuses on the effect of different model-based imputation techniques on the process-structure relationship of the SLM Ti6Al4V dataset. The results of the imputation were evaluated to determine the best strategy for the dataset. This article first presents the methodology, followed by the results and a discussion of the different imputation methods, and finally the investigation of the imputed dataset.

2. Methodology

2.1. Imputation methods

2.1.1. k-Nearest neighbors (kNN) imputation

kNN imputation is one of the most common methods for imputing missing values. The kNN algorithm is used for both classification and regression problems [23]. It identifies the k nearest neighboring points using a distance metric and estimates the missing values from the values of these k neighboring observations [24].

The distance metric is generally Euclidean, and the function can be defined as

$$E(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2} \tag{I}$$

where $x_i$ and $y_i$ are the $i$-th coordinates of the point of interest and of a case point from the dataset, respectively, and $m$ is the number of input variables [25]. The process flow for the imputation is shown in Figure 2.

Since the kNN algorithm is non-parametric [23], there is no underlying assumption on the distribution of the data, and hence, kNN is suitable for datasets with varied distributions.

Imputation was done using Scikit-learn's KNNImputer class [26]. For calculation of the distance involving missing values, the coordinates of the missing value are ignored and the weights of the remaining coordinates
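
As a rough illustration of the kNN imputation idea described in Section 2.1.1, the following is a minimal pure-Python sketch and not the implementation used in this study. It assumes the missing-value-aware Euclidean distance documented for Scikit-learn's `nan_euclidean` metric, in which coordinates missing in either point are skipped and the squared distance is rescaled by the ratio of total to shared coordinates. The function names `nan_euclidean` and `knn_impute` and the small example array are hypothetical and chosen only for illustration.

```python
import math


def nan_euclidean(x, y):
    """Euclidean distance (Equation I) over coordinates observed in both points.

    Assumption: the squared distance is rescaled by
    (total coordinates / shared coordinates), as in Scikit-learn's
    documented nan_euclidean metric.
    """
    shared = [(a, b) for a, b in zip(x, y)
              if not (math.isnan(a) or math.isnan(b))]
    if not shared:
        return math.inf  # no common coordinates: treat as infinitely far
    squared = sum((a - b) ** 2 for a, b in shared)
    return math.sqrt(len(x) / len(shared) * squared)


def knn_impute(rows, k=2):
    """Fill each NaN with the mean of that feature over the k nearest rows
    that have the feature observed (uniform weighting)."""
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isnan(value):
                continue
            # Candidate donors: every other row with feature j observed.
            donors = [(nan_euclidean(row, other), other[j])
                      for n, other in enumerate(rows)
                      if n != i and not math.isnan(other[j])]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


if __name__ == "__main__":
    nan = float("nan")
    # Hypothetical toy data: three features, one missing entry in row 1.
    rows = [
        [1.0, 2.0, 3.0],
        [1.0, 2.0, nan],
        [2.0, 2.0, 5.0],
        [9.0, 9.0, 20.0],
    ]
    print(knn_impute(rows, k=2)[1][2])  # 4.0 (mean of 3.0 and 5.0)
```

With Scikit-learn installed, `KNNImputer(n_neighbors=2).fit_transform(X)` applies the same idea to a NumPy array `X` in which missing entries are encoded as `np.nan`.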

Volume 2 Issue 1 (2023) 3 https://doi.org/10.36922/msam.50
