International Journal of AI for
Materials and Design
Sustainable electronics using AI/ML
each token represents an atom or a bond, and learns to predict biodegradability based on these sequential patterns. This approach has shown promising results in accurately predicting the biodegradability of chemical compounds, providing a powerful tool for environmental chemistry and drug design.

4.1.7. BO

The BO algorithm is widely used due to its efficiency in finding good solutions with few iterations. BO uses a surrogate function, typically a Gaussian process, to approximate the objective function, and an acquisition function, such as expected improvement (EI), to explore the solution space. BO achieves this optimization through a combination of surrogate models and acquisition functions as follows: (1) sampling the objective function at random points to build an initial dataset and surrogate model; (2) using the acquisition function to select the next point to evaluate, and updating the dataset and surrogate model accordingly; and (3) repeating this process to iteratively improve the approximation of the objective function until the global minimum is found or the evaluation budget is exhausted. In biodegradability prediction, BO helps in refining model parameters to enhance accuracy, enabling more precise assessments of how chemical compounds break down in the environment, which is crucial for designing environmentally friendly substances.⁸³

Besides classification techniques, regression techniques in ML are also increasingly being utilized in the field of biodegradability to predict how chemical compounds break down in the environment. This involves modeling the relationship between chemical structures and their biodegradation rates or degradation half-lives. Understanding and predicting biodegradability is crucial for assessing the environmental impact of chemicals, pharmaceuticals, and other substances. Typically, regression involves predicting a continuous numerical value for given input data using different regression models and evaluation metrics. To predict the rate at which a chemical compound degrades in the environment, different regression models are used, including linear regression, polynomial regression, support vector regression (SVR), and more advanced techniques like random forests and neural networks. The predicted continuous values are then assessed with evaluation metrics such as mean absolute error, mean squared error (MSE), R-squared (R²), and root MSE. All these regression models are briefly explained as follows:⁸⁴,⁸⁵

(1) Linear regression (LR). This is a simple and effective model that can provide a baseline for predicting biodegradation rates. However, its simplicity might limit accuracy when dealing with complex molecular interactions.
(2) Multiple linear regression. It extends LR by considering multiple chemical descriptors. It is useful for capturing the combined effect of various molecular features on biodegradability.
(3) Polynomial regression. It is suitable for modeling non-linear relationships between chemical properties and biodegradation rates. It allows for capturing more complex patterns than those captured by LR.
(4) SVR. Effective for handling non-linear relationships and high-dimensional data, SVR can provide more accurate predictions for complex biodegradation processes.
(5) Random forest regression. It is an ensemble method that uses multiple decision trees to improve prediction accuracy and robustness. It is particularly useful for handling large datasets with many features.
(6) Neural networks. These are deep learning models, including feedforward neural networks, that can capture intricate non-linear relationships between chemical structures and biodegradation. They require large amounts of data and computational resources but can provide highly accurate predictions.

Taken together, regression techniques in ML represent a powerful tool for predicting the biodegradability of chemical compounds, aiding in environmental protection and the development of sustainable products. The ongoing refinement of these models and integration with experimental data will further enhance their applicability and reliability. Here, we acknowledge the importance of regression methods for predicting continuous outcomes, such as degradation time. However, according to a recent investigation,⁸⁶ there is a notable scarcity of numerical data (<3200 records) necessary for regression analysis.⁸⁷ Specifically, the lack of available and standardized characterization of parameters, such as reaction setup, binary classification, and degradation time, represents a nearly insurmountable obstacle to the ML-aided design of transient materials; that is, experimental datasets containing properties such as degradation rate are limited. This scarcity hinders the ability to create robust regression models. For instance, degradation time is often not included in current databases, making it difficult to assemble a labeled dataset for training predictive models. Huang and Zhang⁸⁸ have attempted to address these issues by compiling a large dataset of 12,750 records, which encompass various biodegradation conditions, and developing robust regression and classification models. However, the regression model only achieved an R² of 0.54, while the best classification model reached an accuracy of 85.1%, which improved to 87.6% with chemical speciation considerations.⁸⁸ Furthermore,
Volume 1 Issue 2 (2024) 9 doi: 10.36922/ijamd.3173
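
The evaluation metrics named on this page (mean absolute error, MSE, root MSE, and R-squared) can be illustrated with a minimal sketch. It is not taken from any of the cited studies: the function name `regression_metrics` and the observed and predicted half-lives are hypothetical values chosen only to show how each metric is computed from paired observations and predictions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R-squared for paired observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root MSE, in the units of y
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)     # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R-squared is undefined when the observed values have no variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical degradation half-lives in days (illustrative, not from any dataset).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

In this sketch, R-squared is the fraction of variance in the observed values that the predictions explain, so a reported R² of 0.54 would mean that roughly half of the variance in the measured outcome remains unexplained by the model.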

