
International Journal of AI for Materials and Design
Sustainable electronics using AI/ML


each token represents an atom or a bond, and learns to predict biodegradability based on these sequential patterns. This approach has shown promising results in accurately predicting the biodegradability of chemical compounds, providing a powerful tool for environmental chemistry and drug design.

4.1.7. BO

The BO algorithm is widely used because of its efficiency in finding good solutions with few evaluations of the objective function. BO uses a surrogate model, typically a Gaussian process, to approximate the objective function, and an acquisition function, such as expected improvement (EI), to explore the solution space. BO combines the surrogate model and the acquisition function as follows: (1) sampling the objective function at random points to build an initial dataset and fit the surrogate model; (2) maximizing the acquisition function to select the next point to evaluate, which balances exploring uncertain regions against exploiting regions where the surrogate predicts low values, and then updating the dataset and surrogate model with the new observation; and (3) repeating this process to iteratively refine the approximation of the objective function until the evaluation budget is exhausted or the best observed value stops improving, at which point the best point found is taken as the estimate of the global minimum. In biodegradability prediction, BO helps tune model hyperparameters to enhance accuracy, enabling more precise assessments of how chemical compounds break down in the environment, which is crucial for designing environmentally friendly substances.⁸³

Besides classification techniques, regression techniques in ML are also increasingly being used in biodegradability research to predict how chemical compounds break down in the environment. This involves modeling the relationship between chemical structures and their biodegradation rates or degradation half-lives.⁸⁶ Understanding and predicting biodegradability is crucial for assessing the environmental impact of chemicals, pharmaceuticals, and other substances. Regression predicts a continuous numerical value from given input data; in practice, several regression models are trained and then compared using evaluation metrics. To predict the rate at which a chemical compound degrades in the environment, various regression models are used, including linear regression, polynomial regression, support vector regression (SVR), and more advanced techniques such as random forests and neural networks. Their predictions of the continuous target are then assessed with evaluation metrics such as mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and the coefficient of determination (R²). These regression models are briefly explained as follows:⁸⁴,⁸⁵

(1) Linear regression (LR). This is a simple and effective model that can provide a baseline for predicting biodegradation rates. However, its simplicity may limit accuracy when dealing with complex molecular interactions.
(2) Multiple linear regression. It extends LR by considering multiple chemical descriptors. It is useful for capturing the combined effect of various molecular features on biodegradability.
(3) Polynomial regression. It is suitable for modeling non-linear relationships between chemical properties and biodegradation rates. It can capture more complex patterns than those captured by LR.
(4) SVR. With non-linear kernels, SVR is effective for handling non-linear relationships and high-dimensional data and can provide more accurate predictions for complex biodegradation processes.
(5) Random forest regression. It is an ensemble method that uses multiple decision trees to improve prediction accuracy and robustness. It is particularly useful for handling large datasets with many features.
(6) Neural networks. These are deep learning models, including feedforward neural networks, that can capture intricate non-linear relationships between chemical structures and biodegradation. They require large amounts of data and computational resources but can provide highly accurate predictions.

Taken together, regression techniques in ML represent a powerful tool for predicting the biodegradability of chemical compounds, aiding environmental protection and the development of sustainable products. Ongoing refinement of these models and their integration with experimental data will further enhance their applicability and reliability. Here, we acknowledge the importance of regression methods for predicting continuous outcomes, such as degradation time. However, according to a recent investigation,⁸⁷ the numerical data needed for regression analysis are notably scarce (<3200 records). Specifically, the lack of available and standardized characterization of parameters, such as reaction setup, binary classification, and degradation time, represents a nearly insurmountable obstacle to the ML-aided design of transient materials; in other words, experimental datasets containing properties such as degradation rate are limited. This scarcity hinders the creation of robust regression models. For instance, degradation time is often not included in current databases, making it difficult to assemble a labeled dataset for training predictive models. Huang and Zhang⁸⁸ attempted to address these issues by compiling a large dataset of 12,750 records covering various biodegradation conditions and developing robust regression and classification models. However, the regression model only achieved an R² of 0.54, while the best classification model reached an accuracy of 85.1%, which improved to 87.6% when chemical speciation was taken into account.⁸⁸ Furthermore,

            Volume 1 Issue 2 (2024)                         9                              doi: 10.36922/ijamd.3173
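The sequence-based biodegradability model discussed at the start of this page first turns each molecule into a sequence of atom and bond tokens. The sketch below illustrates only that tokenization step, assuming the molecules are supplied as SMILES strings. The pattern `SMILES_TOKEN` and the function `tokenize_smiles` are hypothetical, simplified illustrations rather than the tokenizer used in the cited work, and the downstream sequence model is not shown.

```python
import re

# Hypothetical, simplified SMILES token pattern (assumption, not the cited work's):
# bracket atoms, two-letter halogens, organic-subset atoms (upper case) and
# aromatic atoms (lower case), bond symbols, branch parentheses, ring-closure
# digits (including the two-digit %nn form) and the disconnection dot.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnops]|[=#\-:/\\]|[()]|%\d{2}|\d|\.)"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into atom, bond, branch and ring-closure tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips unmatched characters, so verify nothing was lost.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens
```

For example, acetyl chloride, `CC(=O)Cl`, becomes the tokens `C`, `C`, `(`, `=`, `O`, `)`, `Cl`. Branch parentheses and ring-closure digits appear as tokens alongside atoms and bonds; real tokenizers differ in how they treat such symbols.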
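The three BO steps listed in Section 4.1.7 can be sketched in Python as follows. This is a minimal illustration under several assumptions. The search space is one-dimensional. The surrogate is a zero-mean Gaussian process with a fixed squared-exponential kernel, with no kernel-hyperparameter fitting. EI for minimization is maximized over a dense grid of candidate points. Finally, the objective passed in is a hypothetical stand-in for, e.g., a model's validation error as a function of one hyperparameter. All function names are illustrative; real studies would normally rely on an established BO library.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)  # element-wise error function for the normal CDF


def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays of points."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and std of a zero-mean GP fitted to standardized targets."""
    y_mu = y_train.mean()
    y_sd = y_train.std() if y_train.std() > 0 else 1.0
    y_norm = (y_train - y_mu) / y_sd
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))  # jitter for stability
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_norm))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)  # prior variance is 1.0
    return y_mu + y_sd * mean, y_sd * np.sqrt(var)


def expected_improvement(mean, std, best_y, xi=0.01):
    """EI for minimization: expected amount by which each point improves on best_y."""
    improvement = best_y - mean - xi
    z = improvement / std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return improvement * cdf + std * pdf


def bayesian_optimization(objective, bounds, n_init=4, n_iter=15, seed=0):
    """Minimize a 1-D objective with a GP surrogate and the EI acquisition function."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    # Step (1): evaluate the objective at random points to seed the surrogate.
    x = rng.uniform(lo, hi, size=n_init)
    y = np.array([objective(v) for v in x])
    candidates = np.linspace(lo, hi, 1001)  # dense grid on which EI is maximized
    for _ in range(n_iter):
        # Step (2): refit the surrogate, pick the candidate with the highest EI,
        # evaluate it and add the new observation to the dataset.
        mean, std = gp_posterior(x, y, candidates)
        ei = expected_improvement(mean, std, y.min())
        x_next = candidates[np.argmax(ei)]
        x = np.append(x, x_next)
        y = np.append(y, objective(x_next))
    # Step (3) stops here because the evaluation budget is spent; return the best point seen.
    best = np.argmin(y)
    return x[best], y[best]
```

Calling `bayesian_optimization(lambda v: (v - 0.3) ** 2, (0.0, 1.0))` should return a point near 0.3 after 4 random and 15 guided evaluations. Because the stopping rule here is a fixed evaluation budget, the result is the best point observed, not a guaranteed global minimum.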
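The evaluation metrics named in this section (MAE, MSE, RMSE and R²) have standard definitions. A minimal NumPy sketch of how they are computed from observed and predicted values follows. The function name `regression_metrics` is illustrative, and the example arrays are made-up numbers (hypothetical half-lives in days) chosen only so that the results are easy to check by hand.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R² for continuous predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))                  # mean absolute error
    mse = np.mean(residuals ** 2)                     # mean squared error
    rmse = np.sqrt(mse)                               # root mean squared error
    ss_res = np.sum(residuals ** 2)                   # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                        # coefficient of determination
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical observed vs. predicted half-lives (days), for illustration only.
metrics = regression_metrics([10, 20, 30, 40], [12, 18, 33, 39])
```

For these inputs the residuals are -2, 2, -3 and 1. This gives MAE = 2.0, MSE = 4.5 and RMSE ≈ 2.12. With a mean of 25, the total sum of squares is 500, so R² = 1 - 18/500 = 0.964.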
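Items (1) to (3) of the list above differ in how many descriptors and which terms enter the model. The sketch below makes this concrete by fitting simple LR, multiple LR and polynomial regression by ordinary least squares with NumPy on a synthetic dataset. The descriptors `logp` and `mw`, the target `rate`, and its quadratic dependence on `logp` are invented for illustration only; they are not data from the cited studies. SVR, random forests and neural networks are omitted because they would normally come from a library such as scikit-learn or a deep-learning framework.

```python
import numpy as np


def fit_least_squares(X, y):
    """Ordinary least squares with an intercept column; returns coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients (intercept first) to descriptor matrix X."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def polynomial_features(x, degree):
    """Expand one descriptor x into the columns [x, x**2, ..., x**degree]."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])


def r2_score(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(42)
n = 200
logp = rng.uniform(-1, 5, n)    # hypothetical descriptor 1
mw = rng.uniform(100, 500, n)   # hypothetical descriptor 2
# Synthetic degradation rate with a non-linear dependence on logp (assumption).
rate = 0.5 - 0.08 * (logp - 2) ** 2 - 0.001 * mw + rng.normal(0, 0.02, n)

designs = {
    "LR": logp[:, None],                                          # one descriptor
    "MLR": np.column_stack([logp, mw]),                           # several descriptors
    "Polynomial": np.column_stack([polynomial_features(logp, 2), mw]),
}
scores = {}
for name, X in designs.items():
    coef = fit_least_squares(X, rate)
    scores[name] = r2_score(rate, predict(coef, X))  # in-sample R²
```

Because the three designs are nested, in-sample R² can only increase from LR to multiple LR to polynomial regression. With this synthetic target the polynomial model should score close to 1. In practice the comparison should be made on held-out data (e.g., by cross-validation), since a more flexible model can also fit noise.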