
Advancing molecular property prediction using graph neural networks

dataset for property regression tasks such as predicting molecular energy (QM9) and a dataset aimed at forecasting the toxic effects of chemical compounds on biological systems (TOX21).²⁰

Multi-fidelity learning, which combines low-fidelity quantum chemical data with high-fidelity experimental data, has a considerable impact on partition coefficient prediction. For example, multi-target learning obtained a root-mean-square error (RMSE) of 0.44 log P units on a dataset containing molecules comparable to the training data, exhibiting superior accuracy over single-task models.¹³ The multi-fidelity log P model, which takes a chemical formula as its sole input, is a useful method for estimating K_ow without structural knowledge. This model performed similarly to traditional models, with a coefficient of determination (R²) of 0.77 and an RMSE of 0.52, suggesting its usefulness in cases where structural data are lacking.²¹ GNNs with adjusted integrated gradients are highly interpretable in forecasting K_ow. These models emphasize the significance of certain atoms in the prediction process, guaranteeing precision, consistency, and stability in attribution assignments.²²

Machine learning models such as multiple linear regression and random forest regression have been successfully calibrated against external test sets, including the SAMPL9 challenge. These models, along with continuum solvation models, give insights into the molecular characteristics that influence partition coefficients, emphasizing their value in computational chemistry.²³ The study focuses on the use of the created model in forecasting K_d values, proving its efficacy in giving site-specific insights that can benefit environmental risk assessments and pesticide management.²⁴ The study presents a new descriptor, <q-atom-centered symmetry functions>_conformation, which includes explicit polarization effects in polar phases and accounts for energetic and entropic importance in non-polar phases by averaging entropy effects based on the Boltzmann distribution of conformations. This technique improves the prediction of the partition coefficient (log P) between polar and non-polar phases, a critical factor in drug and material design. The model was trained using high-dimensional neural networks on a large public dataset (PhysProp) and showed effective log P prediction across three more datasets. It applies to a number of organic molecules, including n-carboxylic acids and diverse organic solvents, which makes it a useful tool for estimating partition coefficients in varied systems.²⁵

The K_ao is a key parameter that describes the equilibrium partitioning of a compound between octanol and air, serving as an indicator of its volatility and potential for atmospheric transport. A comprehensive dataset of experimental log K_ao values for 2,161 compounds was compiled, which covers a wide range of molecular weights and log K_ao values. The model's robustness and predictive capability were validated through various statistical methods, which include training and prediction set separation and mutual leave-50%-out validation.²⁶ The work introduces a new version of the machine-learning algorithm, PARTYsoc version 3, which measures the proportions of centennially stable and active soil organic carbon (SOC) fractions using Rock-Eval® thermal analysis. This model improves on the previous version (version 2) by using a larger dataset from 12 sites, including many long-term studies, and leverages support vector machine regression in conjunction with beta regression to provide more accurate predictions. PARTYsoc version 3 attempts to improve the accuracy of SOC stock development simulations in the high-performance engine model by identifying the best stable SOC stock for each site, resulting in improved SOC compartment initialization. The model performs well in both internal validation and leave-one-site-out validation, confirming its ability to forecast stable SOC proportions.²⁷

The K_ow represents the ratio of a compound's concentration in octanol to its concentration in water at equilibrium. It is a key descriptor of hydrophobicity and is widely used in environmental fate modeling and bioaccumulation studies. The research focuses on predicting the K_ow of organic compounds using extreme learning machine (ELM) models, which are useful owing to their quick learning speed and strong generalization ability. The study uses COSMO descriptors (Sσ-profile) as molecular descriptors to develop and estimate models for K_ow. Four ELM models were created and compared to multiple linear regression models with the same descriptors. The results indicated that the ELM models, particularly the ELM-4 model, demonstrate high reliability in predicting log K_ow values, achieving a correlation coefficient R² of 0.949 and an RMSE of 0.358, indicating their effectiveness for broader applications in predicting chemical properties.²⁰ The research examines the partitioning behavior of anionic perfluoroalkyl carboxylic acids (PFCAs) and perfluoroalkyl sulfonic acids (PFSAs) between water and organic phases, stressing that their anionic forms dominate due to low pK_a values. It presents a developed equation that ties the partition coefficients of these anions to their corresponding neutral species, indicating a linear connection that can be used to estimate the



                Volume 22 Issue 3 (2025)                        91                           doi: 10.36922/AJWEP025070041