Advancing molecular property prediction using graph neural networks

dataset for property regression tasks such as predicting molecular energy (QM9) and a dataset aimed at forecasting the toxic effects of chemical compounds on biological systems (TOX21).[20]

Multi-fidelity learning, which combines low-fidelity quantum chemical data with high-fidelity experimental data, has had a considerable impact on partition coefficient prediction. For example, multi-target learning achieved a root-mean-square error (RMSE) of 0.44 log P units on a dataset containing molecules similar to the training data, outperforming single-task models.[13] The multi-fidelity log P model, which takes a chemical formula as its sole input, is a useful method for estimating Kow without structural knowledge. This model performed comparably to traditional models, with a coefficient of determination (R²) of 0.77 and an RMSE of 0.52, suggesting its usefulness when structural data are lacking.[21] GNNs with adjusted integrated gradients are highly interpretable in forecasting Kow. These models highlight the contribution of specific atoms to the prediction and ensure precision, consistency, and stability in attribution assignments.[22] Machine learning models such as multiple linear regression and random forest regression have been successfully calibrated against external test sets, including the SAMPL9 challenge. These models, along with continuum solvation models, provide insights into the molecular characteristics that influence partition coefficients, underscoring their value in computational chemistry.[23] Another study focuses on using its developed model to forecast Kd values, demonstrating its efficacy in providing site-specific insights that can benefit environmental risk assessment and pesticide management.[24] A further study presents a new descriptor, ⟨q-atom-centered symmetry functions⟩_conformation, which includes explicit polarization effects in polar phases and accounts for energetic and entropic contributions in non-polar phases by averaging over conformations weighted by their Boltzmann distribution. This technique improves the prediction of the partition coefficient (log P) between polar and non-polar phases, a critical factor in drug and materials design. The model was trained with high-dimensional neural networks on a large public dataset (PhysProp) and showed effective log P prediction on three additional datasets. It applies to a range of organic molecules, including n-carboxylic acids and diverse organic solvents, which makes it a useful tool for estimating partition coefficients in varied systems.[25]

The Kao is a key parameter that describes the equilibrium partitioning of a compound between octanol and air, serving as an indicator of its volatility and potential for atmospheric transport. A comprehensive dataset of experimental log Kao values for 2,161 compounds was compiled, covering a wide range of molecular weights and log Kao values. The robustness and predictive capability of the resulting model were validated through various statistical methods, including training/prediction set splitting and mutual leave-50%-out validation.[26] Another work introduces a new version of the machine-learning algorithm PARTYsoc (version 3), which estimates the proportions of centennially stable and active soil organic carbon (SOC) fractions from Rock-Eval® thermal analysis. This version improves on its predecessor (version 2) by using a larger dataset from 12 sites, including many long-term studies, and by combining support vector machine regression with beta regression to provide more accurate predictions. PARTYsoc version 3 aims to improve the accuracy of simulations of SOC stock dynamics in the high-performance engine model by identifying the best stable SOC stock for each site, which improves the initialization of SOC compartments. The model performs well in both internal validation and leave-one-site-out validation, confirming its ability to forecast stable SOC proportions.[27]

Kow is the ratio of a compound's concentration in octanol to its concentration in water at equilibrium. It is a key descriptor of hydrophobicity and is widely used in environmental fate modeling and bioaccumulation studies. One study focuses on predicting the Kow of organic compounds using extreme learning machine (ELM) models, which are attractive owing to their fast learning speed and strong generalization ability. The study uses COSMO descriptors (Sσ-profiles) as molecular descriptors to develop and evaluate Kow models. Four ELM models were built and compared with multiple linear regression models using the same descriptors. The ELM models, particularly ELM-4, proved highly reliable in predicting log Kow values. ELM-4 achieved a coefficient of determination (R²) of 0.949 and an RMSE of 0.358, indicating the potential of these models for broader applications in chemical property prediction.[20]

Another study examines the partitioning behavior of anionic perfluoroalkyl carboxylic acids (PFCAs) and perfluoroalkyl sulfonic acids (PFSAs) between water and organic phases, stressing that their anionic forms dominate because of their low pKa values. It presents an equation that relates the partition coefficients of these anions to those of their corresponding neutral species, indicating a linear relationship that can be used to estimate the

Volume 22 Issue 3 (2025) 91 doi: 10.36922/AJWEP025070041
