

aggregation method, based on summation, facilitates strong feature representation that corresponds with the results from Xu et al. (2019),5 where GIN was noted for its ability to differentiate graph architectures effectively. GAT attained strong results with elevated recall (0.89) and ROC-AUC (0.83). Its capacity to allocate attention weights to significant neighbors enhances its resilience in noisy graphs. This aligns with Veličković et al.,18 who illustrated the usefulness of attention in complex graphs. GCN exhibited dependable yet slightly lower performance, with an accuracy of 87.50%. Its inability to adequately distinguish subtle graph structures in MUTAG corresponds with previous research by Kipf and Welling, which pinpointed GCN shortcomings in intricate graph situations.4

The MUTAG dataset consists of small molecular graphs in which node and edge features, such as atom types and bond types, are essential. The GNN models successfully capture these characteristics and demonstrate their ability to predict molecular properties. GIN demonstrated quicker convergence throughout the training process, indicating its efficiency in computing graph embeddings.

Early works on the MUTAG dataset used kernel-based approaches, such as the Graph Kernel method, which achieved ~85% accuracy.18 Kipf and Welling introduced GCN, achieving approximately 86% accuracy on MUTAG.4 Our GCN implementation slightly improved on these results owing to optimized hyperparameters and regularization techniques. Xu et al. reported that GIN outperformed the benchmarks set by prior leading models in the literature on MUTAG.5 Our result (89.20%) is consistent with theirs, which affirms GIN's robustness in molecular tasks. Veličković et al. demonstrated the ability of GAT to focus on important graph features, achieving high recall and precision in graph classification tasks.18

The findings highlight the adaptability of GNNs for molecular property prediction tasks, with each architecture excelling in specific respects: GIN for structural differentiation, GAT for noisy or complex graphs, and GCN for computational efficiency. This study underscores the importance of tailoring GNN architectures to dataset characteristics. For small datasets such as MUTAG, simpler architectures such as GIN perform exceptionally well, whereas attention-based models such as GAT may excel in larger, noisier datasets. This comparative analysis places our findings in the broader context of molecular property prediction research, demonstrating the strengths and trade-offs of various GNN architectures.

The GIN-based model's predictions of Kow, Kaw, and Kd are compared against benchmark datasets. To determine the usefulness of the suggested technique, we compare our results to established machine learning models and experimental datasets typically used for partition coefficient calculation. The comparison is shown in Table 3.

All three GNN architectures demonstrated strong performance on the MoleculeNet dataset for predicting key environmental partition coefficients (Kow, Kaw, and Kd). While the performance differences were within a narrow range (approximately 2–3%), the GIN model consistently yielded slightly better results across most evaluation metrics (R², MAE, and RMSE). This suggests a modest advantage in its ability to capture molecular graph topology more effectively through its injective aggregation functions. However, we acknowledge that these differences are relatively small and may fall within the bounds of experimental variability.
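As a concrete illustration of the sum-based aggregation credited above, the sketch below outlines a GIN regressor for a single graph-level property such as log Kow. It is a minimal sketch assuming PyTorch Geometric; the two-layer depth, hidden size, and the name GINRegressor are illustrative choices rather than the exact configuration used in this study.

# Minimal GIN regressor sketch (assumes PyTorch Geometric); hyperparameters are illustrative.
from torch import nn
from torch_geometric.nn import GINConv, global_add_pool

class GINRegressor(nn.Module):
    def __init__(self, num_node_features: int, hidden_dim: int = 64):
        super().__init__()

        def mlp() -> nn.Sequential:
            # GINConv wraps an MLP and sums neighbor messages: the injective
            # aggregation discussed in the text.
            return nn.Sequential(
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
            )

        self.embed = nn.Linear(num_node_features, hidden_dim)
        self.conv1 = GINConv(mlp())
        self.conv2 = GINConv(mlp())
        self.head = nn.Linear(hidden_dim, 1)  # one scalar output, e.g., log Kow

    def forward(self, x, edge_index, batch):
        h = self.embed(x)
        h = self.conv1(h, edge_index).relu()
        h = self.conv2(h, edge_index).relu()
        g = global_add_pool(h, batch)  # sum pooling gives the graph-level embedding
        return self.head(g).squeeze(-1)

Swapping GINConv for GCNConv or GATConv (mean- or attention-based aggregation, respectively) yields the comparison architectures discussed above; only the convolution layers change, not the readout or regression head.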
Table 3. Comparison of partition coefficient predictions with existing methods

Property | Traditional models | Traditional models: R², MAE, RMSE | Graph isomorphism network model: R², MAE, RMSE
Kow | Random forests, extreme gradient boosting, support vector regression | 0.80–0.87, 0.25–0.35, 0.40 | 0.88, 0.22, 0.35
Kaw | Quantitative structure–activity relationship, machine learning models | 0.75–0.82, 0.30–0.40, 0.45 | 0.85, 0.26, 0.40
Kd | Random forests, neural networks | 0.85–0.90, 0.24–0.32, 0.38 | 0.91, 0.20, 0.30

Abbreviations: Kaw: Air–water partition coefficient; Kd: Soil–water partition coefficient; Kow: Octanol–water partition coefficient; MAE: Mean absolute error; R²: Coefficient of determination; RMSE: Root-mean-square error.
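For reference, the Table 3 metrics can be computed from predicted and experimental partition coefficients as in the short sketch below, assuming scikit-learn; the arrays are placeholders, not values from this study.

# Computing R², MAE, and RMSE for predicted vs. experimental values (placeholder data).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([1.2, 0.8, 2.5, 3.1])  # experimental log Kow (placeholder)
y_pred = np.array([1.1, 0.9, 2.7, 3.0])  # model predictions (placeholder)

r2 = r2_score(y_true, y_pred)                        # coefficient of determination
mae = mean_absolute_error(y_true, y_pred)            # mean absolute error
rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # root-mean-square error
print(f"R2={r2:.2f}  MAE={mae:.2f}  RMSE={rmse:.2f}")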



