Page 95 - AJWEP-v22i3

Advancing molecular property prediction using graph neural networks

1. Introduction

Molecular property prediction is central to cheminformatics and supports applications such as drug discovery, materials science, and environmental safety assessment. Accurate prediction of properties such as solubility, toxicity, and carcinogenicity can substantially reduce the cost and duration of experiments, thereby accelerating the development of new compounds.¹,² Molecular data present both opportunities and challenges for predictive modeling because molecules are naturally represented as graphs, with atoms as nodes and bonds as edges. Conventional machine learning approaches, which typically operate on fixed-length feature vectors, often struggle to represent such graph-structured data adequately, highlighting the need for specialized graph learning techniques.

Graph neural networks (GNNs) have emerged as effective tools for modeling graph-structured data and have enabled notable progress in areas such as social network analysis, transportation, and bioinformatics. In molecular property prediction, GNNs have shown strong capability by exploiting the relational information inherent in molecular graphs. By iteratively updating each node's representation with information aggregated from its neighbors, and then combining these local representations into a graph-level one, GNNs capture the complex interactions among atoms and bonds that determine molecular properties.

Among the various GNN frameworks, graph convolutional networks (GCNs),⁴ graph isomorphism networks (GINs),⁵ and graph attention networks (GATs)⁶ stand out for their effectiveness and adaptability. GCNs extend the convolution operation to graphs by aggregating features from each node's local neighborhood. GINs are designed to distinguish non-isomorphic graphs, so that structurally different graphs receive distinct representations. GATs refine feature aggregation by assigning learned attention weights to neighboring nodes, allowing the model to focus on the most relevant interactions.

The MUTAG dataset³ is a popular benchmark for molecular property prediction and a convenient platform for testing these architectures. It comprises 188 molecular graphs, each labeled as mutagenic or non-mutagenic (mutagenicity being a common indicator of carcinogenic potential), which makes it well suited to binary classification. Prior work has applied a range of GNNs to this dataset with strong results. For example, Kipf and Welling's GCN demonstrated the effectiveness of graph convolutions,⁴ which have since been widely applied to molecular classification. Similarly, the GIN developed by Xu et al. showed that the ability to distinguish non-isomorphic graphs is important for accurate predictions.⁵

In this study, we conducted a comparative analysis of the GCN, GIN, and GAT architectures on the MUTAG dataset to identify the most effective model for molecular property prediction. Our contributions are: (i) a systematic evaluation of GCN, GIN, and GAT on molecular data using metrics such as accuracy and area under the curve (AUC); (ii) insights into the strengths and limitations of each architecture in capturing molecular graph features; and (iii) a comprehensive analysis of the experimental results that highlights GIN's strong performance and its potential for practical use in cheminformatics.

We further used GIN to predict partition coefficients, namely the octanol–water ($K_{ow}$), air–water ($K_{aw}$), and soil–water ($K_d$) partition coefficients, from the MoleculeNet dataset, in order to estimate how chemicals behave in the environment, including their solubility, volatility, and degradation pathways. These estimates help in understanding how chemicals move through air, water, and soil.

The MUTAG dataset provides valuable data for predicting molecular properties with GNN architectures. However, its small size and limited complexity raise concerns about the scalability and robustness of models evaluated on it alone. This research therefore extends the assessment to larger and more complex datasets, such as QM9⁶ and ZINC⁷, to check whether performance remains consistent across different molecular datasets. In addition, tests using incomplete and perturbed data were conducted to assess the robustness of the proposed models in real-world scenarios. These extensions provide a broader perspective on the suitability of GNNs for predicting molecular properties.

Quantitative structure–property relationship (QSPR) models have long been used to predict molecular properties from structural attributes.⁸ Machine learning models can accurately estimate partition coefficients such as $K_{ow}$, $K_{aw}$, and $K_d$ using the MoleculeNet dataset.⁹ These models employ various methodologies, including GNNs¹⁰,¹¹ and random forest algorithms,¹² and can make effective predictions even with minimal experimental data. Furthermore, integrating multi-fidelity learning¹³ with interpretable attribution models¹⁴ improves both predictive power and interpretability, leading to more reliable estimates.

Accurate prediction of molecular properties such as partition coefficients is critical for environmental safety assessment, pharmaceutical design, and chemical risk evaluation. These predictions help reduce the time and cost associated with experimental testing, supporting the development of safer and more efficient


                Volume 22 Issue 3 (2025)                        89                           doi: 10.36922/AJWEP025070041