Page 95 - AJWEP-v22i3
Advancing molecular property prediction using graph neural networks
1. Introduction

Molecular property prediction is central to cheminformatics and supports applications including drug discovery, materials science, and environmental safety evaluation. Accurate prediction of characteristics such as solubility, toxicity, and carcinogenicity can greatly reduce experimental cost and time, thereby accelerating the development of new compounds.¹,² Predictive modeling presents both opportunities and challenges because molecular data has a distinctive graph structure, in which atoms are represented as nodes and bonds as edges. Conventional machine learning approaches often struggle to represent this graph-structured data adequately, highlighting the need for specialized graph learning techniques.

Graph neural networks (GNNs) have emerged as highly effective tools for modeling graph-structured data. They have enabled notable progress in areas such as social networking, transportation, and bioinformatics. In molecular property prediction, GNNs have shown exceptional capability by leveraging the relational information inherent in molecular graphs. GNNs capture the complex interactions among atoms and bonds that influence molecular characteristics by iteratively updating node representations using local and global graph structure.

Among the various GNN frameworks, graph convolutional networks (GCNs),⁴ graph isomorphism networks (GINs),⁵ and graph attention networks (GATs)⁶ are notable for their effectiveness and adaptability. GCNs extend convolution to graphs by aggregating features from each node's local neighborhood. GINs emphasize distinguishing non-isomorphic graphs so that different graph structures receive distinct representations. GATs enhance feature aggregation by assigning attention weights to neighboring nodes, enabling the model to focus on the most significant interactions.

The MUTAG dataset³ is a popular benchmark for molecular property prediction and offers an excellent platform for testing these architectures. It comprises 188 molecular graphs categorized as either carcinogenic or non-carcinogenic, making it appropriate for binary classification tasks. Prior research has applied different GNNs to this dataset and obtained state-of-the-art results. For example, Kipf and Welling's GCN showcased the effectiveness of graph convolutions in molecular classification.⁴ Similarly, the GIN developed by Xu et al. highlighted the significance of identifying graph isomorphisms for precise predictions.⁵

In this study, we conducted a comparative analysis of GCN, GIN, and GAT architectures on the MUTAG dataset to identify the most effective model for molecular property prediction. Our contributions include: (i) a systematic evaluation of GCN, GIN, and GAT on molecular data using metrics such as accuracy and area under the curve (AUC), (ii) insights into the strengths and limitations of each architecture in capturing molecular graph features, and (iii) a comprehensive analysis of the experimental results that highlights GIN's exceptional performance and its potential for practical use in cheminformatics.

We further used GIN to predict partition coefficients, such as the octanol–water partition coefficient (K_ow), air–water partition coefficient (K_aw), and soil–water partition coefficient (K_d), from the MoleculeNet dataset to estimate how chemicals behave in the environment, including their solubility, volatility, and degradation pathways. This helps in understanding their movement through air, water, and soil.

The MUTAG dataset provides valuable data for predicting molecular characteristics using GNN architectures. However, its small size and limited complexity raise concerns about the models' scalability and resilience. This research therefore broadens the assessment to more extensive and complex datasets, such as QM9⁶ and ZINC,⁷ to check that performance is consistent across different molecular datasets. Moreover, tests using incomplete and altered data have been conducted to assess the proposed models' robustness in real-world scenarios. These enhancements provide a broader perspective on the appropriateness of GNNs for predicting molecular characteristics.

Quantitative structure-property relationship models have long been used to predict molecular characteristics based on structural attributes.⁸ Machine learning models can accurately estimate partition coefficients such as K_ow, K_aw, and K_d using the MoleculeNet dataset.⁹ These models employ various methodologies, including GNNs¹⁰,¹¹ and random forest algorithms,¹² to make effective predictions even with minimal experimental data. Furthermore, the integration of multi-fidelity learning¹³ with interpretable attribution models¹⁴ enhances both predictive power and model interpretability, leading to more reliable estimations.

The accurate prediction of molecular properties such as partition coefficients is critical for environmental safety assessments, pharmaceutical design, and chemical risk evaluation. These predictions help reduce the time and cost associated with experimental testing, supporting the development of safer and more efficient
Volume 22 Issue 3 (2025) 89 doi: 10.36922/AJWEP025070041