while model selection was guided by an early stopping mechanism monitoring validation loss.
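A minimal sketch of such an early-stopping rule is given below, assuming a simple patience-based criterion; the patience and tolerance values are illustrative and not reported in the article.

```python
# Illustrative early-stopping helper: stop when the validation loss has not
# improved for `patience` consecutive epochs (both thresholds are assumed).
class EarlyStopping:
    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss  # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```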
2.8. Continuous-filter convolutional neural network model
We implemented the SchNet model, a specialized deep-learning architecture designed for molecular systems.⁴⁹ The model consists of an embedding layer followed by multiple interaction blocks and separate prediction heads for energy and forces. The input features are first normalized through a batch normalization layer, then transformed into a 128-dimensional hidden representation through a linear embedding layer with subsequent batch normalization. The model comprises three SchNet interaction blocks, each containing a filter network and two dense layers. The filter network processes distance-based features through linear transformations and ReLU activations, while the dense layers maintain the hidden dimension of 128 channels. Each interaction block employs layer normalization and the SiLU activation function to introduce non-linearity and ensure stable training. The model implements two parallel output pathways: one for energy prediction and another for forces.⁵⁰ Both pathways share a similar architecture, starting with a 128-dimensional representation that is progressively reduced through multiple layers. Each pathway includes two batch normalization layers, ReLU activations, and a dropout rate of 0.1 for regularization.⁴⁵ The energy pathway concludes with a single output unit, while the forces pathway produces a two-dimensional output.
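The following PyTorch sketch illustrates this layout under simplifying assumptions: features are treated per structure rather than per atom, the continuous-filter convolution is reduced to an element-wise modulation by the generated filters, and the intermediate head width of 64 is an assumption. It is a schematic of the described architecture, not the authors' implementation.

```python
import torch
import torch.nn as nn

HIDDEN = 128  # hidden width reported in the text


class FilterNetwork(nn.Module):
    """Maps distance-based features to filters via linear layers and ReLU,
    as described for the filter network."""

    def __init__(self, n_rbf: int, hidden: int = HIDDEN):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_rbf, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )

    def forward(self, dist_feats: torch.Tensor) -> torch.Tensor:
        return self.net(dist_feats)


class InteractionBlock(nn.Module):
    """One interaction block: a filter network plus two dense layers at
    128 channels, with layer normalization and SiLU non-linearity."""

    def __init__(self, n_rbf: int, hidden: int = HIDDEN):
        super().__init__()
        self.filters = FilterNetwork(n_rbf, hidden)
        self.dense1 = nn.Linear(hidden, hidden)
        self.dense2 = nn.Linear(hidden, hidden)
        self.norm = nn.LayerNorm(hidden)
        self.act = nn.SiLU()

    def forward(self, h: torch.Tensor, dist_feats: torch.Tensor) -> torch.Tensor:
        # Simplified continuous-filter convolution: modulate features by the
        # generated filters instead of summing over true atomic neighbors.
        v = self.dense2(self.act(self.dense1(h * self.filters(dist_feats))))
        return self.norm(h + v)  # residual update


def make_head(out_dim: int, hidden: int = HIDDEN) -> nn.Sequential:
    """Prediction head: progressive reduction with two batch-norm layers,
    ReLU activations, and dropout 0.1; the width of 64 is an assumption."""
    return nn.Sequential(
        nn.Linear(hidden, 64), nn.BatchNorm1d(64), nn.ReLU(), nn.Dropout(0.1),
        nn.Linear(64, 64), nn.BatchNorm1d(64), nn.ReLU(), nn.Dropout(0.1),
        nn.Linear(64, out_dim),
    )


class SchNetLike(nn.Module):
    """Embedding, three interaction blocks, and parallel energy/force heads."""

    def __init__(self, n_feats: int, n_rbf: int, hidden: int = HIDDEN):
        super().__init__()
        self.in_norm = nn.BatchNorm1d(n_feats)  # input batch normalization
        self.embed = nn.Sequential(
            nn.Linear(n_feats, hidden), nn.BatchNorm1d(hidden))
        self.blocks = nn.ModuleList(
            InteractionBlock(n_rbf, hidden) for _ in range(3))
        self.energy_head = make_head(1)  # single energy output
        self.force_head = make_head(2)   # two-dimensional force output

    def forward(self, x: torch.Tensor, dist_feats: torch.Tensor):
        h = self.embed(self.in_norm(x))
        for block in self.blocks:
            h = block(h, dist_feats)
        return self.energy_head(h), self.force_head(h)
```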
The training process employs a sophisticated optimization strategy with hierarchical learning rates implemented through the AdamW optimizer. The embedding layer uses a learning rate of 10⁻⁵; interaction blocks operate at 2 × 10⁻⁵; and both output pathways utilize 5 × 10⁻⁵. A weight decay of 10⁻⁴ is applied across all parameters. The learning rate schedule combines a warm-up period lasting 20% of the total training steps with cosine decay thereafter. The model is trained for 100 epochs with a batch size of 2048 and employs gradient accumulation over 24 steps to stabilize training. A progressive loss weighting scheme is implemented, where force prediction training begins after 50 epochs and gradually increases in importance. The energy weight starts at 1.0 and decreases to 0.75, while the force weight increases from 0.0 to 0.5 over 50 epochs. Training stability is maintained through gradient clipping with a maximum norm of 0.01.
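A sketch of this training configuration is shown below, reusing the SchNetLike class from the previous sketch. The number of steps per epoch, the linear ramp used for the loss weights, and the loop structure in the trailing comments are assumptions consistent with, but not taken verbatim from, the text.

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = SchNetLike(n_feats=32, n_rbf=16)  # sketch above; sizes are placeholders

# Hierarchical learning rates via AdamW parameter groups, using the
# per-module rates and the global 1e-4 weight decay given in the text.
optimizer = AdamW(
    [
        {"params": list(model.in_norm.parameters())
                   + list(model.embed.parameters()), "lr": 1e-5},
        {"params": model.blocks.parameters(), "lr": 2e-5},
        {"params": list(model.energy_head.parameters())
                   + list(model.force_head.parameters()), "lr": 5e-5},
    ],
    weight_decay=1e-4,
)

EPOCHS, STEPS_PER_EPOCH = 100, 50      # steps per epoch is a placeholder
TOTAL_STEPS = EPOCHS * STEPS_PER_EPOCH
WARMUP_STEPS = int(0.2 * TOTAL_STEPS)  # warm-up for 20% of training


def lr_lambda(step: int) -> float:
    """Linear warm-up followed by cosine decay."""
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


scheduler = LambdaLR(optimizer, lr_lambda)


def loss_weights(epoch: int) -> tuple[float, float]:
    """Progressive weighting: energy 1.0 -> 0.75 and force 0.0 -> 0.5,
    ramped linearly over the 50 epochs after force training starts."""
    ramp = min(max(epoch - 50, 0) / 50, 1.0)
    return 1.0 - 0.25 * ramp, 0.5 * ramp


ACCUM_STEPS = 24  # gradient accumulation steps
# Inside the training loop (sketch): scale the combined loss for accumulation,
# and every ACCUM_STEPS micro-batches clip gradients to a max norm of 0.01,
# then call optimizer.step(), scheduler.step(), and optimizer.zero_grad():
#   w_e, w_f = loss_weights(epoch)
#   loss = (w_e * energy_loss + w_f * force_loss) / ACCUM_STEPS
#   loss.backward()
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.01)
```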
2.9. Model evaluation

The evaluation of model performance is conducted through a comprehensive loss function framework implemented across training, validation, and test datasets. Specifically, we employed mean absolute error (MAE) and mean squared error (MSE), defined as:

$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$ (X)

$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$ (XI)

where $N$ is the number of data points, $y_i$ represents the true values (actual labels), and $\hat{y}_i$ represents the predicted values, including both energies and forces. This tripartite evaluation strategy serves distinct purposes: the training loss guides the optimization process, the validation loss enables model selection and hyperparameter tuning, and the test loss provides an unbiased estimate of model generalization. This systematic computation of loss metrics across training, validation, and test datasets provides a robust mechanism for detecting potential overfitting, characterized by divergence between these performance metrics. This comprehensive monitoring strategy ensures that the model develops representations that generalize effectively to unseen data rather than merely memorizing training examples.
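As a concrete illustration of Equations (X) and (XI), the following sketch computes both metrics on each data split; the arrays are placeholders, not values from this study.

```python
import numpy as np


def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error, Equation (X)."""
    return float(np.mean(np.abs(y_true - y_pred)))


def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error, Equation (XI)."""
    return float(np.mean((y_true - y_pred) ** 2))


# Toy targets and predictions standing in for the energy/force labels;
# computing both metrics per split exposes divergence that signals overfitting.
splits = {
    "training":   (np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.8, 3.2])),
    "validation": (np.array([1.5, 2.5]),      np.array([1.4, 2.7])),
    "test":       (np.array([2.0, 3.0]),      np.array([2.2, 2.9])),
}
for name, (y, y_hat) in splits.items():
    print(f"{name}: MAE={mae(y, y_hat):.3f}, MSE={mse(y, y_hat):.3f}")
```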
3. Results and discussion

3.1. Model evaluation

3.1.1. Model comparison

As presented in Table 1, KAN demonstrated exceptional performance and practical advantages. KAN achieved impressive accuracy with MAE values of 0.36, 0.30, and 0.31 on the training, validation, and test sets, respectively.

Table 1. Performance metrics comparison of KAN against other typical ML models (SchNet, ANI, GNN, and CalHousNet) for MAE and MSE across training, validation, and test sets

Model        Evaluation   Training set   Validation set   Test set
KAN          MAE          0.36           0.30             0.31
             MSE          0.79           0.50             0.49
SchNet       MAE          0.46           0.38             0.41
             MSE          0.88           0.55             0.59
ANI          MAE          0.79           0.77             0.79
             MSE          1.01           0.93             0.97
GNN          MAE          0.15           0.13             0.22
             MSE          0.11           0.04             0.50
CalHousNet   MAE          0.34           0.27             0.29
             MSE          0.76           0.34             0.51

Abbreviations: KAN: Kolmogorov-Arnold Network; ML: Machine learning; MAE: Mean absolute error; MSE: Mean squared error.