


while model selection was guided by an early stopping mechanism monitoring validation loss.
2.8. Continuous-filter convolutional neural network model
We implemented the SchNet model, a specialized deep-learning architecture designed for molecular systems.⁴⁹
The model consists of an embedding layer followed by multiple interaction blocks and separate prediction heads for energy and forces. The input features are first normalized through a batch normalization layer, then transformed into a 128-dimensional hidden representation through a linear embedding layer with subsequent batch normalization. The model comprises three SchNet interaction blocks, each containing a filter network and two dense layers. The filter network processes distance-based features through linear transformations and ReLU activations, while the dense layers maintain the hidden dimension of 128 channels. Each interaction block employs layer normalization and the SiLU activation function to introduce non-linearity and ensure stable training.⁵⁰ The model implements two parallel output pathways: one for energy prediction and another for forces. Both pathways share a similar architecture, starting with a 128-dimensional representation that is progressively reduced through multiple layers. Each pathway includes two batch normalization layers, ReLU activations, and a dropout rate of 0.1 for regularization.⁴⁵ The energy pathway concludes with a single output unit, while the forces pathway produces a two-dimensional output.
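To make this architecture concrete, the following is a minimal PyTorch sketch of the components described above. It is not the authors' implementation: the radial-basis featurization of distances, the neighbour indexing, and the intermediate head widths (64 and 32) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FilterNetwork(nn.Module):
    """Filter-generating network: maps distance-based features to per-pair
    filters via linear transformations and ReLU activations."""
    def __init__(self, n_rbf: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_rbf, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )

    def forward(self, dist_feats: torch.Tensor) -> torch.Tensor:
        return self.net(dist_feats)  # (n_pairs, hidden)

class InteractionBlock(nn.Module):
    """Continuous-filter convolution: neighbour features are modulated by
    distance-dependent filters, summed per atom, refined by two dense layers
    (SiLU), and added back via a residual connection with layer normalization."""
    def __init__(self, n_rbf: int, hidden: int = 128):
        super().__init__()
        self.filters = FilterNetwork(n_rbf, hidden)
        self.dense = nn.Sequential(
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden),  # dense layers keep 128 channels
        )
        self.norm = nn.LayerNorm(hidden)

    def forward(self, x, dist_feats, idx_i, idx_j):
        w = self.filters(dist_feats)           # per-pair filters (n_pairs, hidden)
        msg = x[idx_j] * w                     # modulate neighbour features
        agg = torch.zeros_like(x).index_add_(0, idx_i, msg)  # sum over neighbours
        return self.norm(x + self.dense(agg))

def head(out_dim: int, hidden: int = 128) -> nn.Sequential:
    """Prediction pathway: progressive reduction with two batch normalization
    layers, ReLU activations, and dropout 0.1 (widths 64/32 are assumptions)."""
    return nn.Sequential(
        nn.Linear(hidden, 64), nn.BatchNorm1d(64), nn.ReLU(), nn.Dropout(0.1),
        nn.Linear(64, 32), nn.BatchNorm1d(32), nn.ReLU(), nn.Dropout(0.1),
        nn.Linear(32, out_dim),
    )

energy_head = head(1)  # single output unit for energy
forces_head = head(2)  # two-dimensional output for forces, as described above
```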
The training process employs a sophisticated optimization strategy with hierarchical learning rates implemented through the AdamW optimizer. The embedding layer uses a learning rate of 10⁻⁵, the interaction blocks operate at 2 × 10⁻⁵, and both output pathways use 5 × 10⁻⁵. A weight decay of 10⁻⁴ is applied across all parameters. The learning rate schedule combines a warm-up period lasting 20% of the total training steps with cosine decay thereafter. The model is trained for 100 epochs with a batch size of 2048 and employs gradient accumulation over 24 steps to stabilize training.
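The hierarchical learning rates and warm-up/cosine schedule can be expressed as AdamW parameter groups with a LambdaLR multiplier. The sketch below continues the classes above; the stand-in input sizes and the steps_per_epoch value are placeholders, not values from the paper.

```python
import math
import torch
import torch.nn as nn

# Stand-in modules reusing InteractionBlock and head from the sketch above;
# the input/RBF sizes (16, 20) are illustrative only.
embedding = nn.Linear(16, 128)
interactions = nn.ModuleList(InteractionBlock(20) for _ in range(3))
energy_head, forces_head = head(1), head(2)

optimizer = torch.optim.AdamW(
    [
        {"params": embedding.parameters(),    "lr": 1e-5},  # embedding: 1e-5
        {"params": interactions.parameters(), "lr": 2e-5},  # interactions: 2e-5
        {"params": energy_head.parameters(),  "lr": 5e-5},  # output heads: 5e-5
        {"params": forces_head.parameters(),  "lr": 5e-5},
    ],
    weight_decay=1e-4,  # applied across all parameters
)

steps_per_epoch = 100                  # placeholder; depends on dataset size
total_steps = 100 * steps_per_epoch    # 100 epochs
warmup_steps = int(0.2 * total_steps)  # warm-up for 20% of training steps

def lr_scale(step: int) -> float:
    """Linear warm-up, then cosine decay; scales each group's base rate."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_scale)
```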
A progressive loss weighting scheme is implemented, in which force prediction training begins after 50 epochs and gradually increases in importance: the energy weight starts at 1.0 and decreases to 0.75, while the force weight increases from 0.0 to 0.5 over 50 epochs. Training stability is maintained through gradient clipping with a maximum norm of 0.01.
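The following sketch shows the progressive loss weighting, gradient accumulation, and clipping just described. The linear ramp of the weights is an assumption, since the text specifies only their endpoints, and the helper names are hypothetical.

```python
import torch

def loss_weights(epoch: int) -> tuple[float, float]:
    """Energy weight 1.0 -> 0.75 and force weight 0.0 -> 0.5 over the
    50 epochs after force training starts at epoch 50. The linear ramp
    is an assumption; the paper states only the endpoints."""
    if epoch < 50:
        return 1.0, 0.0
    t = min((epoch - 50) / 50.0, 1.0)
    return 1.0 - 0.25 * t, 0.5 * t

def training_step(step, epoch, energy_loss, force_loss,
                  params, optimizer, scheduler):
    """One micro-batch: weighted combined loss, gradient accumulation over
    24 steps, and clipping to a maximum gradient norm of 0.01."""
    w_e, w_f = loss_weights(epoch)
    loss = (w_e * energy_loss + w_f * force_loss) / 24  # accumulation scaling
    loss.backward()
    if (step + 1) % 24 == 0:  # update once per 24 accumulated micro-batches
        torch.nn.utils.clip_grad_norm_(params, max_norm=0.01)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```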
2.9. Model evaluation
The evaluation of model performance is conducted through a comprehensive loss function framework implemented across the training, validation, and test datasets. Specifically, we employed the mean absolute error (MAE) and mean squared error (MSE), defined as:

$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$   (X)

$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$   (XI)

where $N$ is the number of data points, $y_i$ represents the true values (actual labels), and $\hat{y}_i$ represents the predicted values, including both energies and forces.
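For reference, Equations (X) and (XI) amount to the following NumPy one-liners; the toy arrays serve only to show the arithmetic.

```python
import numpy as np

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error, Equation (X)."""
    return float(np.mean(np.abs(y_true - y_pred)))

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error, Equation (XI)."""
    return float(np.mean((y_true - y_pred) ** 2))

# Toy check with N = 4 points:
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.8, 3.3, 3.9])
print(mae(y_true, y_pred))  # 0.175
print(mse(y_true, y_pred))  # 0.0375
```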
This tripartite evaluation strategy serves distinct purposes: the training loss guides the optimization process, the validation loss enables model selection and hyperparameter tuning, and the test loss provides an unbiased estimate of model generalization. Computing these loss metrics systematically across the training, validation, and test datasets provides a robust mechanism for detecting overfitting, which is characterized by divergence between the three performance metrics. This monitoring strategy ensures that the model develops representations that generalize effectively to unseen data rather than merely memorizing training examples.

3. Results and discussion

3.1. Model evaluation

3.1.1. Model comparison
As presented in Table 1, KAN demonstrated exceptional performance and practical advantages. KAN achieved impressive accuracy, with MAE values of 0.36, 0.30, and 0.31 on the training, validation, and test sets, respectively.

Table 1. Performance metrics comparison of KAN against other typical ML models (SchNet, ANI, GNN, and CalHousNet) for MAE and MSE across training, validation, and test sets

Model        Metric   Training set   Validation set   Test set
KAN          MAE      0.36           0.30             0.31
             MSE      0.79           0.50             0.49
SchNet       MAE      0.46           0.38             0.41
             MSE      0.88           0.55             0.59
ANI          MAE      0.79           0.77             0.79
             MSE      1.01           0.93             0.97
GNN          MAE      0.15           0.13             0.22
             MSE      0.11           0.04             0.50
CalHousNet   MAE      0.34           0.27             0.29
             MSE      0.76           0.34             0.51

Abbreviations: KAN: Kolmogorov-Arnold Network; ML: Machine learning; MAE: Mean absolute error; MSE: Mean squared error.

