Figure 2. Principal component analysis (PCA) reveals dominant structural features and efficient dimensionality reduction in the structure and force data. (A) Cumulative explained variance ratio vs. structure component number. PC0 accounts for 80.5% of the total variance, with PC1 adding 12.2% and PC2 adding 4.9%; the first three PCs collectively explain 97.62%, exceeding the 95% threshold. (B) Cumulative explained variance ratio vs. force component number. PC0 accounts for 66.6% of the total variance, with PC1 adding 28.8% for a cumulative 95.4%, exceeding the 95% threshold.
Abbreviation: PC: Principal component.

elucidates the cumulative explained variance ratio as a function of component number. The first PC accounts for 66.6% of the total variance, demonstrating its dominant role in capturing the dataset's primary features. The second PC contributes an additional 28.8%, bringing the cumulative explained variance past the often-used 95% threshold for dimensionality reduction. Figure S2 presents a heatmap of force PCA loadings, indicating that each original force feature is well represented in the PC space, suggesting that the PCA effectively captures the key variance in the dataset without substantial loss of information from any feature.
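The article does not show the PCA computation itself; the following is a minimal scikit-learn sketch of the cumulative-variance analysis described above, with a random placeholder array standing in for the actual force data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
forces = rng.normal(size=(1000, 6))  # placeholder: real per-atom force features go here

pca = PCA().fit(forces)
cumvar = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative variance exceeds 95%
n_keep = int(np.searchsorted(cumvar, 0.95) + 1)
print("cumulative explained variance:", np.round(cumvar, 3))
print("components needed for 95%:", n_keep)
```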
After performing PCA on the structural data, we normalized the target outputs to address the inherent scale differences between energy and force values, such that:

Z = (X − μ)/σ    (VII)

where Z is the normalized value, X is the original value, μ is the mean of the data, and σ is the standard deviation of the data. This normalization is crucial as energy and force are measured in different units and exhibit distinct ranges of magnitudes, which could potentially bias the model training. By normalizing these quantities, we ensure that both energy and force components contribute equally to the loss function during model training, preventing the larger-scale variable from dominating the optimization process. This standardized representation of target variables helps maintain balanced gradients during training and enables the model to learn the relationships between structural features and both energy and force predictions with equal emphasis. The normalization process preserves the underlying data patterns while adjusting the scale, and the numerical ranges are standardized to comparable scales (Figure 3). This consistency in patterns before and after normalization confirms that our preprocessing approach effectively standardizes the data without distorting its inherent features.
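As a concrete illustration, Equation VII can be applied separately to each target so that energy and force are each rescaled by their own mean and standard deviation; the sketch below uses placeholder values, not the paper's data:

```python
import numpy as np

def z_normalize(x):
    """Equation VII: Z = (X - mu) / sigma, computed over the whole array."""
    mu, sigma = x.mean(), x.std()
    return (x - mu) / sigma, mu, sigma

# Hypothetical placeholder targets on very different scales
rng = np.random.default_rng(1)
energies = rng.normal(loc=-700.0, scale=5.0, size=500)       # e.g., eV
forces   = rng.normal(loc=0.0,    scale=0.8, size=(500, 3))  # e.g., eV/Å

e_norm, e_mu, e_sigma = z_normalize(energies)
f_norm, f_mu, f_sigma = z_normalize(forces)
# Predictions are mapped back to physical units via X = Z * sigma + mu
```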
Subsequently, we optimized the training process by leveraging graphics processing units (GPUs), which allowed for significantly faster computation and more efficient handling of the complex tensor operations required by our ML algorithms. We configured our computational framework to dynamically allocate tasks across the available GPUs, maximizing resource utilization and reducing processing times. The dataset was split into 60% for training, 20% for validation, and 20% for testing to ensure a comprehensive evaluation of model performance. As displayed in Figure 4, the distribution of samples across the PCs exhibits consistent cross-shaped patterns across all three subsets. This similarity in data distribution confirms that our splitting strategy effectively maintained the representativeness of the original dataset in each subset. The validation set allows us to monitor model performance during training and prevent overfitting, while the test set provides an unbiased evaluation of the final model performance.
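The paper does not specify its splitting tool; a minimal sketch of the 60/20/20 partition using scikit-learn's train_test_split (the arrays and random seed below are placeholders) would be:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))  # placeholder PCA-projected structure features
y = rng.normal(size=1000)       # placeholder normalized targets

# Hold out 40% first, then halve it: 60% train / 20% validation / 20% test
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.40, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```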
2.4. KAN model

The KAN model was configured with a network width of [3, 4, 3] and utilized a grid size of 8 across three dimensions (k = 3).²⁷ To enhance the training process, we employed the Adam optimizer, a stochastic gradient descent method known for its ability to compute adaptive learning rates for each parameter.⁴² This optimization method is particularly effective in managing sparse gradients and dynamically adjusting the learning rate, which significantly improves the speed of convergence and the overall performance of the model.
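A hedged sketch of this configuration, assuming the pykan reference implementation of KANs (https://github.com/KindXiaoming/pykan; constructor arguments vary between versions) together with standard PyTorch Adam; all tensors and hyperparameters other than width, grid, and k are placeholders:

```python
import torch
from kan import KAN  # pykan package (assumed)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Width [3, 4, 3], grid size 8, k = 3, as stated in the text
model = KAN(width=[3, 4, 3], grid=8, k=3, device=device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # placeholder learning rate
loss_fn = torch.nn.MSELoss()

# One illustrative Adam step on placeholder tensors (batch of 64)
x = torch.randn(64, 3, device=device)       # 3 PCA input features
target = torch.randn(64, 3, device=device)  # 3 normalized outputs

optimizer.zero_grad()
loss = loss_fn(model(x), target)
loss.backward()
optimizer.step()
```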
            underlying data patterns while adjusting the scale, and the   By  adopting  the  Adam  optimizer,  we  ensured  that
            numerical ranges are standardized to comparable scales   our model efficiently navigated through the complex loss
            (Figure  3). This consistency in patterns before and after   landscape, which led to faster convergence and improved
            normalization confirms that our preprocessing approach   predictive accuracy.  In addition,  we  implemented L2

