
ML-based $C_d$ for side trapezoidal labyrinth weirs
$$\mathrm{DDR} = \frac{P}{O} - 1 \tag{XV}$$

where $O$ and $P$ are the observed and predicted values, respectively; $\bar{O}$ and $\bar{P}$ are the means of the observed and predicted values, respectively; and $N$ is the total number of data points. The first three criteria describe only the average magnitude of the errors produced by the models, not how those errors are distributed. To address this limitation, Noori et al.⁵¹ introduced the developed discrepancy ratio (DDR) index. For easier interpretation and visualization, the Gaussian distribution of the DDR values is presented as a standard normal distribution in two steps. First, the DDR values of the predicted $C_d$ are standardized, and the normalized DDR value ($C_{d(\mathrm{DDR})}$) is calculated using a Gaussian function. Second, the $C_{d(\mathrm{DDR})}$ values are plotted against their standardized counterparts ($Z_{\mathrm{DDR}}$). In the $Z_{\mathrm{DDR}}$ vs. $C_{d(\mathrm{DDR})}$ graph, errors that cluster more tightly around the centerline and larger $C_{d(\mathrm{DDR})}$ values indicate higher accuracy.

2.5. Overall methodology

Machine learning prediction models were developed using the SVM, GEP, ANN (multilayer perceptron, MLP), and MARS methods discussed above. Of the data collected in the experiments, 70% were allocated to the training phase and 30% to the testing phase. The performance of the developed models was then evaluated using the metrics defined in Equations XII–XV.

3. Results and discussion

Table 4 summarizes the statistical performance criteria of the developed models. To establish the SVM model, both the radial basis function (RBF) kernel and the polynomial kernel were evaluated, and the RBF kernel outperformed the polynomial kernel. The SVM tuning parameters $C$ and $\gamma$ were determined to be 67 and 5.5, respectively. The RMSE, MAE, $R^2$, and $C_{d(\mathrm{DDRmax})}$ values were 0.02679, 0.02318, 0.79808, and 7.50, respectively, in the training phase and 0.0113, 0.0066, 0.7573, and 10.36, respectively, in the testing phase.

Table 5 summarizes the settings of the best-performing GEP model. Its RMSE, MAE, $R^2$, and $C_{d(\mathrm{DDRmax})}$ values were 0.03847, 0.03275, 0.74962, and 5.35, respectively, in the training phase and 0.0181, 0.0109, 0.5624, and 6.32, respectively, in the testing phase. The function set used in the GEP comprised +, −, ×, /, √, $e^{x}$, ln, $x^{2}$, $x^{3}$, and $x^{1/3}$. Figure 3 shows the expression trees (ETs) of the GEP output. The constants in Figure 3 are G1C1 = −0.476807, G2C0 = 2.40567, G2C1 = −0.476807, G3C0 = −0.395355, and G3C1 = −0.665436. The input variables d0, d1, and d2 stand for $B/w_1$, $H_d/P$, and $w_1/w_2$, respectively.

For the MLP model, the RMSE, MAE, $R^2$, and $C_{d(\mathrm{DDRmax})}$ values were 0.02426, 0.02031, 0.81602, and 8.07, respectively, in the training phase and 0.0111, 0.0065, 0.6878, and 11.32, respectively, in the testing phase. The last model employed in this investigation was MARS, whose corresponding metrics were 0.04995, 0.04381, 0.51011, and 4.02 in the training phase and 0.0245, 0.0149, 0.5593, and 4.64 in the testing phase. In developing the MARS model, an initial set of 21 basis functions was generated in the first (forward) step. In the second (pruning) step, 18 of these basis functions were removed, leaving an optimal MARS model with three basis functions. The resulting MARS model is given in Equation XVI, and its detailed form is presented in Table 6.
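The evaluation criteria used to compare the four models (RMSE, MAE, $R^2$, and the DDR-based $C_{d(\mathrm{DDRmax})}$) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: RMSE and MAE use their standard definitions, $R^2$ is assumed to be the squared Pearson correlation between $O$ and $P$ (suggested by the use of $\bar{O}$ and $\bar{P}$), and the "Gaussian function" is assumed to be the normal density built from the mean and population standard deviation of the DDR values. The function names and the sample data are hypothetical.

```python
import math
from statistics import mean, pstdev


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(mean([(p - o) ** 2 for o, p in zip(obs, pred)]))


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return mean([abs(p - o) for o, p in zip(obs, pred)])


def r2(obs, pred):
    """Assumed form: squared Pearson correlation between O and P."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


def ddr_curve(obs, pred):
    """Return (Z_DDR values, normalized DDR values, their maximum).

    Assumption: the normalized value is the normal density of each DDR
    value, using the mean and population standard deviation of DDR.
    """
    ddr = [p / o - 1.0 for o, p in zip(obs, pred)]  # Equation XV
    mu, sigma = mean(ddr), pstdev(ddr)
    if sigma == 0:
        raise ValueError("DDR values have zero spread; the Gaussian is undefined")
    z = [(d - mu) / sigma for d in ddr]  # standardized DDR (Z_DDR)
    dens = [math.exp(-0.5 * zi * zi) / (sigma * math.sqrt(2.0 * math.pi)) for zi in z]
    return z, dens, max(dens)


# Hypothetical observed and predicted Cd values, for illustration only
obs = [0.50, 0.55, 0.60, 0.65]
pred = [0.51, 0.54, 0.61, 0.64]

z, dens, cd_ddr_max = ddr_curve(obs, pred)
print(rmse(obs, pred), mae(obs, pred), r2(obs, pred), cd_ddr_max)
```

Because the peak of the density scales with $1/\sigma$, a tighter DDR spread yields a larger maximum, which matches the reading that larger $C_{d(\mathrm{DDR})}$ values indicate higher accuracy.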

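The roles of the tuned SVM parameters can be illustrated with a minimal ε-SVR prediction function. This sketch assumes the common libsvm-style RBF kernel $k(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$, since the kernel parameterization is not restated here. $C = 67$ bounds each dual coefficient to $[-C, C]$, and $\gamma = 5.5$ sets how quickly a support vector's influence decays with distance. The support vectors, dual coefficients, and bias below are hypothetical placeholders, not values from the fitted model.

```python
import math

C = 67.0     # box-constraint value reported for the tuned SVM
GAMMA = 5.5  # RBF kernel parameter reported for the tuned SVM


def rbf_kernel(x, y, gamma=GAMMA):
    """Assumed libsvm-style RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma=GAMMA, c=C):
    """Epsilon-SVR prediction f(x) = sum_i beta_i * k(x_i, x) + b.

    Each dual coefficient beta_i = alpha_i - alpha_i* must lie in [-C, C].
    """
    if any(abs(beta) > c for beta in dual_coefs):
        raise ValueError("dual coefficients must lie in [-C, C]")
    return bias + sum(beta * rbf_kernel(sv, x, gamma)
                      for sv, beta in zip(support_vectors, dual_coefs))


# Hypothetical support vectors (dimensionless inputs), coefficients, and bias
SUPPORT_VECTORS = [[0.2, 0.5, 1.0], [0.4, 0.6, 1.2]]
DUAL_COEFS = [1.5, -0.8]
BIAS = 0.6

print(svr_predict([0.3, 0.55, 1.1], SUPPORT_VECTORS, DUAL_COEFS, BIAS))
```

With a relatively large $\gamma$, each support vector acts locally: a point at squared distance 1 from a support vector receives a kernel weight of only $e^{-5.5}$.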
Table 4. Summary of the model performance

        Training phase                          Testing phase
Model   RMSE     MAE      R²       Cd(DDRmax)   RMSE     MAE      R²       Cd(DDRmax)
SVM     0.02679  0.02318  0.79808  7.50         0.0113   0.0066   0.7573   10.36
GEP     0.03847  0.03275  0.74962  5.35         0.0181   0.0109   0.5624   6.32
MLP     0.02426  0.02031  0.81602  8.07         0.0111   0.0065   0.6878   11.32
MARS    0.04995  0.04381  0.51011  4.02         0.0245   0.0149   0.5593   4.64

Notes: R² is the coefficient of determination; Cd(DDRmax) is the maximum normalized developed discrepancy ratio value.
Abbreviations: GEP: Gene expression programming; MAE: Mean absolute error; MARS: Multivariate adaptive regression splines; MLP: Multilayer perceptron; RMSE: Root mean square error; SVM: Support vector machine.
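The testing-phase columns of Table 4 can be ranked metric by metric with a short script. Lower values are better for RMSE and MAE, while higher values are better for R² and, following the DDR discussion, for Cd(DDRmax). The dictionary simply transcribes Table 4; the helper name `rank_models` is hypothetical.

```python
# Testing-phase values transcribed from Table 4: (RMSE, MAE, R^2, Cd(DDRmax))
TESTING = {
    "SVM": (0.0113, 0.0066, 0.7573, 10.36),
    "GEP": (0.0181, 0.0109, 0.5624, 6.32),
    "MLP": (0.0111, 0.0065, 0.6878, 11.32),
    "MARS": (0.0245, 0.0149, 0.5593, 4.64),
}

# Metric name -> (column index in TESTING, whether a higher value is better)
METRICS = {
    "RMSE": (0, False),
    "MAE": (1, False),
    "R2": (2, True),
    "Cd(DDRmax)": (3, True),
}


def rank_models(metric):
    """Return model names ordered from best to worst for one testing-phase metric."""
    column, higher_is_better = METRICS[metric]
    return sorted(TESTING, key=lambda model: TESTING[model][column],
                  reverse=higher_is_better)


for name in METRICS:
    print(f"{name:>11}: {' > '.join(rank_models(name))}")
```

In this per-metric view, MLP leads on testing RMSE, MAE, and Cd(DDRmax), while SVM has the highest testing R²; GEP and MARS rank third and fourth on every metric.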

Volume 22 Issue 6 (2025) 81 doi: 10.36922/AJWEP025120081
