Use it to replace the worst-performing individual x_worst in the Nadam subpopulation, injecting the global search results into the local search path. This two-way interaction mechanism not only promotes the flow of high-quality information through the solution space but also helps overcome the limitations of local optimality, thereby improving overall search efficiency and solution accuracy (a code sketch of this exchange is given after Equation XXII).
(e) Termination criteria and optimal solution output: The hybrid algorithm terminated when either of the following conditions was met: the maximum number of iterations T_max was reached, or the fitness value of the optimal individual in the entire population fell below the predefined threshold δ. The final optimization result was recorded as:

x^* = \arg\min_{x \in P_L \cup P_G} f(x)    (XXII)
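The exchange and termination logic can be made concrete with the following illustrative Python (individuals assumed to be NumPy vectors; exchange_individuals, should_stop, and best_solution are hypothetical names, not the authors' implementation). The exchange swaps the best individual of each subpopulation for the worst of the other, and best_solution implements Equation (XXII):

import numpy as np

def exchange_individuals(nadam_pop, de_pop, fitness):
    # Evaluate both subpopulations (lower MSE is better).
    nadam_fit = [fitness(theta) for theta in nadam_pop]
    de_fit = [fitness(theta) for theta in de_pop]
    # The DE best replaces the worst Nadam individual, injecting the
    # global search results into the local search path, and vice versa.
    nadam_pop[int(np.argmax(nadam_fit))] = de_pop[int(np.argmin(de_fit))].copy()
    de_pop[int(np.argmax(de_fit))] = nadam_pop[int(np.argmin(nadam_fit))].copy()

def should_stop(iteration, best_fitness, T_max, delta):
    # Terminate when the iteration budget T_max is exhausted or the best
    # fitness falls below the predefined threshold delta.
    return iteration >= T_max or best_fitness < delta

def best_solution(nadam_pop, de_pop, fitness):
    # Equation (XXII): x* = argmin of f(x) over the union of P_L and P_G.
    return min(nadam_pop + de_pop, key=fitness)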
3.4. Application of the hybrid optimization algorithm

To effectively improve the generalization performance and convergence accuracy of deep learning models in complex water quality data prediction tasks, this study introduced the previously constructed Nadam–DE hybrid optimization algorithm into the training process of the LSTM network. Unlike the traditional training mechanism based on back-propagation and a single gradient optimizer, the hybrid optimization strategy emphasizes a wide-area search of the global parameter space during the early stage of training and focuses on gradient-driven fine-grained exploitation in the middle and later stages. This approach achieved dynamic collaboration between "exploration" and "exploitation" in the LSTM parameter learning process.
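This page does not specify the mechanism that shifts the balance between the two phases; one plausible realization, offered purely as an assumption, is a linear schedule that moves individuals (and hence fitness evaluations) from the DE subpopulation toward the Nadam subpopulation as the generations advance:

def de_share(generation, G_max, start=0.8, end=0.2):
    # Illustrative schedule only (not from the paper): the fraction of
    # the population assigned to the DE subpopulation decays linearly,
    # shifting effort from global exploration toward gradient-driven
    # exploitation as training progresses.
    frac = generation / max(G_max - 1, 1)
    return start + (end - start) * frac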
3.4.1. Parameter encoding and search space definition

Under this hybrid optimization framework, all trainable parameters of the LSTM model were treated as variables to be optimized. These parameters were vectorized and encoded as continuous, real-valued individuals. Assuming the model structure is fixed, the parameter set can be expressed as:

\theta = \{ W_{xi}, W_{hi}, b_i, W_{xf}, W_{hf}, b_f, W_{xc}, W_{hc}, b_c, W_{xo}, W_{ho}, b_o \}

Here, W represents the weight matrices corresponding to each gating mechanism (input gate, forget gate, candidate memory unit, and output gate), and b denotes the corresponding bias term. After all the weight matrices and bias terms are flattened into vectors, they collectively form a single individual θ_i ∈ R^d in the hybrid algorithm, where d is the total dimensionality of the parameters.
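A sketch of this encoding (illustrative NumPy helpers with hypothetical names flatten_params/unflatten_params) concatenates the gate weights and biases into one vector and restores the original tensors afterwards:

import numpy as np

def flatten_params(params):
    # Concatenate all gate weight matrices and bias vectors into a
    # single real-valued individual theta in R^d.
    shapes = [p.shape for p in params]
    theta = np.concatenate([p.ravel() for p in params])
    return theta, shapes  # d == theta.size

def unflatten_params(theta, shapes):
    # Restore the flat vector to the original weight/bias tensors.
    params, offset = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        params.append(theta[offset:offset + size].reshape(shape))
        offset += size
    return params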
3.4.2. Construction of the loss function and fitness evaluation

The hybrid algorithm drove the evolution process by evaluating the predictive ability of each individual under the current LSTM model. Specifically, the individual parameter vector θ_i was loaded into the network, the model performed forward propagation on the training data and calculated the prediction result ŷ_t, and the MSE was used as the fitness function:

L(\theta_i) = \frac{1}{N} \sum_{t=1}^{N} (y_t - \hat{y}_t)^2    (XXIII)

Here, y_t is the true label, ŷ_t is the prediction result of the model under the parameters θ_i, and N is the sample size.
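Equation (XXIII) translates directly into a fitness routine; the predict callable below stands in for loading θ_i into the LSTM and running forward propagation, and is an assumption rather than the authors' code:

import numpy as np

def fitness(theta_i, predict, X_train, y_train):
    # Equation (XXIII): mean squared error of the model under theta_i.
    # `predict` loads theta_i into the network and returns y_hat.
    y_hat = predict(theta_i, X_train)
    return float(np.mean((y_train - y_hat) ** 2))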
3.4.3. Embedding process of the hybrid optimization algorithm in LSTM training

When the hybrid optimization algorithm is embedded into the parameter training of the LSTM model, the training process can be regarded as a complex black-box function optimization problem. The specific process is as follows (a schematic loop combining the steps is sketched after the list):
(a) Initialization stage: Generate the initial population {θ_1, θ_2, …, θ_P} and divide it into the Nadam subpopulation and the DE subpopulation, in which local gradient updates and global mutation searches are then performed.
(b) Parallel optimization stage: In each generation, the Nadam subpopulation uses gradient information to compute first-order and second-order moment estimates for its individuals and updates their parameters. Meanwhile, the DE subpopulation completes global exploration through differential mutation and crossover mechanisms.
(c) Fitness assessment: For each individual, perform forward propagation in the LSTM model, calculate the loss L(θ_i), and adjust the ranking of individuals based on their fitness.
(d) Information exchange mechanism (IEM): At intervals of T_swap generations, individual replacements based on the principle of "superior replacing inferior" are carried out between the subpopulations: the Nadam subpopulation injects its current optimal individual into the DE subpopulation, and vice versa, promoting information sharing and collaborative search.
(e) Convergence determination: When the value of the optimal loss function stabilizes or the maximum number of generations G_max is reached, the optimization terminates.
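Combining steps (a) through (e), one possible organization of the embedded training loop is sketched below, reusing the illustrative exchange_individuals helper from above. Here fitness is assumed to be a one-argument closure over the training data (e.g., Equation XXIII with the data bound in), and nadam_step and de_step stand in for the Nadam update and the DE mutation/crossover operators; all names are assumptions, not the paper's code:

def train_lstm_hybrid(init_population, fitness, nadam_step, de_step,
                      G_max, T_swap, delta):
    # (a) Initialization: split the population into two subpopulations.
    mid = len(init_population) // 2
    nadam_pop, de_pop = init_population[:mid], init_population[mid:]
    best = min(init_population, key=fitness)
    for g in range(G_max):
        # (b) Parallel optimization: gradient-based and evolutionary updates.
        nadam_pop = [nadam_step(theta) for theta in nadam_pop]
        de_pop = de_step(de_pop)
        # (c) Fitness assessment over the whole population.
        best = min(nadam_pop + de_pop, key=fitness)
        # (d) Information exchange every T_swap generations.
        if (g + 1) % T_swap == 0:
            exchange_individuals(nadam_pop, de_pop, fitness)
        # (e) Convergence determination.
        if fitness(best) < delta:
            break
    return best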