mean absolute percentage error compared to standalone LSTM models. Hao¹⁸ developed a DO prediction method combining the maximum information coefficient with LSTM neural networks to identify the most influential environmental variables. This model outperformed traditional statistical methods and other machine learning techniques in predictive accuracy. However, LSTM models often suffer from slow convergence during training. To accelerate this process, many researchers employ the Nadam algorithm. For instance, Gandh et al.¹⁹ designed a hybrid LSTM–gated recurrent unit architecture for intelligent aquaculture water quality forecasting. Comparative evaluations of optimizers (e.g., Adaptive Moment Estimation [Adam], Nadam) revealed enhanced prediction accuracy and computational efficiency across two distinct water quality datasets. Lee et al.²⁰ utilized Adam-optimized LSTM to impute missing DO values in land-based aquaculture systems, achieving an error margin of ~3.25%. Nevertheless, gradient-based methods like stochastic gradient descent may converge to suboptimal local minima in the non-convex optimization landscape of LSTM training. To address this, Shuai and Tian²¹ applied DE to refine LSTM for high-speed rail passenger flow prediction, reducing error rates by 2.3988% relative to baseline LSTM. Similarly, Peng et al.²² deployed DE-optimized LSTM for electricity price forecasting, outperforming competing models in multi-task scenarios. Although DE excels in global search, it is less effective than gradient-based optimization in local exploitation.

Compared to the Adam optimizer, Nadam incorporates Nesterov momentum, enabling anticipation of the gradient direction during parameter updates to accelerate convergence and reduce oscillation. In addition, Nadam is more stable during training, especially when dealing with sparse gradients and dynamic data. However, no prior work has combined Nadam with DE to synergize global exploration and local exploitation, thereby enhancing model optimization. To bridge this gap, this study proposes Nadam–DE, a hybrid algorithm that leverages Nadam's rapid local convergence and DE's global search capability for aquatic DO prediction. This framework strategically balances exploration and exploitation, improving both prediction accuracy and training efficiency.

3. Materials and methods

3.1. Basic principles of the Nadam and DE algorithms

3.1.1. Nadam

In the field of optimization algorithms, Nadam and DE represent typical paradigms of gradient-based and population-based optimization strategies, respectively. Nadam is an improved method that introduces the Nesterov momentum mechanism into the Adam optimizer, aiming to enhance the predictive ability and convergence efficiency of parameter updates. The core idea is to simultaneously utilize first-order and second-order moment estimations to smooth the gradient information and to predict the future gradient direction through Nesterov momentum, resulting in better convergence trends and robustness during parameter updates.⁹

In its parameter update mechanism, the gradient is first subjected to first-order moment estimation (i.e., the momentum term) and second-order moment estimation (i.e., the variance term), calculated as follows:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t \quad \text{(I)}

Where gₜ represents the current gradient and β₁ is the momentum decay factor, typically set to 0.9. The second moment is calculated as:

v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \quad \text{(II)}

Where β₂ is the attenuation factor for the second moment, usually set to 0.999, and is used to estimate the squared mean of the gradient. Because both moments are initialized at zero, the estimates are biased toward zero in early iterations (e.g., at t = 1, m₁ = (1 − β₁)g₁ = 0.1g₁ rather than g₁). To avoid this deviation, Nadam corrects the estimates and introduces a modified form of Nesterov momentum:
\hat{m}_t = \beta_1 m_t + (1 - \beta_1) g_t \quad \text{(III)}

The final parameter update formula is:

\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \left( \beta_1 \hat{m}_t + \frac{(1 - \beta_1) g_t}{1 - \beta_1^t} \right) \quad \text{(IV)}

Where η denotes the learning rate, ϵ represents a numerical stabilizer to avoid zero-division errors (e.g., 1 × 10⁻⁸), and v̂ₜ = vₜ/(1 − β₂ᵗ) is the bias-corrected second-moment estimate. Nadam offers advantages such as fast convergence and strong adaptability, making it especially suitable for problems with sparse gradients or high-dimensional continuous search spaces. However, it relies on gradient information and is susceptible to becoming trapped in local optima when faced with complex loss functions.
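To make the update rule concrete, the following is a minimal NumPy sketch of Equations (I)-(IV); the toy quadratic objective, learning rate, and iteration count are illustrative assumptions, not the experimental settings used in this study.

import numpy as np

def nadam_step(theta, grad, m, v, t, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Nadam parameter update following Equations (I)-(IV)."""
    m = beta1 * m + (1 - beta1) * grad        # Eq. (I): first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # Eq. (II): second-moment (variance) estimate
    m_hat = beta1 * m + (1 - beta1) * grad    # Eq. (III): Nesterov look-ahead momentum
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    step = (beta1 * m_hat + (1 - beta1) * grad / (1 - beta1 ** t)) / (np.sqrt(v_hat) + eps)
    return theta - lr * step, m, v            # Eq. (IV): parameter update

# Toy usage: minimize f(theta) = theta^2, whose gradient is 2 * theta
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 301):
    theta, m, v = nadam_step(theta, 2.0 * theta, m, v, t)
print(f"theta after 300 steps: {theta:.4f}")  # approaches the minimum at 0

Framework implementations such as tf.keras.optimizers.Nadam in TensorFlow or torch.optim.NAdam in PyTorch realize the same update rule (with minor differences in the bias-correction schedule) and are what would typically be used when training an LSTM in practice.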
3.1.2. DE

In contrast, the DE algorithm is a population-based, heuristic global optimization method that evolves better solutions through the simulation of mutation, crossover, and selection operations.²³
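For contrast with the gradient-based update above, a compact sketch of the canonical DE/rand/1/bin scheme is given below; the population size, scale factor F, crossover rate CR, bounds, and sphere objective are illustrative assumptions, not the configuration adopted in this study.

import numpy as np

def de_minimize(f, bounds, pop_size=20, F=0.5, CR=0.9, generations=100, seed=0):
    """Canonical DE/rand/1/bin: evolve a population via mutation, crossover, and selection."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    dim = len(bounds)
    pop = rng.uniform(lo, hi, size=(pop_size, dim))  # random initial population
    fit = np.array([f(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: perturb one random individual by a scaled difference of two others
            idx = rng.choice([j for j in range(pop_size) if j != i], size=3, replace=False)
            a, b, c = pop[idx]
            mutant = np.clip(a + F * (b - c), lo, hi)
            # Binomial crossover: mix mutant and target genes, forcing at least one mutant gene
            mask = rng.random(dim) < CR
            mask[rng.integers(dim)] = True
            trial = np.where(mask, mutant, pop[i])
            # Greedy selection: the trial vector replaces the target only if it is no worse
            f_trial = f(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = int(np.argmin(fit))
    return pop[best], fit[best]

# Toy usage: minimize the two-dimensional sphere function f(x) = x1^2 + x2^2
x_best, f_best = de_minimize(lambda x: float(np.sum(x ** 2)), [(-5.0, 5.0), (-5.0, 5.0)])
print(x_best, f_best)  # near [0, 0] and 0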