mean absolute percentage error compared to standalone LSTM models. Hao¹⁸ developed a DO prediction method combining the maximum information coefficient with LSTM neural networks to identify the most influential environmental variables. This model outperformed traditional statistical methods and other machine learning techniques in predictive accuracy. However, LSTM models often suffer from slow convergence during training. To accelerate this process, many researchers employ the Nadam algorithm. For instance, Gandh et al.¹⁹ designed a hybrid LSTM–gated recurrent unit architecture for intelligent aquaculture water quality forecasting. Comparative evaluations of optimizers (e.g., Adaptive Moment Estimation [Adam], Nadam) revealed enhanced prediction accuracy and computational efficiency across two distinct water quality datasets. Lee et al.²⁰ utilized Adam-optimized LSTM to impute missing DO values in land-based aquaculture systems, achieving an error margin of ~3.25%. Nevertheless, gradient-based methods like stochastic gradient descent may converge to suboptimal local minima in the non-convex optimization landscape of LSTM training. To address this, Shuai and Tian²¹ applied DE to refine LSTM for high-speed rail passenger flow prediction, reducing error rates by 2.3988% relative to baseline LSTM. Similarly, Peng et al.²² deployed DE-optimized LSTM for electricity price forecasting, outperforming competing models in multi-task scenarios. Although DE excels in global search, it is less effective than gradient-based optimization in local exploitation.

Compared to the Adam optimizer, Nadam incorporates Nesterov momentum, enabling anticipation of the gradient direction during parameter updates to accelerate convergence and reduce oscillation. In addition, Nadam is more stable during training, especially when dealing with sparse gradients and dynamic data. However, no prior work has combined Nadam with DE to synergize global exploration and local exploitation, thereby enhancing model optimization. To bridge this gap, this study proposes Nadam–DE, a hybrid algorithm leveraging Nadam's rapid local convergence and DE's global search capabilities for aquatic DO prediction. This framework strategically balances exploration and exploitation, improving both prediction accuracy and training efficiency.

3. Materials and methods

3.1. Basic principles of Nadam and DE algorithm

3.1.1. Nadam
In the field of optimization algorithms, Nadam and DE²³ represent typical paradigms of gradient-based and population-based optimization strategies. Nadam is an improved method that introduces the Nesterov momentum mechanism into the Adam optimizer, aiming to enhance the predictive ability and convergence efficiency of parameter updates. The core idea is to simultaneously utilize first-order and second-order moment estimations to smooth the gradient information and to predict the future gradient direction through Nesterov momentum, resulting in better convergence trends and robustness during parameter updates.⁹

In its parameter update mechanism, the gradient is first subjected to first-order moment estimation (i.e., the momentum term) and second-order moment estimation (i.e., the variance term), calculated as follows:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t    (I)

where g_t represents the current gradient and \beta_1 is the momentum decay factor, typically set to 0.9. The second moment is calculated as:

v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2    (II)

where \beta_2 is the attenuation factor for the second moment, usually set to 0.999, and is used to estimate the squared mean of the gradient. To avoid estimation bias, Nadam corrects these estimates and introduces a modified form of Nesterov momentum:

\hat{m}_t = \beta_1 m_t + (1 - \beta_1) g_t    (III)

The final parameter update formula is:

\theta_{t+1} = \theta_t - \eta \cdot \dfrac{\beta_1 \hat{m}_t + (1 - \beta_1) g_t / (1 - \beta_1^t)}{\sqrt{\hat{v}_t} + \epsilon}    (IV)

where \eta denotes the learning rate, \epsilon represents a numerical stabilizer to avoid zero-division errors (e.g., 1 × 10⁻⁸), and \hat{v}_t = v_t / (1 - \beta_2^t) is the corrected second moment estimate. Nadam offers advantages such as fast convergence and strong adaptability, making it especially suitable for problems with sparse gradients or high-dimensional continuous search spaces. However, it relies on gradient information and is susceptible to becoming trapped in local optima when faced with complex loss functions.
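To make the update mechanism concrete, the following Python sketch applies Equations (I)–(IV) to a parameter vector for a single optimization step. It is a minimal illustration only, not the implementation used in this study: the function name nadam_step, the learning rate, and the toy objective are assumptions for the example, while β₁ = 0.9, β₂ = 0.999, and ε = 1 × 10⁻⁸ follow the values quoted above.

```python
import numpy as np

def nadam_step(theta, grad, m, v, t, lr=0.001,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Nadam update (Equations I-IV) to the parameters theta.

    grad is the current gradient g_t, m and v are the running first- and
    second-moment estimates, and t >= 1 is the update step index.
    Decay factors follow the values quoted in the text; the learning
    rate default is an assumed example value.
    """
    m = beta1 * m + (1.0 - beta1) * grad        # Equation (I): momentum term
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # Equation (II): variance term
    m_hat = beta1 * m + (1.0 - beta1) * grad    # Equation (III): Nesterov look-ahead
    v_hat = v / (1.0 - beta2 ** t)              # corrected second moment estimate
    # Equation (IV): Nesterov-corrected step scaled by the adaptive denominator
    numerator = beta1 * m_hat + (1.0 - beta1) * grad / (1.0 - beta1 ** t)
    theta = theta - lr * numerator / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage (assumed example): minimize f(x) = ||x||^2, whose gradient is 2x.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 501):
    theta, m, v = nadam_step(theta, 2.0 * theta, m, v, t, lr=0.05)
```

In the LSTM setting, the same step is applied element-wise to every trainable weight at each training iteration.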

3.1.2. DE
In contrast, the DE algorithm is a population-based, heuristic global optimization method that evolves better solutions through the simulation of mutation,

