
Use it to replace the worst-performing individual $x_{\text{worst}}^{L}$ in the Nadam subpopulation, injecting the global search results into the local search path.

This two-way interaction mechanism not only promotes the flow of high-quality information in the solution space but also helps overcome the limitations of local optimality, thereby improving overall search efficiency and solution accuracy.

(e) Termination criteria and optimal solution output
The hybrid algorithm terminated when either of the following conditions was met: the maximum number of iterations $T_{\max}$ was reached, or the fitness value of the optimal individual in the entire population fell below the predefined threshold $\delta$.

The final optimization result was recorded as:

$x^{*} = \arg\min_{x \in P_{L} \cup P_{G}} f(x)$ (XXII)
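For concreteness, a minimal sketch of this exchange-and-termination logic follows (NumPy; the names f, P_L, P_G, T_max, and delta are illustrative, and this is not the paper's implementation):

```python
# Minimal sketch, assuming populations are arrays of parameter vectors and
# f is the fitness function; names are illustrative, not the paper's code.
import numpy as np

def inject_global_best(P_L, P_G, f):
    """Replace the worst Nadam individual x_worst^L with the DE global best."""
    worst = int(np.argmax([f(x) for x in P_L]))
    P_L[worst] = min(P_G, key=f).copy()

def should_stop(t, T_max, best_fitness, delta):
    """Terminate at the iteration cap T_max or once fitness drops below delta."""
    return t >= T_max or best_fitness < delta

def best_of_union(P_L, P_G, f):
    """Equation (XXII): x* = argmin of f(x) over P_L ∪ P_G."""
    union = list(P_L) + list(P_G)
    return union[int(np.argmin([f(x) for x in union]))]
```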

3.4. Application of the hybrid optimization algorithm in LSTM training
To effectively improve the generalization performance and convergence accuracy of deep learning models in complex water quality data prediction tasks, this study introduced the previously constructed Nadam–DE hybrid optimization algorithm into the training process of the LSTM network. Unlike the traditional training mechanism based on backpropagation and a single gradient optimizer, the hybrid optimization strategy emphasizes a wide-area search of the global parameter space during the early stage of training and focuses on gradient-driven fine-grained exploitation in the middle and later stages. This approach achieves the optimization goal of dynamic collaboration between “exploration” and “exploitation” in the LSTM parameter learning process.

3.4.1. Parameter encoding and search space definition
Under this hybrid optimization framework, all trainable parameters of the LSTM model were treated as variables to be optimized. These parameters were vectorized and encoded as continuous, real-valued individuals. Assuming the model structure is fixed, the parameter set can be expressed as:

$\theta = \{W_{xi}, W_{hi}, b_i, W_{xf}, W_{hf}, b_f, W_{xc}, W_{hc}, b_c, W_{xo}, W_{ho}, b_o\}$

Here, $W$ represents the weight matrices corresponding to each gating mechanism (input gate, forget gate, candidate memory unit, and output gate), and $b$ denotes the bias terms. After all weight matrices and bias terms are flattened into vectors, they collectively form a single individual $\theta_i \in \mathbb{R}^{d}$ in the hybrid algorithm, where $d$ is the total dimensionality of the parameters.
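As a concrete illustration of this encoding, the sketch below (PyTorch assumed; the LSTMRegressor architecture and all layer sizes are hypothetical, not taken from the paper) flattens every weight matrix and bias into one vector $\theta_i \in \mathbb{R}^{d}$ and loads a candidate individual back into the network:

```python
# Minimal encoding sketch, assuming PyTorch; the LSTMRegressor architecture
# and all sizes below are hypothetical, not taken from the paper.
import torch
import torch.nn as nn
from torch.nn.utils import parameters_to_vector, vector_to_parameters

class LSTMRegressor(nn.Module):
    def __init__(self, n_features=8, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # one output: predicted DO value

    def forward(self, x):
        out, _ = self.lstm(x)             # (batch, time, hidden)
        return self.head(out[:, -1])      # predict from the last time step

model = LSTMRegressor()

# Encode: flatten every weight matrix and bias into one individual theta.
theta = parameters_to_vector(model.parameters())
d = theta.numel()                         # total parameter dimensionality d

# Decode: load a candidate individual back into the network for evaluation.
candidate = theta + 0.01 * torch.randn(d)
vector_to_parameters(candidate, model.parameters())
```

Because vector_to_parameters writes in place, a single model object can evaluate the entire population one individual at a time.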
3.4.2. Construction of the loss function and fitness evaluation
The hybrid algorithm drove the evolution process by evaluating the predictive ability of each individual under the current LSTM model. Specifically, the individual parameter vector $\theta_i$ was loaded into the network, the model performed forward propagation on the training data, calculated the prediction result $\hat{y}_t$, and used the MSE as the fitness function:

$L(\theta_i) = \frac{1}{N}\sum_{t=1}^{N}\left(y_t - \hat{y}_t\right)^2$ (XXIII)

Here, $y_t$ is the true label, $\hat{y}_t$ is the prediction result of the model under the parameter $\theta_i$, and $N$ is the sample size.
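Under the same assumptions, a minimal sketch of this fitness evaluation (reusing the hypothetical LSTMRegressor and theta from the sketch above; data shapes are illustrative) is:

```python
# Minimal fitness sketch for Equation (XXIII), reusing the hypothetical
# LSTMRegressor and theta from the sketch above; shapes are illustrative.
import torch
from torch.nn.utils import vector_to_parameters

@torch.no_grad()
def fitness(theta_i, model, x_train, y_train):
    """L(theta_i): load the individual, forward-propagate, return the MSE."""
    vector_to_parameters(theta_i, model.parameters())
    y_hat = model(x_train).squeeze(-1)    # predictions y_hat_t, shape (N,)
    return torch.mean((y_train - y_hat) ** 2).item()

# Illustrative shapes: N = 128 sequences, 24 time steps, 8 features.
x_train, y_train = torch.randn(128, 24, 8), torch.randn(128)
# loss = fitness(theta, model, x_train, y_train)
```

Since a lower MSE indicates a fitter individual, individuals are ranked by ascending $L(\theta_i)$ during selection and exchange.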
3.4.3. Embedding process of the hybrid optimization algorithm
When the hybrid optimization algorithm is embedded into the parameter training of the LSTM model, the training process can be regarded as a complex black-box function optimization problem. The specific process is as follows (a structural sketch appears after the list):
(a) Initialization stage: Generate the initial population $\{\theta_1, \theta_2, \ldots, \theta_P\}$, which is divided into the Nadam subpopulation and the DE subpopulation. Local gradient updates and global mutation searches are then performed.
(b) Parallel optimization stage: In each generation, the Nadam subpopulation uses gradient information to compute first- and second-order moment estimates for its individuals and updates the parameters. Meanwhile, the DE subpopulation carries out global exploration through differential mutation and crossover mechanisms.
(c) Fitness assessment: For each individual, perform forward propagation in the LSTM model, calculate the loss $L(\theta_i)$, and adjust the ranking of individuals based on their fitness.
(d) IEM: At intervals of $T_{\text{swap}}$ generations, individual replacements based on the principle of “superior replacing inferior” are carried out between subpopulations. The Nadam subpopulation injects its current optimal individual into the DE subpopulation, and vice versa, promoting information sharing and collaborative search.
(e) Convergence determination: When the value of the optimal loss function stabilizes or the maximum number of generations $G_{\max}$ is reached, the optimization terminates.
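Putting steps (a) to (e) together, the following structural sketch (NumPy; a toy quadratic loss and its analytic gradient stand in for the LSTM forward and backward passes, and every constant such as P, G_max, T_swap, F, and CR is illustrative) shows one way the generation loop can be organized:

```python
# Structural sketch of steps (a)-(e). A toy quadratic loss and its analytic
# gradient stand in for the LSTM and backpropagation; all constants are
# illustrative, and the Nadam and DE updates are simplified.
import numpy as np

rng = np.random.default_rng(0)
d, P, G_max, T_swap, delta = 10, 20, 200, 5, 1e-8
F, CR, lr = 0.5, 0.9, 0.05                # DE scale, crossover rate, step size
b1, b2, eps = 0.9, 0.999, 1e-8            # Nadam moment constants

f = lambda x: float(np.sum(x ** 2))       # stand-in for L(theta_i), Eq. (XXIII)
grad_f = lambda x: 2 * x                  # stand-in for LSTM backpropagation

pop = rng.normal(size=(P, d))             # (a) initial population, split in two
P_L, P_G = pop[: P // 2], pop[P // 2:]    # Nadam half and DE half
m, v = np.zeros_like(P_L), np.zeros_like(P_L)

for g in range(1, G_max + 1):
    # (b) Nadam subpopulation: first- and second-order moment updates.
    for i in range(len(P_L)):
        gr = grad_f(P_L[i])
        m[i] = b1 * m[i] + (1 - b1) * gr
        v[i] = b2 * v[i] + (1 - b2) * gr ** 2
        m_hat = (b1 * m[i] + (1 - b1) * gr) / (1 - b1 ** g)  # Nesterov look-ahead
        P_L[i] -= lr * m_hat / (np.sqrt(v[i] / (1 - b2 ** g)) + eps)
    # (b) DE subpopulation: rand/1 mutation with binomial crossover.
    for i in range(len(P_G)):
        others = [j for j in range(len(P_G)) if j != i]
        a, b, c = P_G[rng.permutation(others)[:3]]
        trial = np.where(rng.random(d) < CR, a + F * (b - c), P_G[i])
        if f(trial) < f(P_G[i]):          # (c) greedy fitness-based selection
            P_G[i] = trial
    # (d) IEM: exchange current best individuals every T_swap generations.
    if g % T_swap == 0:
        best_L, best_G = min(P_L, key=f).copy(), min(P_G, key=f).copy()
        P_L[int(np.argmax([f(x) for x in P_L]))] = best_G
        P_G[int(np.argmax([f(x) for x in P_G]))] = best_L
    # (e) convergence: stop when the best loss is below delta (or at G_max).
    if f(min(min(P_L, key=f), min(P_G, key=f), key=f)) < delta:
        break
```

In the actual training setup, f would be the MSE fitness of Equation (XXIII) and grad_f would come from backpropagation through the LSTM.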
