
Conservation of the vulnerable Vitellaria Paradoxa

trees [BRT], and artificial neural network [ANN]). An RF is an ensemble method that incorporates multiple classification (decision) trees and produces predictions either by averaging (e.g., regression) or voting (e.g., classification).¹²,³⁴ It demonstrates high predictive performance among machine learning algorithms.³⁵,³⁶

A BRT, also known as generalized boosted regression modeling, uses a boosting algorithm that iteratively applies regression-tree algorithms in a forward, stage-wise procedure to construct a combination of trees. The model focuses on the weakest parts of the present model by constructing a new tree based on the residual of the previously fitted tree.³⁷

An ANN is a complex interconnected system of several nodes, mimicking the structure and functioning of neurons in the human brain. It consists of three major components: an input layer (comprising the predictors/environmental variables), a hidden layer (receiving information with several connections to the input through interconnected nodes), and an output layer (the prediction). ANNs can produce outputs similar to those of BRT and RF, in the form of regression/probability predictions or classification for binary inputs, though the latter is the most widely used form.³⁸

The models were fitted using the following packages in R: "randomForest" for RF,³⁹ "DISMO" for BRT,⁴⁰ and "neuralnet" for ANN.⁴¹ The binary occurrence data for V. paradoxa (presence and background) served as the response variable, while selected climate and vegetation indices were used as predictors or explanatory variables. For the RF model optimization, a tuning method was adopted⁴² based on the number of trees (ntree) and node size. Meanwhile, the BRT parameters were tuned by changing the number of trees/tree complexity (tc) in the range of 1,000 – 5,000 for each model until the optimum configuration was achieved, while maintaining the learning rate (lr) at 0.001. Diagnostic plots (showing error trees and residuals) and cross-validation correlations were used to visually assess and select the optimal model. For the ANN model, input data were transformed into a common scale (normalization) to ensure accurate comparison between predicted and actual values. The ANN architecture was configured with hidden layers set at 4 and 2, activation set at rectified linear unit, and linear output as false (regression) or true (classification). SDM probability predictions were generated across the study area. Models with an area under the curve (AUC) >0.7 were included in a weighted ensemble or consensus to obtain the final SDM prediction.

To evaluate the predictive performance of the three modeling algorithms selected in this study, a simple holdout, also known as the split-sample cross-validation method, was used by splitting the data in an 80:20 ratio, where 80% of the data was used to train the model while the remaining 20% was set aside to assess the model's predictive performance.¹²,⁴⁰ The models were also replicated 10 times to obtain the best model fit based on their cross-validation correlation. Both the threshold-independent statistic, "the area under the receiver operating characteristic curve (AUC_ROC),"⁴³ and the threshold-dependent confusion matrix statistics were employed to assess the predictive accuracy, precision, and specificity of the models. Another model evaluation method used was the Kappa coefficient statistic,⁴⁴ which measures the level of agreement between two raters based on binary classification.

The AUC_ROC ranges from 0 to 1. An AUC value of 0.5 or lower indicates performance no better than random prediction, values between 0.7 and 0.8 are considered tolerable, values of 0.8 – 0.9 are regarded as very good or excellent, and values above 0.9 are classified as outstanding.⁴²,⁴⁵ Meanwhile, the Kappa scores range from −1 to 1, where 0 indicates agreement no better than random, 1 represents perfect agreement, and the rare occasions of negative values signify less agreement than would be expected by chance. The Kappa index is determined by Equation I:

$K = (p_0 - p_e)/(1 - p_e)$                (I)

Where $K$ is the Kappa index, $p_0$ is the observed agreement, and $p_e$ is the expected agreement by chance.

The relative contribution of each predictor/environmental variable to the model predictions was also assessed, and their ecological plausibility was evaluated by extracting response curves, also known as partial dependence plots.

2.6. Climate-based provisional seed zone and seed zone priority location index (SZPI)
Based on the predicted suitability of V. paradoxa and the relative contributions of the top climate variables, SZMs were constructed by spatially intersecting the values in their respective cells in a GIS environment (QGIS 3.40.6 and ArcMap 10.8). A quantile classification (ordering) method was then applied to group the data into five broad, homogeneous classes, arranged according to their relative wetness or dryness.
The SZPI was defined as an area that is not only climatically suitable for the distribution and survival of
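The normalization step mentioned for the ANN inputs is not specified further in the text. As an illustration only, the sketch below assumes min-max rescaling to the interval [0, 1]; the function name and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1] (assumed scheme, not stated in the text)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant predictor carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical predictor values for three cells.
scaled = min_max_normalize([10, 20, 30])
print(scaled)
```

Rescaling each predictor separately keeps variables measured in different units on a comparable footing before they enter the network.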

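The consensus step described above (models with AUC > 0.7 combined into a weighted ensemble) can be sketched as follows. The weighting scheme is not given in the text, so this sketch assumes each retained model is weighted by its AUC; the function name, model predictions, and AUC values are hypothetical.

```python
def weighted_ensemble(predictions, aucs, threshold=0.7):
    """Combine per-cell probabilities from several models.

    predictions: dict of model name -> list of probabilities (one per cell)
    aucs:        dict of model name -> AUC of that model
    Only models with AUC strictly above `threshold` are kept, and each
    kept model is weighted by its AUC (an assumption of this sketch).
    """
    kept = [name for name in predictions if aucs[name] > threshold]
    if not kept:
        raise ValueError("no model exceeds the AUC threshold")
    total_weight = sum(aucs[name] for name in kept)
    n_cells = len(predictions[kept[0]])
    return [
        sum(aucs[name] * predictions[name][i] for name in kept) / total_weight
        for i in range(n_cells)
    ]


# Hypothetical probabilities for two cells and hypothetical AUC values.
preds = {"RF": [0.9, 0.2], "BRT": [0.7, 0.4], "ANN": [0.1, 0.9]}
aucs = {"RF": 0.8, "BRT": 0.8, "ANN": 0.6}
ensemble = weighted_ensemble(preds, aucs)  # ANN is excluded (AUC <= 0.7)
```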

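The 80:20 holdout and the threshold-independent AUC described in the evaluation paragraph can be illustrated with a short sketch. It is written in plain Python rather than with the R packages named in the text, and the function names and the random seed are assumptions. The AUC is computed from its rank interpretation: the probability that a randomly chosen presence record receives a higher score than a randomly chosen background record, with ties counted as one half.

```python
import random


def holdout_split(records, train_fraction=0.8, seed=0):
    """Shuffle the records and split them into training and test subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auc_roc(labels, scores):
    """AUC from pairwise comparisons of presence (1) and background (0) scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute the AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


train, test = holdout_split(range(100))      # 80 training and 20 test records
auc = auc_roc([1, 1, 0, 0], [0.9, 0.3, 0.4, 0.1])
print(len(train), len(test), auc)
```

Repeating the split with different seeds would mirror the 10 replications mentioned in the text, although how the authors selected the best replicate is not reproduced here.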

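As a worked illustration of Equation I and the confusion matrix statistics, the sketch below computes accuracy, precision, specificity, and the Kappa index from the four cells of a binary confusion matrix. The counts are hypothetical, and the expected agreement is computed in the standard way for Cohen's Kappa, from the products of the marginal totals.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Threshold-dependent statistics from a binary confusion matrix.

    tp, fp, fn, tn: true positives, false positives, false negatives,
    true negatives. Degenerate matrices (empty rows or columns) are not
    handled in this sketch.
    """
    n = tp + fp + fn + tn
    p0 = (tp + tn) / n  # observed agreement
    # Expected agreement by chance, from the marginal totals.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return {
        "accuracy": p0,
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "kappa": (p0 - pe) / (1 - pe),  # Equation I
    }


# Hypothetical test-set counts: p0 = 0.8 and pe = 0.5, so Kappa = 0.6.
metrics = confusion_metrics(tp=40, fp=10, fn=10, tn=40)
```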

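The quantile classification used in Section 2.6 to group cell values into five classes can be sketched as below. The actual classification was done in GIS software; this plain-Python version is only an approximation that assigns classes by rank so that each class holds roughly the same number of cells. The function name and the sample values are hypothetical.

```python
def quantile_classes(values, n_classes=5):
    """Assign each value a class from 1 (lowest) to n_classes (highest).

    Classes are assigned by rank, so each class receives about the same
    number of values. Tied values may fall into adjacent classes, which
    differs from GIS tools that classify by break values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, idx in enumerate(order):
        classes[idx] = rank * n_classes // len(values) + 1
    return classes


# Hypothetical cell values (e.g., a moisture-related climate variable).
cells = [30, 10, 50, 20, 40, 60, 80, 70, 100, 90]
zones = quantile_classes(cells)
```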
                Volume 22 Issue 4 (2025)                        95                           doi: 10.36922/AJWEP025210160