Conservation of the vulnerable Vitellaria Paradoxa
trees [BRT], and artificial neural network [ANN]). An RF is an ensemble method that incorporates multiple classification (decision) trees and produces predictions either by averaging (e.g., regression) or voting (e.g., classification).12,34 It demonstrates high predictive performance among machine learning algorithms.35,36 A BRT, also known as generalized boosted regression modeling, uses a boosting algorithm that iteratively applies regression-tree algorithms in a forward, stage-wise procedure to construct a combination of trees. The model focuses on the weakest parts of the present model by constructing a new tree based on the residual of the previously fitted tree.37

An ANN is a complex interconnected system of several nodes, mimicking the structure and functioning of neurons in the human brain. It consists of three major components: an input layer (nodes comprising the predictors/environmental variables), a hidden layer (receiving information with several connections to the input through interconnected nodes), and an output layer (the prediction). ANNs can produce outputs similar to those of BRT and RF, in the form of regression/probability predictions or classification for binary inputs, though the latter is the most widely used form.38

The models were fitted using the following packages in R: "randomForest"39 for RF, "dismo"40 for BRT, and "neuralnet"41 for ANN. The binary occurrence data for V. paradoxa (presence and background) served as the response variable, while selected climate and vegetation indices were used as predictors or explanatory variables.

For the RF model optimization, a tuning method was adopted based on the number of trees (ntree) and node size.42 Meanwhile, the BRT parameters were tuned by changing the number of trees/tree complexity (tc) in the range of 1,000 – 5,000 for each model until the optimum configuration was achieved, while maintaining the learning rate (lr) at 0.001. Diagnostic plots (showing error trees and residuals) and cross-validation correlations were used to visually assess and select the optimal model. For the ANN model, input data were transformed into a common scale (normalization) to ensure accurate comparison between predicted and actual values. The ANN architecture was configured with hidden layers set at 4 and 2, the activation function set to the rectified linear unit, and linear output as false (regression) or true (classification). SDM probability predictions were generated across the study area. Models with an area under the curve (AUC) >0.7 were included in a weighted ensemble or consensus to obtain the final SDM prediction.

To evaluate the predictive performance of the three modeling algorithms selected in this study, a simple holdout method, also known as split-sample cross-validation, was used. The data were split in an 80:20 ratio, where 80% of the data was used to train the model while the remaining 20% was set aside to assess the model's predictive performance.12,40 The models were also replicated 10 times to obtain the best model fit based on their cross-validation correlation. Both a threshold-independent statistic, the area under the receiver operating characteristic curve (AUC-ROC),43 and threshold-dependent confusion matrix statistics were employed to assess the predictive accuracy, precision, and specificity of the models. Another model evaluation method used was the Kappa coefficient statistic,44 which measures the level of agreement between two raters based on binary classification.

The AUC-ROC ranges from 0 to 1. An AUC value of 0.5 or lower indicates performance no better than random prediction, values between 0.7 and 0.8 are considered tolerable, values of 0.8 – 0.9 are regarded as very good or excellent, and values above 0.9 are classified as outstanding.42,45 Meanwhile, the Kappa scores range from −1 to 1, where 0 indicates no agreement beyond chance (random), 1 represents perfect agreement, and the rare occasions of negative values signify less agreement than would be expected by chance. The Kappa index is determined by Equation I:

$$K = \frac{p_0 - p_e}{1 - p_e} \tag{I}$$

where $K$ is the Kappa index, $p_0$ is the observed agreement, and $p_e$ is the expected agreement by chance.

The relative contribution of each predictor/environmental variable to the model predictions was also assessed, and their ecological plausibility was evaluated by extracting response curves, also known as partial dependence plots.

2.6. Climate-based provisional seed zone and seed zone priority location index (SZPI)

Based on the predicted suitability of V. paradoxa and the relative contributions of the top climate variables, SZMs were constructed by spatially intersecting the values in their respective cells in a GIS environment (QGIS 3.40.6 and ArcMap 10.8). A quantile classification (ordering) method was then applied to group the data into five broad, homogeneous classes, arranged according to their relative wetness or dryness.

The SZPI was defined as an area that is not only climatically suitable for the distribution and survival of
Volume 22 Issue 4 (2025) 95 doi: 10.36922/AJWEP025210160
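As a worked illustration of the Kappa index in Equation I, the following minimal Python sketch computes K from the four cells of a binary (presence/background) confusion matrix. The function name `cohen_kappa` and the example counts are hypothetical and are not taken from the study, whose analysis was run in R. The expected agreement is assumed here to be the usual product of the marginal proportions summed over both classes.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Kappa index K = (p0 - pe) / (1 - pe) for a 2x2 confusion matrix.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives, and true negatives (hypothetical inputs).
    """
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: share of cases where prediction matches observation.
    p0 = (tp + tn) / n

    # Expected chance agreement: for each class, multiply the proportion
    # predicted in that class by the proportion observed in it, then sum.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)

    if pe == 1.0:
        raise ValueError("Kappa is undefined when expected agreement equals 1")

    return (p0 - pe) / (1 - pe)


# Illustrative counts only: 100 test points, 85 classified correctly.
# Here p0 = 0.85 and pe = 0.50, so K = 0.35 / 0.50.
print(round(cohen_kappa(tp=40, fp=10, fn=5, tn=45), 3))
```

With these made-up counts, the observed agreement of 0.85 is compared with a chance agreement of 0.50, which is why K falls below the raw accuracy.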

