
Table 2. Formulas of the evaluation criteria

Evaluation Criteria    Formula
Accuracy               (True Positive + True Negative) / (Positive + Negative)
Precision              True Positive / (True Positive + False Positive)
Recall                 True Positive / Positive
F-score                (2 * True Positive) / (2 * True Positive + False Positive + False Negative)

transactions’ users need to predict positively. The F-score comes into play to balance precision and recall; it is the harmonic mean of the two. For this reason, we also take the F-score into account when evaluating performance. The formulas of these evaluation criteria are shown in Table 2.
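
As a minimal illustration of the Table 2 formulas, the short Python sketch below computes the four criteria from hypothetical confusion-matrix counts (the tp, tn, fp and fn values are made-up examples, not results from this study):

    # Table 2 formulas; tp, tn, fp, fn are hypothetical confusion-matrix counts.
    def evaluate(tp, tn, fp, fn):
        positive = tp + fn                       # all actually positive samples
        negative = tn + fp                       # all actually negative samples
        accuracy = (tp + tn) / (positive + negative)
        precision = tp / (tp + fp)
        recall = tp / positive                   # "True Positive / Positive"
        f_score = (2 * tp) / (2 * tp + fp + fn)
        return accuracy, precision, recall, f_score

    print(evaluate(tp=80, tn=70, fp=20, fn=30))  # example call with made-up counts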

4. Results and discussion

In this section, we define the experimental scenarios and discuss the results obtained. We conduct tests using four different ML algorithms and, additionally, propose one DL model for comparative analysis with the ML algorithms. Furthermore, we apply the BSO to SA for the first time, utilizing the ML algorithm that yields the best performance. The BSO is compared with Harmony Search (HS), the Bat Algorithm (BA), Atom Search Optimization (ASO), and the Whale Optimization Algorithm (WOA), which have previously been presented in the literature and applied in the context of text mining. For all experiments conducted in this section, the dataset is divided into two subsets: 70% for training and 30% for testing, with all scenarios executed on these subsets.
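
A hedged sketch of this split, assuming scikit-learn and placeholder variables standing in for the preprocessed comments and their sentiment labels (the fixed seed and stratification are assumptions made only for this illustration):

    from sklearn.model_selection import train_test_split

    # Placeholder data standing in for the preprocessed comments and labels.
    texts = ["great product", "terrible service", "loved it", "not worth it"] * 25
    labels = [1, 0, 1, 0] * 25

    # 70% training / 30% testing split used for all experiments.
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.30, random_state=42, stratify=labels)
    print(len(X_train), len(X_test))  # -> 70 30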

4.1. Parameters of the DL model

In addition to the ML algorithms, we propose the use of one DL model that has garnered significant attention recently and has demonstrated effective performance in similar studies. Specifically, we have chosen the Bidirectional Long Short-Term Memory (BiLSTM) model for this research. This choice is motivated by the observation that many user comments, while often starting positively, may conclude with a negative sentiment (or vice versa). Examples of such comments are illustrated in Table 3. This variability can complicate the algorithms’ ability to accurately classify comments. The BiLSTM model is particularly suited for this task, as it excels in learning the sequential patterns inherent in text and possesses the capability for bidirectional learning [83–86].
For the BiLSTM model, we used an embedding layer for the inputs. The embedding layer converts the words into fixed-length vectors and learns the proximity of words according to their positions in the sentence. The embedding layer takes three basic inputs: the vocabulary size (the first top-k unique words in the dataset), the embedding dimension (the length of the vector), and the maximum length, which represents the number of words used for a sentence/comment. The problem is that each sentence may not contain as many words as the maximum length; an example of this situation is shown in Table 3 (the comments in the dataset may have different lengths).
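
To make the variable-length issue concrete, the sketch below uses the Keras Tokenizer (the vocabulary size and the example comments are assumptions for illustration only):

    from tensorflow.keras.preprocessing.text import Tokenizer

    # Placeholder comments of different lengths, as in Table 3.
    comments = ["the product is great",
                "bad",
                "delivery was fast and the packaging was fine"]

    VOCAB_SIZE = 10000                       # assumed top-k vocabulary size
    tokenizer = Tokenizer(num_words=VOCAB_SIZE)
    tokenizer.fit_on_texts(comments)
    sequences = tokenizer.texts_to_sequences(comments)
    print([len(s) for s in sequences])       # e.g. [4, 1, 8]: lengths differ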
Padding [87] was employed to standardize each sentence to a fixed length. In the proposed architecture, two BiLSTM layers were implemented, with the number of units treated as a tunable parameter. Following these layers, a dropout layer was utilized to mitigate the risk of overfitting. The classification process culminated in dense layers. The Adam optimizer was selected, and binary cross-entropy was employed as the loss function. The structure of the BiLSTM model is illustrated in Figure 2.
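
A minimal sketch of this architecture in Keras, assuming example values for the vocabulary size, embedding dimension, maximum length, and unit count (the values actually used are those selected by GS in Section 4.2.3):

    from tensorflow.keras.preprocessing.sequence import pad_sequences
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dropout, Dense

    VOCAB_SIZE, EMB_DIM, MAX_LEN, UNITS = 10000, 64, 50, 32   # assumed values

    # Pad tokenized comments (literal token ids here) to the fixed length MAX_LEN.
    padded = pad_sequences([[4, 12, 7], [9], [3, 3, 8, 15, 2]],
                           maxlen=MAX_LEN, padding="post")

    model = Sequential([
        Embedding(VOCAB_SIZE, EMB_DIM),                     # words -> fixed-length vectors
        Bidirectional(LSTM(UNITS, return_sequences=True)),  # first BiLSTM layer
        Bidirectional(LSTM(UNITS)),                         # second BiLSTM layer
        Dropout(0.5),                                       # mitigate overfitting
        Dense(16, activation="relu"),                       # dense classification layers
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    print(model.predict(padded, verbose=0))  # sigmoid outputs for the 3 example rows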
Additionally, the GridSearch (GS) method [88] was utilized to accurately identify the model’s input parameters. The search space for each parameter is detailed in Table 4, with the optimal values determined by GS discussed in Section 4.2.3.
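
As a sketch of how GS enumerates such a search space (the parameter names and values below are assumptions made purely for illustration; the actual ranges are those listed in Table 4):

    from sklearn.model_selection import ParameterGrid

    # Assumed search space purely for illustration.
    param_grid = {
        "units": [32, 64, 128],
        "dropout_rate": [0.2, 0.5],
        "batch_size": [32, 64],
        "epochs": [5, 10],
    }

    # Each candidate configuration would train one BiLSTM (Figure 2);
    # the configuration with the best validation score is kept.
    for params in ParameterGrid(param_grid):
        print(params)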

4.2. Experiments of ML and DL algorithms

In this section, we evaluate the performance of four machine learning algorithms: Multinomial Naïve Bayes (MNB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Random Forest (RF). Each of these algorithms is tested using the various preprocessing combinations outlined in Table 1. Each combination C_j is represented by a code, where 1 ≤ j ≤ 16, and