Page 62 - IJOCTA-15-1
P. 62

O. Ayana, D. F. Kanbak, M. Kaya Keles / IJOCTA, Vol.15, No.1, pp.50-70 (2025)

                                         Table 1. Coding of preprocessing methods

            Scenario  Code  Lowercase Conversion Off (0)/ Punctuation Removal Off (0)/ Stemming Off (0) / Stopwords Removal Off (0) /
            Number        Lowercase Conversion On (1)  Punctuation Removal On (1)  Stemming On (1)  Stopwords Removal On (1)
               1    0000            0                      0                   0                  0
               2    0001            0                      0                   0                  1
               3    0010            0                      0                   1                  0
               4    0011            0                      0                   1                  1
               5    0100            0                      1                   0                  0
               6    0101            0                      1                   0                  1
               7    0110            0                      1                   1                  0
               8    0111            0                      1                   1                  1
               9    1000            1                      0                   0                  0
               10   1001            1                      0                   0                  1
               11   1010            1                      0                   1                  0
               12   1011            1                      0                   1                  1
               13   1100            1                      1                   0                  0
               14   1101            1                      1                   0                  1
               15   1110            1                      1                   1                  0
               16   1111            1                      1                   1                  1













































                                       Figure 1. The structure of the proposed model

            Document Frequency (IDF) of the term t. An ex-    subsequently processed by the ML algorithms for
            planation of the TF and IDF values can be found   SA.
            in Section 3.3.1. The rationale for selecting only
            k words is that not all words in V are suitable for
            classification, and some may have very low fre-   In the scenario involving the DL model, each com-
            quencies. The results of the experimental scenar-  ment c is represented as a fixed-length k vector
                                                                       + k
            ios conducted to determine the optimal number     W c = {Z } , where k corresponds to the top-k
            of k words are presented in Section 4.2.1. Finally,  words in the vocabulary V . In this context, the
                                                                                   t
            the comments converted into TF-IDF vectors are    value of each word W is defined by its position
                                                                                   c
            divided into training and testing datasets and are  in the vocabulary V . The optimal k value for this
                                                            56
   57   58   59   60   61   62   63   64   65   66   67