
O. Ayana, D. F. Kanbak, M. Kaya Keles / IJOCTA, Vol.15, No.1, pp.50-70 (2025)

dataset, which contains Amazon reviews of electronic devices, and k-NN, NB, and SVM were implemented for sentiment classification. According to the results, the FOA-mRMR feature subset selection method improved the sentiment classification accuracy on this dataset by 15–20%, reaching 95%.

Mustopa et al.⁵⁶ used NB and SVM with PSO to analyze user reviews of the PeduliLindungi application. The PSO-based SVM achieved higher accuracy (93%) than the PSO-based NB (69%). In the study of Yıldırım et al.,⁵⁷ a sunflower optimization algorithm, one of the new and successful plant intelligence-based algorithms, was used to analyze sentiments including customer feedback and satisfaction information. According to the experiments, the proposed sunflower optimization method provided high performance.

Optimization algorithms have also been used for SA in other studies.⁵⁸,⁵⁹

Finally, an examination of the aforementioned studies shows that the Sailfish Optimization Algorithm (SOA) has not been used as a feature selector in SA. For this reason, in order to contribute a novelty to the literature, BSO is applied as a feature selector and SA is performed.

3. Material and methods

The fundamental objective of this study is to classify user reviews using ML and DL methodologies, with BSO serving as the feature selection technique within the context of SA. In this section, we describe the dataset, the preprocessing methods applied, the structure of the proposed models, and the evaluation metrics used to assess model performance.

3.1. Datasets

The dataset used in this study comprises a total of 20000 user comments. Specifically, we gathered 2000 comments from Trendyol,⁶⁰ a widely recognized online sales platform in Turkey. These comments were then combined with an additional 18000 comments obtained from n11,⁶¹ as part of the dataset provided by Fırat University,⁶² Turkey. This combined dataset supports a robust analysis of user sentiment in the context of online reviews. User comments were collected during the period from December 5, 2020, to January 5, 2021. This timeframe was selected because of the observed global increase in online sales attributed to the COVID-19 pandemic.⁶³,⁶⁴ By focusing on this specific period, we aimed to capture and analyze user sentiments in response to the evolving market conditions. The dataset is balanced, consisting of 10000 positive and 10000 negative comments, collected from user reviews in the categories of electronics, market products, and footwear.

Each comment $c_i$ in the dataset is associated with a rating $r_i$, where $1 \le r_i \le 5$. Comments are classified into two categories based on their ratings: if $r_i > 2$, then the class label $l_i$ is assigned as "positive"; otherwise, $l_i$ is designated as "negative." Consequently, the dataset $D$ can be represented as $D = \{\{c_1, l_1\}, \ldots, \{c_n, l_n\}\}$, where $n$ denotes the total number of comments in $D$.

3.2. Preprocessing

In this section, we outline the preprocessing methods recommended for text mining that were employed in this study. A crucial aspect of data mining is the representation of the dataset.⁶⁵ Data collected from real-world applications may contain noise that prevents algorithms from constructing accurate models and identifying existing patterns. Noisy data not only diminishes the effectiveness of algorithms but also increases computation times.⁶⁶,⁶⁷ This challenge can further complicate the classification process in text mining.

Within each document, there may be words, emojis, and other elements that are repetitive and do not contribute meaningful content. Consequently, in this study we focused exclusively on the words that effectively distinguish between the two classes (positive and negative comments). To achieve this, preprocessing techniques are applied to the text dataset prior to the classification process.⁶⁸ This step ensures that only relevant features are retained, thereby enhancing the quality of the classification.

In this study, all possible combinations of preprocessing methods have been systematically evaluated. The conversion to lowercase can be either activated (On) or deactivated (Off), indicating that specific words or characters may be transformed to lowercase while others retain their original format. Punctuation removal is similarly
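As a minimal sketch of the labeling rule given in Section 3.1, the following Python snippet maps each rating r_i to a class label (positive if r_i > 2, negative otherwise) and builds the pairs (c_i, l_i) that make up D. The names `label_from_rating` and `build_dataset`, as well as the sample comments, are hypothetical and introduced only for illustration; they are not taken from the study's implementation.

```python
from typing import List, Tuple

POSITIVE = "positive"
NEGATIVE = "negative"


def label_from_rating(rating: int) -> str:
    """Return the class label for a rating r with 1 <= r <= 5.

    Follows the rule stated in Section 3.1: r > 2 is positive,
    anything else is negative.
    """
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {rating}")
    return POSITIVE if rating > 2 else NEGATIVE


def build_dataset(comments: List[str], ratings: List[int]) -> List[Tuple[str, str]]:
    """Pair each comment c_i with its label l_i (hypothetical helper)."""
    if len(comments) != len(ratings):
        raise ValueError("comments and ratings must have the same length")
    return [(c, label_from_rating(r)) for c, r in zip(comments, ratings)]


# Illustrative (made-up) comments, not drawn from the actual dataset.
sample = build_dataset(
    ["Great phone", "Broke in a week", "Average"],
    [5, 1, 3],
)
print(sample)
```

Note that under this rule a rating of 3 falls into the positive class, since the threshold is strict (r_i > 2).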