O. Ayana, D. F. Kanbak, M. Kaya Keles / IJOCTA, Vol.15, No.1, pp.50-70 (2025)
dataset, which contains Amazon reviews of electronic devices, and k-NN, NB, and SVM were implemented for sentiment classification. According to the results, the FOA-mRMR feature subset selection method improved sentiment classification accuracy on this dataset by 15–20%, reaching 95%.

Mustopa et al. 56 used NB and SVM with PSO to analyze user reviews of the PeduliLindungi application. The PSO-based SVM achieved higher accuracy (93%) than the PSO-based NB (69%). In the study of Yıldırım et al., 57 a sunflower optimization algorithm, one of the new and successful plant intelligence-based algorithms, was used to analyze sentiments, including customer feedback and satisfaction information. According to the experiments, the proposed sunflower optimization method provided high performance.

Optimization algorithms have also been used in SA in other studies. 58,59

Finally, an examination of the aforementioned studies shows that the Sailfish Optimization Algorithm (SOA) has not been used in SA as a feature selector. For this reason, to contribute a novel approach to the literature, BSO is applied as a feature selector and SA is performed.

3. Material and methods

The fundamental objective of this study is to classify user reviews with ML and DL methodologies, using BSO as a feature selection technique within the context of SA. In this section, we describe the dataset, the preprocessing methods applied, the structure of the proposed models, and the evaluation metrics used to assess model performance.

3.1. Datasets

The dataset used in this study comprises a total of 20000 user comments. Specifically, we gathered 2000 comments from Trendyol, 60 a widely recognized online sales platform in Turkey. These comments were then combined with an additional 18000 comments obtained from n11, 61 as part of the dataset provided by Fırat University, 62 Turkey. This combined dataset supports a robust analysis of user sentiment in the context of online reviews. User comments were collected during the period from December 5, 2020, to January 5, 2021. This timeframe was selected because of the increase in online sales observed globally and attributed to the COVID-19 pandemic. 63,64 By focusing on this specific period, we aimed to capture and analyze user sentiments in response to the evolving market conditions. The dataset consists of an equal distribution of 10000 positive and 10000 negative comments, collected from user reviews in the categories of electronics, market products, and footwear.

Each comment $c_i$ in the dataset is associated with a rating $r_i$, where $1 \le r_i \le 5$. Comments are classified into two categories based on their ratings: if $r_i > 2$, then the class label $l_i$ is assigned as "positive"; otherwise, $l_i$ is designated as "negative." Consequently, the dataset $D$ can be represented as $D = \{\{c_1, l_1\}, \ldots, \{c_n, l_n\}\}$, where $n$ denotes the total number of comments in $D$.

3.2. Preprocessing

In this section, we outline the preprocessing methods recommended for text mining that were employed in this study. A crucial aspect of data mining is the representation of the dataset. 65 Data collected from real-world applications may contain noise that prevents algorithms from constructing accurate models and identifying existing patterns. Noisy data not only diminishes the effectiveness of algorithms but also increases computation times. 66,67 This challenge can further complicate the classification process in text mining.

Within each document, there may be words, emojis, and other elements that are repetitive and do not contribute meaningful content. Consequently, in this study we focused exclusively on the words that effectively distinguish between the two classes (positive and negative comments). To achieve this, preprocessing techniques are applied to the text dataset prior to the classification process. 68 This step ensures that only relevant features are retained, thereby enhancing the quality of the classification.

In this study, all possible combinations of preprocessing methods have been systematically evaluated. The conversion to lowercase can be either activated (On) or deactivated (Off), indicating that specific words or characters may be transformed to lowercase while others retain their original format. Punctuation removal is similarly

