Page 65 - IJOCTA-15-1
BSO: Binary Sailfish Optimization for feature selection in sentiment analysis
where $x^{i}_{\text{elite sail}}$ is the best position of the elite sailfish, $x^{i}_{\text{old sard}}$ is the current position of the sardine, $\alpha$ is a random number between 0 and 1, and $AS$ is the sailfish's Attack Strength at each iteration, computed as shown in Eq. 7:

$$AS = A \times (1 - (2 \times Itr \times \varsigma)) \tag{7}$$

where $A$ and $\varsigma$ are coefficients that decrease the attack strength linearly from $A$ to 0 and $Itr$ is the current iteration. Sardines move to different locations as soon as a sailfish attacks.

According to the values of $\alpha$ and $AS$, sardines adjust their position to confuse the predator and reduce the danger of being discovered. As previously stated, the attack capability of the sailfish weakens over time. Lowering the sailfish's attack strength helps the search agents converge more adaptively. The number of sardines that change their position ($\beta$) and the number of their variables that are updated ($\gamma$) can be computed from $AS$ as shown in Eqs. 8 and 9:

$$\beta = N_{sard} \times AS \tag{8}$$

$$\gamma = d_{i} \times AS \tag{9}$$

where $d_{i}$ is the number of variables at the $i$th iteration and $N_{sard}$ is the number of sardines, as stated before.

The $AS$ and $\alpha$ parameters encourage SOA to behave more randomly throughout optimization, and they are especially useful for preventing stagnation in local optima across all iterations [23]. In the final stage of the hunt, the injured sardine that has broken away from the shoal is promptly captured. In SOA, prey is considered caught when a sardine becomes fitter than its corresponding sailfish. In this case, the sailfish takes the most recent position of the hunted sardine to maximize its chances of catching new prey. The formula is as follows:

$$x^{i}_{sail} = x^{i}_{sard}, \quad \text{if } f(x^{i}_{sard}) < f(x^{i}_{sail}) \tag{10}$$

where $x^{i}_{sard}$ is the sardine's current position at the $i$th iteration and $x^{i}_{sail}$ is the sailfish's current position at the $i$th iteration.

3.3.3. Binary Sailfish Optimizer (BSO)

In this section, we explain how SOA is applied in binary form for feature selection.

Let the original vocabulary set be $F = \{f_{1}, f_{2}, \ldots, f_{t}\}$, where $t$ is the total number of words, and let the class label set be $C = \{c_{1}, c_{2}, \ldots, c_{l}\}$, where $l$ is the number of classes (two in this study: 0 (negative) and 1 (positive)). A Feature Selection (FS) method (here, BSO) extracts a subset $F_{sub} = \{f_{sub_1}, \ldots, f_{sub_n}\}$, where $sub_n < t$, $F_{sub} \subset F$, and $F_{sub}$ has higher classification accuracy than any other subset of the same size or any valid subset of $F$. FS is a binary optimization problem in which each component of a solution can take only the values 0 or 1. A binary vector is used to express a solution, with 1 meaning that the related word is selected and 0 meaning that it is not. This binary vector has length $t$, one entry per word in $F$. SOA is a method for solving continuous optimization problems with real-valued solutions. In BSO, this continuous search space is converted to a binary one using the sigmoid transfer function, which is defined as follows:

$$sig(x) = \frac{1}{1 + e^{-x}} \tag{11}$$

The output of the sigmoid function is used to update the sailfish's current position. BSO aims not only to achieve higher accuracy with a subset of words, but also to achieve it with as few words as possible. This makes feature selection with BSO a multi-objective problem: it contains a minimization objective (fewer words) and a maximization objective (higher accuracy), and the two are directly in conflict. Ghosh et al. [82] created a fitness function $f$ that combines these two objectives into one by using the classification error rate, as follows:

$$f = \omega \times \gamma(F_{sub}) + (1 - \omega) \times \frac{|F_{sub}|}{t} \tag{12}$$

where $F_{sub}$ represents the selected word subset, $|F_{sub}|$ represents the number of selected words, $\gamma(F_{sub})$ represents the classification error rate of $F_{sub}$, $t$ is the total number of words in $F$, and $\omega \in [0, 1]$ is a weight. Because the error rate is used in place of accuracy, both terms are minimized; this resolves the conflict between the minimization and maximization objectives and turns the problem into a single-objective one.

3.4. Evaluation metrics

To understand how successful a classification model is, its results need to be analyzed with various evaluation criteria.

While accuracy is a frequently used statistic for determining a model's success, it is not considered adequate on its own. Precision, on the other hand, shows how many of the values predicted as positive are truly positive, while Recall is a type of metric that shows how many of the
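As an illustration of Eqs. 7 to 10 in Section 3.3.2, the following Python sketch computes the attack strength schedule, the sardine update counts, and the capture rule. It is only a sketch under stated assumptions: the function names (`attack_strength`, `update_counts`, `capture`) are hypothetical, the coefficient values in the demo loop are illustrative and not taken from the text, and the truncation and clamping of $\beta$ and $\gamma$ to valid integer ranges is an assumption, since the text does not say how non-integer or out-of-range values are handled. Lower fitness is assumed to be better, in line with Eq. 10.

```python
from typing import List, Tuple


def attack_strength(a: float, varsigma: float, itr: int) -> float:
    """Eq. 7: AS = A * (1 - (2 * Itr * varsigma))."""
    return a * (1.0 - (2.0 * itr * varsigma))


def update_counts(n_sard: int, n_vars: int, strength: float) -> Tuple[int, int]:
    """Eqs. 8 and 9: beta = N_sard * AS and gamma = d_i * AS.

    Assumption: results are truncated to integers and clamped to
    [0, N_sard] and [0, d_i]; the text does not specify this step.
    """
    beta = max(0, min(n_sard, int(n_sard * strength)))
    gamma = max(0, min(n_vars, int(n_vars * strength)))
    return beta, gamma


def capture(sail_pos: List[float], sail_fit: float,
            sard_pos: List[float], sard_fit: float) -> Tuple[List[float], float]:
    """Eq. 10: the sailfish takes the sardine's position if f(sard) < f(sail)."""
    if sard_fit < sail_fit:
        return list(sard_pos), sard_fit
    return sail_pos, sail_fit


if __name__ == "__main__":
    # Illustrative coefficients only (not taken from the text).
    for itr in (0, 10, 25, 50):
        strength = attack_strength(1.0, 0.01, itr)
        print(itr, round(strength, 3), update_counts(100, 20, strength))
```

With these illustrative coefficients the attack strength falls linearly and reaches 0 at iteration 50, so fewer sardines and fewer of their variables are updated as the search proceeds.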
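The binarization of Eq. 11 and the fitness function of Eq. 12 in Section 3.3.3 can be sketched in Python as follows. This is a hedged sketch, not the authors' implementation: the names `sigmoid`, `binarize`, and `fitness` are hypothetical, and the stochastic thresholding rule (a bit is set to 1 when a uniform random number is below $sig(x)$) is an assumption, because the text only states that the sigmoid output is used to update the position. The classification error rate is passed in as a number, since the classifier itself is outside the scope of this sketch, and the weight value in the usage note is illustrative.

```python
import math
import random
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Eq. 11: sig(x) = 1 / (1 + e^(-x)), in a form that avoids overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binarize(position: Sequence[float], rng: random.Random) -> List[int]:
    """Map a real-valued position to a 0/1 word mask.

    Assumption: bit j is 1 when a uniform random draw is below
    sig(position[j]); the text does not state the threshold rule.
    """
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


def fitness(error_rate: float, mask: Sequence[int], omega: float) -> float:
    """Eq. 12: f = omega * gamma(F_sub) + (1 - omega) * |F_sub| / t."""
    if not mask:
        raise ValueError("mask must contain one entry per word in F")
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    t = len(mask)            # total number of words in F
    n_selected = sum(mask)   # |F_sub|, the number of selected words
    return omega * error_rate + (1.0 - omega) * n_selected / t
```

For example, `fitness(0.2, [1, 0, 1, 0], 0.9)` weighs an error rate of 0.2 against a selection ratio of 2/4; a lower value means a better trade-off between error and subset size.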

