Page 193 - IJOCTA-15-1
Key drivers of volatility in BIST100 firms using machine learning segmentation
robust and consistent dataset for our analysis, allowing us to draw meaningful conclusions about volatility patterns in the Turkish market.

This study consists of two steps. In the first step, the data set created for the study was grouped as low-volatility and high-volatility companies using machine learning methods. In the second step, panel regression analysis was performed to reveal the determinants of volatility.

3.1. Data collection

The primary dataset source is TradingView, a reputable platform for financial data and trading insights. TradingView provides access to historical stock price data, which is crucial for calculating the volatility measures used in this research. The dataset includes daily closing prices necessary for computing Parkinson volatility scores, which are the fundamental variables in our analyses. The choice of daily closing prices aligns with established practices in volatility research and provides a balance between granularity and manageability of data. The data needed for the second stage of the study, the panel regression analysis, was obtained from the Finnet database.

The reviewed literature highlights consistent determinants of stock price volatility, such as firm size, dividend policy, leverage, and trading volume, analyzed through methods like GARCH and linear regression [15,17–19]. However, a gap exists in applying machine learning techniques, such as PCA and K-means clustering, to analyze volatility in emerging markets like Turkey’s BIST100 index. Additionally, few studies combine machine learning clustering with panel regression to explore financial ratios’ impact on volatility within distinct firm groups.

This study addresses these gaps by integrating machine learning and econometric methods, offering a novel approach to segment firms into low- and high-volatility groups. By combining PCA and K-means clustering with panel regression, we provide deeper insights into how financial ratios influence volatility within these groups. This contributes original value by enhancing the precision of volatility analysis in emerging markets and expanding understanding of risk determinants in the Turkish context.

3.2. Data processing

Following data collection, a series of preprocessing steps were undertaken to guarantee the accuracy and consistency of the data set for subsequent analysis. The data underwent cleaning to eliminate inconsistencies, such as missing values or outliers, that could distort the results. Additionally, prices were adjusted for splits and dividends to provide a consistent basis for comparison over time. Finally, data points were cross-referenced with other market sources to confirm accuracy.

This rigorous approach to data collection and preprocessing from TradingView ensures that our analysis is based on reliable and consistent data, providing a solid foundation for the statistical and machine learning techniques applied in the study. The resulting dataset not only aids in identifying volatility levels among the selected firms but also helps understand the impact of various financial ratios on these volatility measures.

With our adequately cleaned and prepared dataset, we can now proceed to the crucial step of measuring volatility across our sample of BIST100 firms.

3.3. Volatility measurement

In financial terms, volatility represents the degree of variation in the trading prices of stocks over a specific period. It is a crucial measure of risk and uncertainty in financial markets. For this study, we focused on calculating annual Parkinson volatility scores based on daily trading data of firms listed on the BIST100.

Parkinson Volatility Calculation: The Parkinson volatility measure, introduced by Michael Parkinson in 1980, uses a stock’s highest and lowest prices to provide a more accurate estimate of its volatility compared to traditional methods that only use closing prices [47]. The following formula is employed to measure volatility. In this formula, $H_i$ and $L_i$ represent the highest and lowest prices of the stock on day $i$, respectively, while $n$ denotes the number of trading days in the year.

$$
\sigma_{\text{Parkinson}} = \sqrt{\frac{1}{4\log(2)} \times \frac{1}{n} \sum_{i=1}^{n} \left( \log\frac{H_i}{L_i} \right)^{2}} \qquad (1)
$$

In order to calculate the annual Parkinson volatility scores, each stock’s daily high and low prices were aggregated for each year from 2006 to 2023. For each firm and each calendar year in the dataset, the Parkinson volatility formula was applied using all trading days of that year to calculate the annual volatility score. This method effectively captures intra-year price fluctuations, providing a nuanced picture of volatility. The annual volatility scores calculated for each firm were

