Page 162 - AJWEP-22-5
Li and Wu
design, and development pathways, relatively few studies examine the policy texts themselves. The present study addresses this gap by analyzing policy documents and identifying temporal trends through text clustering and thematic analysis.
(ii) Although text mining techniques are widely applied in fields such as business analytics, fraud detection, and spam filtering, their application in the analysis of emission reduction and carbon support policies remains limited. This study employs LDA for keyword-based text clustering and integrates SBC for visual interpretation, thereby developing a combined LDA–SBC analytical framework.

In summary, this study adopts a combined LDA–SBC analytical approach for data analysis. First, the LDA topic clustering model is applied to extract thematic word distributions from policy texts. Next, the SBC is used to visualize and interpret these distributions, thereby revealing the scope and focus of the policies. Finally, this method supports a deeper understanding of enterprise development trends, industrial planning, and the strategic direction of national policies.

3. Research methodology

China adopts a 5-year cycle for its national development plans and is currently implementing the 14th 5-Year Plan. For comparative analysis, this study employed the 13th 5-Year Plan as a reference point. Accordingly, the period from 2016 to 2023 was examined to analyze both the similarities and differences between these two planning phases. The timeframe was segmented into three stages: 2016 – 2018 is defined as the first phase, representing the early stage of the 13th 5-Year Plan; 2019 – 2021 constitutes the second phase, covering the late stage of the 13th 5-Year Plan and the lead-up to the 14th; and 2022 – 2023 represents the third phase, corresponding to the early stage of the 14th 5-Year Plan.
The LDA technique enables effective text classification by identifying and grouping related themes, while the K-means algorithm offers strong scalability and efficiency, operating within polynomial time.²⁴ Therefore, LDA was selected as the clustering method for the text data. In addition, SBCs can display flow paths and multiple data dimensions simultaneously within a single chart, making the visualization of complex data more intuitive and information-rich.²⁵ In this context, SBC diagrams can help decision-makers and analysts better understand relationships and trends within the data, thereby supporting more informed decision-making. This study employed LDA topic clustering to extract more detailed information, thereby enriching the content and broadening the analytical perspective. The results were then visualized using SBC to more comprehensively illustrate how China’s policies align with the actual demands of emission and carbon reduction.

3.1. LDA
a. Principles of LDA
The LDA algorithm, introduced by Blei et al.²⁶ in 2003, takes a document set $D = \{d_1, d_2, d_3, \ldots, d_n\}$ and a specified number of clusters $m$ as input. The algorithm then calculates the probability $p$ of each document $d_i$ belonging to every topic.²⁷ Every document is thus represented by a probabilistic distribution over multiple topics, expressed as $d_i = (d_{p1}, d_{p2}, \ldots, d_{pm})$. For every word within a document, the algorithm also calculates the probability values associated with each topic, denoted as $W_i = (W_{p1}, W_{p2}, \ldots, W_{pm})$. The model ultimately produces two key matrices: one representing the distribution of topics across documents, and the other depicting the distribution of words across topics.
Therefore, the LDA algorithm maps documents and words into a series of topics and attempts to use these topics to discover hidden relationships between documents and comments, documents and other documents, and terms and phrases. LDA is an unsupervised learning method that does not require each target to meet certain constraints. Instead, it counts the word frequency distribution in each topic after clustering and identifies the highest-frequency words in each case, from which the topic meaning is inferred.
b. LDA process
Each document $d$ in the document set $D$ is regarded as a sequence of words, and let $d$ have $n$ observations in the series $\langle w_1, w_2, \ldots, w_n \rangle$, where $w_i$ denotes the $i$-th word.
All the unique words present in $D$ form a large set called the “vocabulary” (VOC). For each document $d$ in $D$, the probability of belonging to different topics is represented as $\theta_d = \langle p_{t1}, \ldots, p_{tk} \rangle$, where $p_{ti}$ denotes the probability that document $d$ corresponds to the $i$-th topic in set $T$. This probability is calculated as:

$$p_{ti} = \frac{n_{ti}}{n} \quad \text{(I)}$$

where $n_{ti}$ denotes the number of words in $d$ corresponding to the $i$-th topic, and $n$ is the total number of all words in $d$.
Volume 22 Issue 5 (2025) 156 doi: 10.36922/AJWEP025160117

