
Li and Wu

design, and development pathways, relatively few studies examine the policy texts themselves. The present study addresses this gap by analyzing policy documents and identifying temporal trends through text clustering and thematic analysis.

(ii) Although text mining techniques are widely applied in fields such as business analytics, fraud detection, and spam filtering, their application to the analysis of emission reduction and carbon support policies remains limited. This study employs LDA for keyword-based text clustering and integrates SBC for visual interpretation, thereby developing a combined LDA–SBC analytical framework.

In summary, this study adopts a combined LDA–SBC analytical approach for data analysis. First, the LDA topic clustering model is applied to extract thematic word distributions from policy texts. Next, SBC is used to visualize and interpret these distributions, revealing the scope and focus of the policies. Finally, this method supports a deeper understanding of enterprise development trends, industrial planning, and the strategic direction of national policies.

3. Research methodology

China formulates its national development plans on a 5-year cycle and is currently implementing the 14th 5-Year Plan. For comparative analysis, this study used the 13th 5-Year Plan as a reference point. Accordingly, the period from 2016 to 2023 was examined to identify both the similarities and the differences between these two planning phases. The timeframe was divided into three phases:

- 2016–2018 (first phase): the early stage of the 13th 5-Year Plan.
- 2019–2021 (second phase): the late stage of the 13th 5-Year Plan and the lead-up to the 14th.
- 2022–2023 (third phase): the early stage of the 14th 5-Year Plan.

The K-means algorithm offers strong scalability and efficiency, running in polynomial time. However, the LDA technique enables effective text classification by identifying and grouping related themes.²⁴ LDA was therefore selected as the clustering method for the text data. SBCs, in turn, can display flow paths and multiple data dimensions simultaneously within a single chart, which makes the visualization of complex data more intuitive and information-rich.²⁵ SBC diagrams can therefore help decision-makers and analysts better understand relationships and trends within the data, supporting more informed decision-making.

This study employed LDA topic clustering to extract more detailed information from the policy texts, enriching the content and broadening the analytical perspective. The results were then visualized using SBC to illustrate more comprehensively how China’s policies align with the actual demands of emission and carbon reduction.

3.1. LDA
a. Principles of LDA
The LDA algorithm was introduced by Blei et al.²⁶ in 2003. It takes two inputs: a document set $D = \{d_1, d_2, d_3, \ldots, d_n\}$ and a specified number of clusters $m$. The algorithm then calculates the probability $p$ that each document $d_i$ belongs to each topic.²⁷ Every document is thus represented by a probability distribution over multiple topics, expressed as $d_i = (d_{p1}, d_{p2}, \ldots, d_{pm})$. For every word within a document, the algorithm also calculates the probability associated with each topic, denoted as $W_i = (W_{p1}, W_{p2}, \ldots, W_{pm})$. The model ultimately produces two key matrices. The first gives the distribution of topics across documents, and the second gives the distribution of words across topics.

In this way, the LDA algorithm maps documents and words onto a set of topics. It then uses these topics to discover hidden relationships between documents and comments, between documents and other documents, and between terms and phrases. LDA is an unsupervised learning method, so it does not require each target to satisfy predefined constraints. Instead, after clustering, it counts the word frequency distribution in each topic and identifies the highest-frequency words, from which the meaning of each topic is inferred.

b. LDA process
Each document $d$ in the document set $D$ is regarded as a sequence of words. Let $d$ consist of $n$ observed words forming the sequence $\langle w_1, w_2, \ldots, w_n \rangle$, where $w_i$ denotes the $i$-th word.

All the unique words present in $D$ form a large set called the “vocabulary” (VOC). For each document $d$ in $D$, the probability of belonging to the different topics is represented as $\theta_d = \langle p_{t_1}, \ldots, p_{t_k} \rangle$. Here, $p_{t_i}$ denotes the probability that document $d$ corresponds to the $i$-th topic in the topic set $T$. This probability is calculated as:

$$p_{t_i} = \frac{n_{t_i}}{n} \qquad \text{(I)}$$

where $n_{t_i}$ denotes the number of words in $d$ corresponding to the $i$-th topic, and $n$ is the total number of words in $d$.
                Volume 22 Issue 5 (2025)                       156                           doi: 10.36922/AJWEP025160117