Emission and carbon policy analysis
For each topic $t$ in the set $T$ of topics, the probability of generating different words is represented by the vector $\varphi_t = \langle p_{w_1}, \ldots, p_{w_m} \rangle$, where $p_{w_i}$ denotes the likelihood of topic $t$ generating the $i$-th word in the VOC. This probability is calculated as:

$$p_{w_i} = \frac{N_{w_i}}{N} \qquad \text{(II)}$$

Here, $N_{w_i}$ denotes the number of times the $i$-th word in VOC is assigned to topic $t$, and $N$ represents the total number of words associated with topic $t$.

The core formula of LDA is given by:

$$p(w \mid d) = p(w \mid t) \times p(t \mid d) \qquad \text{(III)}$$

Using the topic as an intermediate layer, the current values of $\theta_d$ and $\varphi_t$ enable the calculation of the probability of occurrence of a word $w$ in document $d$, where $p(t \mid d)$ is computed using $\theta_d$, and $p(w \mid t)$ is derived from $\varphi_t$. In addition, with the current values of $\theta_d$ and $\varphi_t$, it is possible to calculate $p(w \mid d)$ by considering all possible topics that a word in a document might belong to. Based on this result, the topic assignment for the word is updated. If the topic assignment changes, it will, in turn, affect both $\theta_d$ and $\varphi_t$. LDA initializes $\theta_d$ and $\varphi_t$ with random values and iteratively updates them until the model converges.²⁸

3.2. SBC

a. Principle of Sankey-bubble diagram
The Sankey diagram, also known as the Sankey energy flow diagram or Sankey energy balance diagram, was first introduced in 1898 by Matthew Henry Phineas Riall Sankey. He created a now-famous diagram called the “energy efficiency of the steam engine,” and since then, this type of visualization has been known as the “Sankey diagram.” Sankey-bubble diagrams combine a Sankey diagram with the features of a bubble chart. Bubble charts can be used to show correlations between three measures or to present three-dimensional data simultaneously.
The SBC integrates a Sankey diagram with a bubble chart to more effectively visualize complex, multi-dimensional data. These two charts are linked through the names of pathways. The Sankey diagram displays the genes associated with each pathway, while the bubble chart represents multiple attributes: the position of each bubble corresponds to the GeneRatio, the size of the bubble indicates the number of genes enriched in the pathway, and the color represents the pathway’s p-value.²⁹

b. SBC process
The SBC consists of a Sankey chart and a bubble chart, which together can more effectively and comprehensively display the multi-dimensional information within the data. The Sankey diagram shows the relationships between different pathways and genes, with each pathway representing the gene distribution within that pathway. In the bubble chart, the position of each bubble indicates the GeneRatio, the size represents the number of genes, and the color reflects the p-value. The Sankey diagram and bubble chart are connected by multiple signaling pathways. In the Sankey diagram, different or identical genes contained within each pathway are shown, while in the bubble chart, the position of each bubble indicates the negative logarithm of the p-value or Q-value of a pathway, the bubble size corresponds to the number of genes enriched in the pathway, and the color reflects the pathway’s Hit Ratio, among other attributes.³⁰

4. Empirical analyses

4.1. Data acquisition and processing
The study employed custom-built web crawlers in a Python 3.7 (Python Software Foundation, USA) environment to collect emission and carbon reduction policy documents from the websites of the National Development and Reform Commission, the Ministry of Agriculture and Rural Affairs, the National Bureau of Statistics, the National Energy Administration, the Ministry of Science and Technology, and the Ministry of Ecology and Environment, covering the period from 2016 to 2023. These documents served as samples for analyzing relevant policies and regulations.
Considering the diverse and interconnected functions of Chinese government departments, this study collected carbon emission reduction data from 2016 to 2023 via official government websites, enabling a comprehensive analysis of emission reduction efforts at the departmental level.
Figure 1 illustrates the changing pattern in the number of policy documents issued between 2016 and 2023. Although there are fluctuations in issuance across various departments, the overall trajectory suggests an upward trend, with 70 documents released in 2023, reflecting China’s increasing focus on emission and carbon reduction as a key aspect of national development. In terms of the total number of documents, the peak occurred in 2021, with 115 documents published. During that year, the administrative arm of the State Council, the
Volume 22 Issue 5 (2025) 157 doi: 10.36922/AJWEP025160117
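As a hedged illustration of Equations (II) and (III) from the LDA description on this page, the following minimal Python sketch estimates the word probabilities of a topic from word-to-topic assignment counts and then combines them with a document's topic shares. The toy corpus, the topic ids, and the function names `phi`, `theta`, and `p_word_given_doc` are assumptions made for illustration and do not come from the study. The sketch also assumes that θ_d is estimated as the share of a document's words assigned to each topic, sums the product in Equation (III) over all candidate topics, and omits the smoothing priors used in a full LDA implementation.

```python
from collections import Counter

# Toy corpus: each document is a list of (word, assigned_topic) pairs.
# Words, topic ids, and assignments are invented for illustration only.
docs = [
    [("carbon", 0), ("emission", 0), ("energy", 1), ("carbon", 0)],
    [("energy", 1), ("coal", 1), ("carbon", 1), ("emission", 0)],
]
topics = [0, 1]


def phi(topic):
    """Equation (II): p_wi = N_wi / N for the given topic."""
    counts = Counter(w for doc in docs for w, t in doc if t == topic)
    total = sum(counts.values())  # N: all words assigned to this topic
    return {w: n / total for w, n in counts.items()}


def theta(doc):
    """Assumed estimate of p(t|d): share of the document's words per topic."""
    counts = Counter(t for _, t in doc)
    return {t: counts[t] / len(doc) for t in topics}


def p_word_given_doc(word, doc):
    """Equation (III), summed over every candidate topic t."""
    th = theta(doc)
    return sum(phi(t).get(word, 0.0) * th[t] for t in topics)


if __name__ == "__main__":
    print(phi(0))
    print(theta(docs[0]))
    print(p_word_given_doc("carbon", docs[0]))
```

In an iterative sampler, these quantities would be recomputed each time a word's topic assignment changes, which is how an update to one assignment feeds back into both θ_d and φ_t.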

