Global Health Economics and Sustainability
Carbon footprint of smartphones in healthcare

Our analysis reveals patterns in how LLMs estimated carbon emissions compared to the actual values. Across all models tested, overestimation was more common than underestimation when errors occurred. We found that all LLMs produced carbon emission estimates that differed from the actual values reported in the companies’ CSR reports or 10-K statements, with error rates ranging from 0% to 61.2%. ChatGPT-4.0 had the lowest average error rate at 1.6%, showing a slight overestimation for Apple devices when inaccuracies appeared, particularly for the iPhone 15 (8.9% above actual) and iPhone 15 Plus (8.2% above actual). Gemini demonstrated more substantial overestimation, especially with the iPhone 15 model, where the reported emissions were 19.6% higher than the actual value. Gemini had an average error rate of 6.0% for Apple products and performed better for Samsung devices, with a 1.6% error rate. Claude.ai exhibited the greatest variance in accuracy, with extensive overestimations for Samsung devices: 61.2% and 51.8% above actual values for the Galaxy Z Flip6 and Galaxy Z Fold6, respectively.

Three of the LLMs (ChatGPT, Gemini, and Claude.ai) gave accurate information about the Apple HomePod’s emissions. Meta AI did not provide this information (Meta Platforms Inc., 2024). However, none of the four AI tools could find emissions data for Google Home or Echo devices. This is because Google and Amazon do not publish emission information for the latest versions of these products on their corporate websites or 10-K statements (Alphabet Inc., 2024; Amazon.com, Inc., 2024; Apple Inc., 2024).

Table 4. Comparison of average percent error across smartphone devices

Smartphones      ChatGPT-4.0   Gemini   Claude
Apple models     1.6           6.0      4.6
Samsung models   1.6           1.6      27.8

Note: ChatGPT is better at calculating carbon emissions compared to Gemini or Claude. This may be due to its data training sets.

4. Discussion

LLMs, when trained differently, will provide answers that are slightly different from one another. For example, ChatGPT is trained through a generative pre-trained transformer model that takes data and incorporates it into its software architecture. Gemini relies on Google’s DeepMind proprietary algorithms and its neural node type of architecture. While both ChatGPT and Gemini use a transformer-based architecture, Gemini uses retrieval-augmented generation trained on diverse datasets with scalable infrastructure. Google’s dataset includes voice and imaging data, which differs from ChatGPT (Rane et al., 2024).

The ChatGPT architecture is renowned for conversational abilities. It also has reinforcement learning with human feedback and instruction tuning. Answers provided by Gemini may be longer than those of ChatGPT, Claude.ai, or Meta AI due to its use of a larger dataset than the other LLMs (Rane et al., 2024). Gemini is the first of the LLMs to add references and any related websites from which its responses are derived (Lang et al., 2024).

ChatGPT outscored Gemini in answering accuracy on the neurosurgery board exams, particularly in questions involving imaging (Sau et al., 2025). For example, in queries about retinal detachment, ChatGPT outscored Gemini, possibly due to the lower reading comprehension level of Gemini (Strzalkowski et al., 2024). ChatGPT consistently outscores Gemini and CoPilot in terms of response accuracy (Marey et al., 2025).

The discrepancies observed in LLM data on carbon emissions highlight several limitations inherent to AI models. One significant factor is the dependency of these models on publicly available data. If specific information is not easily accessible, the models may either fail to respond or rely on indirect inferences from their training datasets, leading to inaccuracies. This limitation was particularly evident with Samsung devices, where many emissions data points were not explicitly disclosed in corporate reports, resulting in incomplete or inconsistent responses from LLMs.
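The error rates quoted in this section and in Table 4 can be read as percent errors of an LLM estimate relative to the company-reported value. The following is a minimal Python sketch of that arithmetic, not the authors' procedure: the function names are hypothetical, the sample values are invented purely for illustration, and averaging absolute errors is an assumption, since the text does not say how the per-brand averages were formed.

```python
def percent_error(estimate: float, actual: float) -> float:
    """Signed percent error; a positive result means the estimate is above actual."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return (estimate - actual) / actual * 100.0


def mean_absolute_percent_error(pairs):
    """Average of absolute percent errors over (estimate, actual) pairs.

    Assumption: the averages in Table 4 may have been computed differently.
    """
    errors = [abs(percent_error(est, act)) for est, act in pairs]
    return sum(errors) / len(errors)


# Hypothetical (estimate, actual) pairs in kg CO2e; NOT data from the study.
sample_pairs = [(55.0, 50.0), (60.0, 60.0), (66.0, 60.0)]

for est, act in sample_pairs:
    print(f"estimate={est} actual={act} error={percent_error(est, act):+.1f}%")
print(f"mean absolute error: {mean_absolute_percent_error(sample_pairs):.1f}%")
```

Keeping the sign of each error separate from the average makes it possible to report both the direction of the bias (overestimation versus underestimation) and its typical size.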

Table 5. Energy usage and power consumption for smartphones

Hardware component category   Component type/mode       Settings description                 Average power consumption (mW)   Power budget (% of battery)
Display                       Dark environment          Black background (40%) (avg) power   80                               1.8
                                                        Black background (100%) power        260                              5.7
                              White (lit) environment   White background (40%) (avg) power   226                              5
                                                        White background (100%) power        527                              11.7
                              Low power mode            Screen showing time/date/weather     14                               0.3
                              Touchscreen use           Continuous usage (10 min)            585                              13

Abbreviation: Avg: Average.
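One way to read the two numeric columns of Table 5 together is to ask what total power budget each row implies. The Python sketch below assumes that the power budget column is the average power divided by a fixed total budget; this page does not state that definition, so treat it as an assumption. The function name and row labels are illustrative.

```python
# Rows from Table 5: (setting, average power in mW, power budget in % of battery)
table5_rows = [
    ("Dark, black background (40%)", 80, 1.8),
    ("Dark, black background (100%)", 260, 5.7),
    ("Lit, white background (40%)", 226, 5.0),
    ("Lit, white background (100%)", 527, 11.7),
    ("Low power mode", 14, 0.3),
    ("Touchscreen use (10 min)", 585, 13.0),
]


def implied_total_budget_mw(power_mw: float, share_percent: float) -> float:
    """Total budget in mW implied by one row, assuming share = power / total."""
    return power_mw / (share_percent / 100.0)


implied = {
    name: implied_total_budget_mw(power, share)
    for name, power, share in table5_rows
}

for name, total in implied.items():
    print(f"{name}: about {total:.0f} mW implied total budget")
```

Under this assumption every row implies a total of roughly 4,400 to 4,700 mW, which suggests the percentages are rounded shares of a single common budget.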
Volume 3 Issue 3 (2025) 279 https://doi.org/10.36922/ghes.8359
