
Global Health Economics and Sustainability
Carbon footprint of smartphones in healthcare

Our analysis reveals patterns in how LLMs estimated carbon emissions compared to the actual values. Across all models tested, overestimation was more common than underestimation when errors occurred. Every LLM produced at least some carbon emission estimates that differed from the actual values reported in the companies’ CSR reports or 10-K statements, with error rates ranging from 0% to 61.2%. ChatGPT-4.0 had the lowest average error rate at 1.6%, showing a slight overestimation for Apple devices when inaccuracies appeared, particularly for the iPhone 15 (8.9% above actual) and iPhone 15 Plus (8.2% above actual). Gemini demonstrated more substantial overestimation, especially with the iPhone 15 model, where the reported emissions were 19.6% higher than the actual value. Gemini had an average error rate of 6.0% for Apple products and performed better for Samsung devices, with a 1.6% error rate. Claude.ai exhibited the greatest variance in accuracy, with large overestimations for Samsung devices: 61.2% and 51.8% above actual values for the Galaxy Z Flip6 and Galaxy Z Fold6, respectively.

Three of the LLMs (ChatGPT, Gemini, and Claude.ai) gave accurate information about the Apple HomePod’s emissions. Meta AI did not provide this information (Meta Platforms Inc., 2024). However, none of the four AI tools could find emissions data for Google Home or Echo devices. This is because Google and Amazon do not publish emission information for the latest versions of these products on their corporate websites or 10-K statements (Alphabet Inc., 2024; Amazon.com, Inc., 2024; Apple Inc., 2024).

Table 4. Comparison of average percent error across smartphone devices
Smartphones       ChatGPT-4.0   Gemini   Claude
Apple models      1.6           6        4.6
Samsung models    1.6           1.6      27.8
Note: ChatGPT is better at calculating carbon emissions compared to Gemini or Claude. This may be due to its training data sets.

4. Discussion

LLMs, when trained differently, will provide answers that are slightly different from one another. For example, ChatGPT is trained through a generative pre-trained transformer model that takes data and incorporates it into its software architecture. Gemini relies on Google’s DeepMind proprietary algorithms and its neural node type of architecture. While both ChatGPT and Gemini use a transformer-based architecture, Gemini uses retrieval-augmented generation trained on diverse datasets with scalable infrastructure. Google’s dataset includes voice and imaging data, which differs from that of ChatGPT (Rane et al., 2024).

The ChatGPT architecture is renowned for its conversational abilities. It also uses reinforcement learning from human feedback and instruction tuning. Answers provided by Gemini may be longer than those of ChatGPT, Claude.ai, or Meta AI due to its use of a larger data set than the other LLMs (Rane et al., 2024). Gemini is the first of the LLMs to add references and any related websites from which its responses are derived (Lang et al., 2024).

ChatGPT outscored Gemini in answering accuracy on the neurosurgery board exams, particularly in questions involving imaging (Sau et al., 2025). Similarly, in queries about retinal detachment, ChatGPT outscored Gemini, possibly due to the lower reading comprehension level of Gemini (Strzalkowski et al., 2024). ChatGPT consistently outscores Gemini and CoPilot in terms of response accuracy (Marey et al., 2025).

The discrepancies observed in LLM data on carbon emissions highlight several limitations inherent to AI models. One significant factor is the dependency of these models on publicly available data. If specific information is not easily accessible, the models may either fail to respond or rely on indirect inferences from their training data sets, leading to inaccuracies. This limitation was particularly evident with Samsung devices, where many emissions data points were not explicitly disclosed in corporate reports, resulting in incomplete or inconsistent responses from LLMs.
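
The percent errors quoted in the results and summarized in Table 4 can be illustrated with a minimal Python sketch. It assumes the conventional definition, (estimate − actual) / actual × 100, and assumes that the averages are means of absolute errors; the source does not state either choice explicitly. The function names and the emission figures below are hypothetical and chosen only for illustration; they are not values from the study.

```python
from statistics import mean


def percent_error(estimate: float, actual: float) -> float:
    """Signed percent error; a positive value means the LLM overestimated."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return (estimate - actual) / actual * 100.0


def average_percent_error(pairs) -> float:
    """Mean of absolute percent errors over (estimate, actual) pairs.

    Assumption: errors are averaged by magnitude, so over- and
    underestimates do not cancel each other out.
    """
    return mean(abs(percent_error(e, a)) for e, a in pairs)


# Hypothetical (estimate, actual) pairs in kg CO2e, NOT taken from the paper.
readings = [(108.9, 100.0), (100.0, 100.0), (95.0, 100.0)]

errors = [round(percent_error(e, a), 1) for e, a in readings]
avg = round(average_percent_error(readings), 1)

print(errors)  # signed error per device
print(avg)     # average absolute error across devices
```

Keeping the sign in `percent_error` makes it possible to separate overestimation from underestimation, which is the distinction the results draw, while the average uses magnitudes only.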

            Table 5. Energy usage and power consumption for smartphones
            Hardware component  Component type/mode      Settings description                 Average power     Power budget
            category                                                                          consumption (mW)  (% of battery)
            Display             Dark environment         Black background (40%) (avg) power   80                1.8
                                                         Black background (100%) power        260               5.7
                                White (lit) environment  White background (40%) (avg) power   226               5
                                                         White background (100%) power        527               11.7
                                Low power mode           Screen showing time/date/weather     14                0.3
                                Touchscreen use          Continuous usage (10 min)            585               13
            Abbreviation: Avg: Average.
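
As a worked illustration of the Table 5 figures, the following Python sketch converts an average power draw into energy over a usage period (energy = power × time) and checks how the two numeric columns relate. The function names are hypothetical. The source does not state how the power budget column was derived, so the ratio computed here is only an arithmetic observation on the tabulated values, not the authors' method.

```python
def energy_joules(power_mw: float, seconds: float) -> float:
    """Energy in joules for an average power draw (mW) sustained for `seconds`."""
    return power_mw * seconds / 1000.0


def joules_to_watt_hours(joules: float) -> float:
    """Convert joules to watt-hours (1 Wh = 3600 J)."""
    return joules / 3600.0


# (average power in mW, power budget in % of battery), copied from Table 5.
table5_rows = [
    (80, 1.8),
    (260, 5.7),
    (226, 5),
    (527, 11.7),
    (14, 0.3),
    (585, 13),
]

# Touchscreen row: 585 mW of continuous usage for 10 minutes.
touch_joules = energy_joules(585, 10 * 60)
touch_wh = joules_to_watt_hours(touch_joules)

# Milliwatts per percentage point of budget, row by row.
mw_per_percent = [power / budget for power, budget in table5_rows]

print(touch_joules, touch_wh)
print([round(r, 1) for r in mw_per_percent])
```

For every row the ratio falls between roughly 44 and 47 mW per percentage point, so the two columns appear to be proportional to each other.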
            Volume 3 Issue 3 (2025)   279   https://doi.org/10.36922/ghes.8359