Pilot design before the headline
We start with 3,000 labeled chat episodes.
The pilot draws 3,000 chat episodes from WildChat-4.8M, a public dataset of real user-chatbot interactions. The unit of analysis is the individual row of that draw: each row is counted once, classified by the task it performs, and assigned exactly one lowest-compute route that could satisfy the request.
Data And Labels
Each chat is converted into a routing decision
For each row, Gemini labels the user's intent, observed route, lowest sufficient route, and energy inputs such as visible tokens, model responses, reasoning, search, and tool calls. The route labels below are mutually exclusive and collectively exhaustive (MECE) across the 3,000 rows.
- WildChat row fields: user prompt, assistant answer, model name, timestamp, language, and turn count.
- Task categories: factual lookup, writing, coding, calculation, local software, reasoning, tool workflow, or not comparable.
- Route labels: local tool, direct search, search plus reading, small model, standard LLM, reasoning/agent, or expert.
- Counterfactual: replay the same task under GPT-5.5 central/heavy, then compare it directly with the assigned route.
Core Message
About half of GPT-5.5 chat energy is a routing choice.
In this 3,000-row pilot, the MECE route assignment cuts GPT-5.5 central cloud energy from 11,410.3 Wh to 5,668.3 Wh, a 50.3% reduction before any change to model architecture. The largest opportunity is not Google Search alone; it is routing small, simple, and software-native tasks away from high-compute LLM paths.
Energy Multipliers
Use one additive route model before comparing alternatives
Every pilot row is decomposed as base visible inference, multiple model responses, reasoning add-on, search add-on, and tool add-on. Then each task is compared with its assigned MECE route: local software, direct search, search plus reading, small model, standard LLM, reasoning/agent, or expert.
- 11,410.3 Wh GPT-5.5 replay minus 5,668.3 Wh under direct MECE routing.
- Uses the EPA 0.394 kgCO2/kWh average electricity factor.
- Heavy GPT-5.5 active-compute sensitivity check.
- Same route labels applied to the heavier GPT-5.5 path.
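The additive decomposition above can be sketched in a few lines. The coefficients follow the Method section's GPT-5.5 central values (0.85 Wh base per response, 6.5 Wh total when reasoning is invoked, 0.30 Wh per search query); the per-tool-call cost and the treatment of reasoning as (total minus base) are illustrative assumptions, not pilot numbers.

```python
# Sketch of the additive per-row energy model (GPT-5.5 central).
BASE_WH = 0.85             # base visible inference per response-equivalent
REASONING_TOTAL_WH = 6.5   # total for one response when reasoning is invoked
SEARCH_WH = 0.30           # per search query
TOOL_WH = 0.10             # assumed per tool call (placeholder, not a pilot value)

def row_energy_wh(n_responses, reasoning=False, n_searches=0, n_tool_calls=0):
    """Additive energy for one pilot row, in Wh."""
    base = BASE_WH * n_responses
    # Assumption: the reasoning add-on tops a single response up to the
    # 6.5 Wh reasoning total quoted in the Method section.
    reasoning_addon = (REASONING_TOTAL_WH - BASE_WH) if reasoning else 0.0
    return base + reasoning_addon + SEARCH_WH * n_searches + TOOL_WH * n_tool_calls

# A two-response chat with one search on the standard LLM path:
print(round(row_energy_wh(2, n_searches=1), 2))  # 2.0
```

The same function replays a row under a heavier path by swapping in heavier coefficients, which is all the sensitivity check above requires.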
Interpretation
Search is one route, not the whole story.
Averages differ because the conversations differ in length, turn count, and model route. Under GPT-5.5 central, direct-search cases average 1.74 Wh on the LLM path versus 0.405 Wh on the search path. Small-model cases show the largest gap: 3.11 Wh on GPT-5.5 versus 0.076 Wh on the assigned small-model route.
Scale-Up
Small savings per chat become TWh-scale at global AI volume
The pilot saves 1.91 to 3.28 Wh per row under GPT-5.5 central and heavy assumptions. Scaling that routing intensity shows what the opportunity looks like at platform and global volume.
- About 8.3B: approximate 2026 world population.
- 800M+ weekly users: OpenAI public usage anchor.
- 1.91 to 3.28 Wh saved per chat: GPT-5.5 central to GPT-5.5 heavy range.
- 0.394 kgCO2/kWh: EPA U.S. average electricity factor.
Research Rule
Classify the task first; compare energy second; keep human time as a separate axis.
Human time is not converted into carbon. This page computes cloud energy by direct MECE route assignment; the next research layer should plot the time-carbon frontier rather than collapse time into emissions.
Method
The route model is additive
GPT-5.5 central uses a 0.85 Wh base response-equivalent and a 6.5 Wh reasoning total. Search adds 0.30 Wh per query. Carbon is cloud electricity multiplied by the EPA U.S. average grid factor.
- 11,410.3 Wh across 3,000 pilot rows under the GPT-5.5 replay.
- 5,668.3 Wh after assigning each row to its MECE route.
- Both are pilot-scale values; the platform-scale values appear in the scale-up section.
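The headline arithmetic can be checked directly from the two pilot totals and the EPA grid factor, all taken from this page:

```python
# Pilot totals from this page, EPA U.S. average grid factor.
REPLAY_WH = 11410.3      # GPT-5.5 central replay, 3,000 rows
ROUTED_WH = 5668.3       # after direct MECE route assignment
EPA_KGCO2_PER_KWH = 0.394

saved_wh = REPLAY_WH - ROUTED_WH                   # 5742.0 Wh
reduction = saved_wh / REPLAY_WH                   # ~50.3%
saved_kgco2 = saved_wh / 1000 * EPA_KGCO2_PER_KWH  # Wh -> kWh -> kg CO2

print(f"saved {saved_wh:.1f} Wh ({reduction:.1%}), {saved_kgco2:.2f} kg CO2")
```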
Appendix
Data coverage and GPT-5.5 coefficient derivation
The main story uses a GPT-5.5 replay. The details below show where the pilot conversations came from and how the GPT-5.5 energy assumptions are anchored.
- Public standard-query anchor of about 0.34 Wh from Epoch/OpenAI-era estimates.
- 0.34 Wh × 2.5 active-compute multiplier = 0.85 Wh base response-equivalent.
- 6.5 Wh central estimate when GPT-5.5 reasoning is invoked.
- Heavier standard base and reasoning total for Pro-style paths.
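The base-coefficient derivation above is a single multiplication; a sketch, with the anchor and the assumed multiplier both taken from this appendix:

```python
STANDARD_ANCHOR_WH = 0.34    # public standard-query anchor (Epoch/OpenAI-era)
ACTIVE_COMPUTE_MULT = 2.5    # assumed GPT-5.5 active-compute multiplier

base_wh = STANDARD_ANCHOR_WH * ACTIVE_COMPUTE_MULT
print(round(base_wh, 2))     # 0.85, the Method section's base response-equivalent
```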
Parameter Proxy
Define GPT-5.5 by active compute.
GPT-5.5 is treated as a larger product surface than GPT-4o: 1,050,000-token context, 128,000 max output, reasoning-token support, and $5/$30 per million input/output tokens. The replay model uses active compute, token length, reasoning steps, retrieval, and tool loops.
Sources
Sources and anchors for the calculation
- de Vries, Joule 2023: 0.3 Wh search and up to 2.9 Wh LLM interaction
- Epoch AI: about 0.3 Wh for a typical GPT-4o-style query
- Oviedo et al., Joule 2026: 0.31 Wh frontier inference and order-of-magnitude higher long reasoning
- Oviedo et al. preprint: 0.34 Wh standard and 4.32 Wh test-time scaling scenario
- EPA eGRID: 0.394 kgCO2/kWh U.S. average electricity factor
- OpenAI API model page: GPT-5.5 pricing, context, and reasoning support
- OpenAI GPT-5.5 release: more capable and fewer tokens on Codex tasks
- Worldometer / UN WPP 2024: 2026 world population around 8.3B
- OpenAI: ChatGPT serves more than 800M weekly users
- WildChat-4.8M: public dataset of real user-chatbot interactions, source of the 3,000-row pilot draw