A peer-reviewed study published in the energy journal Joule indicates that large-scale artificial intelligence (AI) operations are significantly more resource-efficient than earlier media and academic estimates suggested.
The research, conducted jointly by the Microsoft AI for Good Lab, Microsoft Sustainability, and Azure, analysed the per-user energy and water impacts of processing large language model (LLM) queries within hyperscale datacenters. The findings suggest that the environmental footprint of a standard AI interaction is comparable to minor everyday household electricity use.
The study focused on the “inference” phase of AI operations, in which specialized datacenter hardware reads a user’s text prompt and generates a response token by token. According to the data, a typical conversational query submitted to some of the market’s largest and most capable LLMs consumes between 0.16 and 0.60 watt-hours (Wh) of electricity.
To provide consumer context, the researchers noted that this energy expenditure is roughly equivalent to running a standard 40-watt personal computer for 15 to 60 seconds, or operating a 1,000-watt home microwave for 0.6 to 2 seconds. These figures are four to twenty times lower than earlier external estimates, a gap the study attributes to those estimates failing to account for operational efficiencies achieved at scale.
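The household comparisons follow from a unit conversion: one watt-hour is 3,600 joules, so dividing a query's energy by an appliance's power draw gives the equivalent running time. The sketch below is illustrative only. The function name `seconds_at_power` is hypothetical, and the appliance wattages are the ones quoted in the article.

```python
# Illustrative conversion of per-query energy into appliance running time.
# The 0.16-0.60 Wh range and the appliance wattages come from the article;
# the helper name is an assumption made for this sketch.

def seconds_at_power(energy_wh: float, power_w: float) -> float:
    """Seconds a device drawing power_w watts needs to use energy_wh watt-hours."""
    return energy_wh * 3600.0 / power_w  # 1 Wh = 3600 watt-seconds (joules)

QUERY_WH_RANGE = (0.16, 0.60)  # reported per-query energy range, in Wh

for label, watts in (("40 W computer", 40.0), ("1,000 W microwave", 1000.0)):
    low, high = (seconds_at_power(wh, watts) for wh in QUERY_WH_RANGE)
    print(f"{label}: {low:.1f} to {high:.1f} s")
```

The results come out at about 14 to 54 seconds for the computer and about 0.6 to 2.2 seconds for the microwave, in line with the rounded figures the researchers quote.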
The analysis also quantified the volume of cooling water consumed by a single query. Under conservative baseline assumptions for large production models, a single query uses between 0 and 0.067 millilitres (mL) of water. The median water use was calculated to be less than a single drop, or approximately one-hundredth of a teaspoon. Microsoft stated that this volume is projected to decrease further as the company accelerates the rollout of its zero-water datacenter designs.
The paper highlights infrastructure scale as a primary driver of resource conservation. Much as a major commercial airline optimises passenger loads across thousands of daily flights to reduce per-seat fuel burn, a cloud hyperscaler processing billions of simultaneous requests can dynamically distribute computational workloads.
At an operating volume of one billion conversational queries per day, a baseline system requires roughly 0.7 gigawatt-hours (GWh) of electricity per day. However, the study demonstrates that when advanced system architecture optimizations are applied, aggregate consumption drops by more than half, to 0.3 GWh. This efficiency dividend remains intact even when 10 per cent of the workload is shifted to complex, token-heavy tasks such as advanced code generation or multi-step logical reasoning.
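As a rough check on these totals, the aggregate figures can be converted back into an average energy per query. The sketch below uses only the rounded numbers quoted in the article. The helper name `wh_per_query` is hypothetical, and the per-query averages it produces are implied by those rounded totals rather than taken from the study.

```python
# Back-of-envelope check on the aggregate figures quoted in the article.
# All inputs are the article's rounded numbers; the outputs are implied
# averages, not values reported by the study.

QUERIES_PER_DAY = 1_000_000_000
BASELINE_GWH = 0.7   # daily energy, baseline system
OPTIMIZED_GWH = 0.3  # daily energy, with system optimizations applied

def wh_per_query(total_gwh: float, queries: int) -> float:
    """Average watt-hours per query implied by a total in gigawatt-hours."""
    return total_gwh * 1e9 / queries  # 1 GWh = 1e9 Wh

baseline_wh = wh_per_query(BASELINE_GWH, QUERIES_PER_DAY)
optimized_wh = wh_per_query(OPTIMIZED_GWH, QUERIES_PER_DAY)
reduction_factor = BASELINE_GWH / OPTIMIZED_GWH

print(f"baseline:  {baseline_wh:.2f} Wh per query")
print(f"optimized: {optimized_wh:.2f} Wh per query")
print(f"reduction: {reduction_factor:.2f}x")
```

On these inputs the implied averages are about 0.7 Wh and 0.3 Wh per query, a reduction of roughly 2.3x, which matches the description of consumption dropping by more than half.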
Microsoft outlined three core engineering fields where targeted investments are expected to yield a combined near-term energy reduction of 8x to 20x per query:
- Model customisation and routing: Using smaller, highly specialized models, such as Microsoft’s Phi series, to match the performance of larger models at a fraction of the energy cost. Automated routing systems within Azure AI Foundry are designed to direct basic requests to lightweight models, reserving complex computing clusters for intensive tasks.
- Orchestration and serving: Implementing disaggregated serving techniques within datacenters to manage how queries are batched and processed. These internal software modifications can reduce energy demand by up to a factor of five for long-form queries.
- Next-generation hardware: Deploying advanced processing units that offer superior computation-per-watt ratios. The integration of custom-built inference silicon, such as the Microsoft Maia 200 chip, is projected to deliver a 1.5x to 2.5x reduction in energy requirements per query compared to legacy chip architectures.
The authors concluded that scaling global access to AI does not necessitate a proportional increase in strain on local electrical grids or regional water supplies.