
AI’s Energy Dilemma: Weighing the Environmental Cost of LLMs Against SLMs



As artificial intelligence becomes deeply embedded in modern business operations, the demand for ever-greater computing power has accelerated technological innovation, infrastructure expansion, and skill evolution at an unprecedented pace.


Large language models from companies such as OpenAI, Anthropic, and Google have captured global attention with their remarkable ability to process and generate natural language. They are powering everything from enterprise-grade chatbots to advanced analytics platforms, often demonstrating a fluency that feels almost encyclopedic.


Yet these models come with a significant appetite for resources, particularly energy and water. A single ChatGPT query can use up to ten times more electricity than a traditional Google search. The data centres responsible for training such models can consume millions of gallons of water for cooling purposes.


The scale of this resource use is startling. Training GPT-3 required an estimated 1,287 megawatt-hours of electricity, which is roughly equivalent to powering 120 average U.S. homes for an entire year. Microsoft’s water consumption, for example, increased by 34 per cent in 2022, a jump primarily linked to AI operations.
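
A quick back-of-the-envelope check of that comparison, assuming an average U.S. household uses roughly 10,700 kWh of electricity per year (our assumption, broadly in line with U.S. Energy Information Administration figures):

```python
# Back-of-the-envelope check: GPT-3 training energy vs. household use.
# The 10,700 kWh/year average is an assumption, not a figure from this article.
training_mwh = 1_287        # estimated GPT-3 training energy, in MWh
home_kwh_per_year = 10_700  # assumed average U.S. household consumption

homes_for_a_year = training_mwh * 1_000 / home_kwh_per_year
print(f"Roughly {homes_for_a_year:.0f} homes powered for a year")  # ~120
```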


In response, a new generation of small language models (SLMs) is emerging, capable of delivering many of the same benefits as their larger counterparts with a far leaner environmental footprint.


What Are SLMs?


While large models may contain hundreds of billions or even trillions of parameters, small language models typically operate with anywhere from a few million to about ten billion. This reduction translates into lower demands for memory, processing power, and storage. Technically, SLMs use the same transformer architecture as LLMs, but techniques such as knowledge distillation, pruning, and quantisation allow them to maintain high task-specific performance at a fraction of the resource cost.
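
To make one of those techniques concrete, the sketch below applies PyTorch’s post-training dynamic quantisation to a toy model, converting its linear-layer weights from 32-bit floats to 8-bit integers. The toy model is purely illustrative and stands in for a real SLM.

```python
# A minimal sketch of post-training dynamic quantisation with PyTorch.
# The toy model below is illustrative, standing in for a real SLM.
import os
import torch

# A small transformer-style feed-forward block.
model = torch.nn.Sequential(
    torch.nn.Linear(768, 3072),
    torch.nn.GELU(),
    torch.nn.Linear(3072, 768),
)

# Convert Linear weights from float32 to int8; activations are
# quantised dynamically at inference time.
quantised = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def size_mb(m: torch.nn.Module) -> float:
    """Serialise the model and report its on-disk size in megabytes."""
    torch.save(m.state_dict(), "tmp.pt")
    mb = os.path.getsize("tmp.pt") / 1e6
    os.remove("tmp.pt")
    return mb

print(f"fp32: {size_mb(model):.1f} MB, int8: {size_mb(quantised):.1f} MB")
```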


By training on domain-specific datasets, SLMs can excel at targeted tasks such as summarising company emails or resolving call centre queries, rather than attempting the broad, general-purpose mastery of LLMs.


What Are LLMs?


Large language models like GPT-4 and Gemini are vast neural networks trained on massive datasets covering a significant portion of the world’s digitised text. They can have trillions of learnable parameters and demonstrate proficiency in language, reasoning, summarisation, coding, and more. Their strength lies in adaptability. One model can be applied to a wide range of tasks, from legal contract analysis to creative writing.


However, this scale comes at a cost. LLMs require immense computational power, specialised hardware such as GPUs and TPUs, and constant internet connectivity. This increases operating expenses and contributes to a sizeable carbon footprint.


Why SLMs Are Considered More Sustainable


The smaller energy footprint of SLMs is one of their most appealing advantages. Because they are compact, both training and running them require far less power. This allows organisations to scale their AI usage without exceeding emissions targets. Many SLMs can run directly on edge devices or modest on-premises servers, reducing dependence on large, energy-intensive data centres.


They align naturally with the principles of Green AI, which prioritise efficiency, environmental responsibility, and accessibility. Their smaller size also makes them more cost-efficient, particularly for companies looking to democratise AI access or manage it at scale without prohibitive cloud costs.


Another significant benefit is their transparency. Smaller models are easier to audit, explain, and debug, which is especially valuable in heavily regulated industries such as healthcare and banking, where clear explanations of AI decisions are a legal necessity.


SLMs offer flexibility, too. They can be deployed in the cloud, on local servers, or directly on devices. This gives organisations the freedom to place workloads where they make the most sense in terms of speed, privacy, and compliance.


Microsoft’s Phi-4 Models


“The energy intensity of advanced cloud and AI services has driven us to accelerate our efforts to drive efficiencies and energy reductions,” says Melanie Nakagawa, Microsoft’s Chief Sustainability Officer.


“As AI scenarios increase in complexity, we’re empowering developers to build and optimise AI models that can achieve similar outcomes while requiring fewer resources.”


Microsoft’s Phi-4 is the latest in its small language model portfolio. Available through Azure AI Foundry, HuggingFace, and the Nvidia API Catalogue, Phi-4 comes in two primary versions. Phi-4-multimodal is designed to handle speech, vision, and text, setting new benchmarks in automatic speech recognition and translation with a record word error rate of 6.14% on the HuggingFace OpenASR leaderboard.


“Phi-4-multimodal marks a new milestone in Microsoft’s AI development as our first multimodal language model,” says Weizhu Chen, Technical Fellow, CVP, Gen AI at Microsoft.


“By leveraging advanced cross-modal learning techniques, this model enables more natural and context-aware interactions, allowing devices to understand and reason across multiple input modalities simultaneously.


“Whether interpreting spoken language, analysing images, or processing textual information, it delivers highly efficient, low-latency inference – all while optimising for on-device execution and reduced computational overhead.”


Phi-4-mini, a 3.8 billion-parameter model, is built for rapid and accurate text-based tasks, including reasoning, mathematics, and code generation. It can process sequences of up to 128,000 tokens, making it well-suited for analysing long documents.
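
For teams wanting to experiment, running Phi-4-mini locally takes only a few lines with the HuggingFace transformers library. A minimal sketch follows; the model identifier and generation settings are our assumptions and should be checked against Microsoft’s model card.

```python
# A minimal sketch of running Phi-4-mini locally via HuggingFace transformers.
# The model ID and settings are assumptions; verify against Microsoft's card.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-4-mini-instruct",  # assumed HuggingFace model ID
    device_map="auto",                      # use a local GPU if available
)

messages = [
    {"role": "user", "content": "Summarise the key obligations in this contract."},
]
output = generator(messages, max_new_tokens=256)
print(output[0]["generated_text"][-1]["content"])  # the assistant's reply
```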


Both models combine high accuracy with low latency and reduced operating costs. They are especially effective in resource-constrained environments and can be run locally, which supports sustainability goals while enhancing privacy and security.


IBM’s Granite 3.2 Series


IBM is also pushing forward in this space with its Granite 3.2 model family. These models are optimised for business applications, offering strong language capabilities without the overhead of massive architectures.


The Granite 3.2 series features “chain of thought” reasoning for step-by-step problem-solving, which can be activated or deactivated depending on the task's complexity.
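
In practice, the toggle is applied at inference time. Below is a hedged sketch using the HuggingFace tokenizer; the checkpoint name and the thinking flag are assumptions drawn from IBM’s published model documentation and should be verified against the model card.

```python
# A hedged sketch of toggling Granite 3.2's step-by-step reasoning.
# The model ID and the `thinking` chat-template flag are assumptions;
# check IBM's HuggingFace model card before relying on them.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.2-8b-instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content":
             "A train leaves at 09:40 and arrives at 13:05. How long is the journey?"}]

# thinking=True turns on chain-of-thought; omit it for simple tasks
# where the extra reasoning tokens would be wasted compute.
inputs = tokenizer.apply_chat_template(
    messages, thinking=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

print(tokenizer.decode(model.generate(inputs, max_new_tokens=512)[0]))
```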


“The next era of AI is about efficiency, integration and real-world impact – where enterprises can achieve powerful outcomes without excessive spend on compute,” says Sriram Raghavan, Vice President of IBM AI Research.


“IBM's latest Granite developments, focused on open solutions, demonstrate another step forward in making AI more accessible, cost-effective and valuable for modern enterprises.”


One standout is the Granite Vision 3.2 2B model, a compact vision-language system created for enterprise document processing. Trained on over 85 million PDFs using IBM’s Docling toolkit, it can rival much larger models such as Meta’s Llama 3.2 11B when it comes to extracting, categorising, and reasoning over intricate documents.
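
Docling itself is open source, and converting a PDF into model-ready text takes only a few lines. A minimal sketch, with a hypothetical file name:

```python
# A minimal sketch of IBM's open-source Docling toolkit converting a PDF
# into structured text; the file name is hypothetical.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("quarterly_report.pdf")  # hypothetical file

# Export the parsed document to Markdown for downstream models to consume.
print(result.document.export_to_markdown())
```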


IBM has also introduced the Granite Guardian 3.2 safety model, which is now 30 per cent smaller while retaining its strong performance, and the TinyTimeMixers forecaster for long-range predictions. These developments underline that sustainable, high-performance AI is both achievable and commercially viable.


Hybrid strategies, efficient architectures, and intelligent deployment will likely define the future of AI language processing. Companies may combine the broad capabilities of remote LLMs with the precision and efficiency of local or edge-based SLMs, balancing sustainability, privacy, and speed.
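
One way to picture such a hybrid setup is a simple router that keeps routine, well-scoped requests on a local SLM and escalates open-ended ones to a remote LLM. The sketch below is purely illustrative; both client functions are hypothetical stand-ins for real deployments.

```python
# An illustrative sketch of hybrid LLM/SLM routing; both client functions
# are hypothetical stand-ins for a local SLM and a remote LLM API.

ROUTINE_KEYWORDS = ("summarise", "translate", "classify", "extract")

def local_slm(prompt: str) -> str:
    """Hypothetical on-device small language model."""
    return f"[SLM] {prompt[:40]}..."

def remote_llm(prompt: str) -> str:
    """Hypothetical remote large-model API call."""
    return f"[LLM] {prompt[:40]}..."

def route(prompt: str) -> str:
    # Keep routine, well-scoped tasks on cheap local hardware;
    # escalate open-ended reasoning to the remote LLM.
    if any(k in prompt.lower() for k in ROUTINE_KEYWORDS):
        return local_slm(prompt)
    return remote_llm(prompt)

print(route("Summarise this email thread for me."))   # handled locally
print(route("Draft a novel market-entry strategy."))  # escalated
```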


As Nakagawa says, “Sustainability is good business. Sustainable business practices drive innovation.”


While LLMs will continue to push boundaries in general reasoning, SLMs are positioned to drive widespread, practical transformation across industries. Their lower resource requirements and greater adaptability mean AI can be deployed in ways that support global climate goals while ensuring equitable access to intelligent technologies.
