Responsible AI

You Don't Always Need the Biggest AI Model

Liam D.·June 2026·5 min read
Illustration of choosing the right AI model, weighing a lighter model against a heavier, more powerful one.

The gap between the lightest and heaviest AI models is not 15 percent; it can be five to ten times the processing and energy for the same question. Reaching for the most powerful model every time is the most common and most expensive mistake people make. Choosing the right tier is also a responsible AI decision hiding in plain sight.

Key takeaway

Matching the AI model to the job, rather than defaulting to the most powerful one, is one of the simplest ways an individual or business can cut both the cost and the energy footprint of everyday AI use.

You don't turn the oven to full to warm a croissant. You don't fill the kettle to the brim for a single cup of tea. Most of us match the effort to the job without thinking about it, because we can feel the waste.

AI is the one place that instinct seems to switch off. People reach for the biggest, most powerful model for everything, from drafting a two-line email to running a genuine piece of analysis, and rarely stop to ask whether the job actually needed it.

Here is the part worth knowing. The gap between the lightest and heaviest AI models is not small. With a car, going from eco to sport mode might cost you 15 to 20 percent more power. With AI, the top tier can cost five to ten times what the lightest one does. That is a lot more processing, and a lot more energy, for the same question.

The big three all work the same way

Anthropic's Claude, OpenAI's ChatGPT and Google's Gemini each offer roughly three power tiers: light, middle and heavy. The names are not exactly intuitive. The lightest are Haiku, Instant and Flash respectively. The middle tier covers Sonnet, Thinking and the adjustable Medium and High settings. The heaviest are Opus, Pro and, confusingly, Pro again.

The labels move around constantly. Anthropic has since added an even heavier model, Fable, and an effort-level setting that runs from low to max. Honestly, even we find that last part murky: it is not at all clear whether low effort on a heavy model uses more or less than high effort on a lighter one. But the underlying structure holds. Light, middle, heavy.

A plain-language guide to the tiers and modes in Claude, ChatGPT and Gemini, mapping each provider's light, middle and heavy models

What you're actually choosing when you pick an AI model: a plain-language guide to the tiers across Claude, ChatGPT and Gemini.

Why this is a responsible AI question, not just a cost one

Most write-ups on this frame it as saving money. We want to frame it differently. Every time you fire up the heaviest model for a question the lightest could have answered, you are using more compute, and more energy, than the task ever needed.

The companies do not publish proper numbers, which is a problem in itself. Imagine buying a car with no data on its fuel economy. Google has estimated a single text query at around 0.24 watt-hours, or about nine seconds of watching television, but that figure is only so useful, because what you ask, and which tier you ask it on, changes the demand dramatically. As a rough proxy we look at what these providers charge enterprises per million tokens, and the spread is enormous.
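To make that proxy concrete, here is a minimal Python sketch of the sum. The prices in it are made-up placeholders, not any provider's real rates, and the function name `price_spread` is ours. Swap in the published per-million-token prices for whichever models you are comparing, and remember that price is only a stand-in for energy, not a measurement of it.

```python
def price_spread(prices_per_million):
    """Return how many times more the priciest tier costs than the cheapest.

    prices_per_million maps a tier name to its price per million tokens.
    """
    cheapest = min(prices_per_million.values())
    priciest = max(prices_per_million.values())
    if cheapest <= 0:
        raise ValueError("prices must be positive")
    return priciest / cheapest


# Placeholder prices per million tokens. Illustrative only, not real rates.
example_prices = {"light": 1.0, "middle": 3.0, "heavy": 8.0}

spread = price_spread(example_prices)
print(f"The heaviest tier costs {spread:.1f}x the lightest")
```

With real price lists plugged in, the same few lines show you the spread for your own provider.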

Picture it simply. Asking the heaviest tier a question with no adjustment is like switching on five light bulbs instead of one. Sometimes you genuinely need all five. Often you do not.

So which tier do you actually need?

The most useful way to think about it, borrowed from one of the models itself, is not simple versus complex. It is: how much would a wrong answer cost me? Lighter tiers are quicker and cheaper, but more likely to slip. So ask whether you need a precise, reliable answer, or a directional one you will build on yourself.

Here is how that plays out, with examples from personal life (planning a weekend in Lisbon) and work (reviewing a supplier contract).

Lightest tier, for fast factual recall. "What's the cheapest way to get from Lisbon airport into the centre?" Or, "What's the standard notice period wording in a UK services contract?" Pure retrieval, no judgement required. Using a heavier tier here is like ordering a taxi to take you to your own front door.

Middle tier, for tasks that need context and judgement. "I've got two days in Lisbon in September, staying in Alfama. Plan an itinerary with food, one good viewpoint and a day trip, avoiding the obvious tourist traps." Or, "Here's our supplier contract. Walk through the payment and termination clauses and flag anything unusual against a standard agreement." This tier handles the bulk of real professional work at meaningfully lower cost than the top one. Most providers set it as the default, and it is a sensible place to start.

Heaviest tier, for synthesis across a lot of material. "Here are three years of accounts, two supplier contracts and a board paper. Where are we exposed if costs rise 10 percent over the next 18 months, and what should we renegotiate first?" You are asking the model to hold large amounts of data at once, find the threads, and produce analysis where being wrong cascades into the next decision. That is when the heavy tier earns its keep.
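If it helps to see those three cases as a rule of thumb, here is a rough Python sketch. The function `pick_tier`, its inputs and the "low" and "high" labels are our own shorthand for the questions above, not anything the providers offer, and real tasks will not always fall neatly into one bucket.

```python
def pick_tier(cost_of_wrong_answer, needs_judgement, lots_of_material):
    """Suggest 'light', 'middle' or 'heavy' for a task.

    cost_of_wrong_answer: 'low' or 'high' (a rough label of our own).
    needs_judgement: True if the task needs context and judgement.
    lots_of_material: True if the model must synthesise many documents.
    """
    if cost_of_wrong_answer not in ("low", "high"):
        raise ValueError("cost_of_wrong_answer must be 'low' or 'high'")
    # Big synthesis where a mistake cascades: the heavy tier earns its keep.
    if lots_of_material and cost_of_wrong_answer == "high":
        return "heavy"
    # Context, judgement or a costly mistake: start on the middle tier.
    if needs_judgement or cost_of_wrong_answer == "high":
        return "middle"
    # Pure retrieval with little at stake.
    return "light"


print(pick_tier("low", False, False))   # airport transfer question
print(pick_tier("low", True, False))    # two-day Lisbon itinerary
print(pick_tier("high", True, True))    # accounts, contracts and board paper
```

The three calls at the bottom mirror the Lisbon and contract examples, and return light, middle and heavy in that order.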

There is one more layer worth knowing. Most providers offer an extended or adaptive thinking toggle that adds a deliberate reasoning pause before answering. It helps on genuinely hard problems, but it adds processing too. Use it when the problem earns it, not by default.

A quick word on local models

There is another lever here, though it deserves its own post. You can run smaller AI models locally, on your own machine, rather than sending every query off to a data centre. The trade-offs are real: capability, setup effort and the energy your own hardware draws. We will get into them properly another time. For now, just know that the cloud-versus-local choice is another dial you can turn, and for the right tasks it is a genuine option for keeping both your data and your footprint closer to home.

The footprint adds up

One caution before you swing hard the other way: lightest is not always greenest. If a light model gets it wrong and you re-run the prompt five times, you may have burned more energy than one clean run on the middle tier would have. The goal is not the lightest model every time. It is the model matched to the task.
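The re-run arithmetic is easy to check for yourself. Here is a small Python sketch using relative units, where one light-tier run counts as 1. The figure of 3 for a middle-tier run is purely our assumption for illustration, since nobody publishes the real number, and the function names are ours too.

```python
def total_cost(cost_per_run, runs):
    """Total relative cost of running the same prompt several times."""
    return cost_per_run * runs


def break_even_runs(light_cost, middle_cost):
    """How many light-tier runs cost the same as one middle-tier run."""
    return middle_cost / light_cost


LIGHT = 1.0   # relative units: one light-tier run
MIDDLE = 3.0  # ASSUMPTION for illustration only, not a published figure

light_total = total_cost(LIGHT, 5)    # five attempts on the light tier
middle_total = total_cost(MIDDLE, 1)  # one clean run on the middle tier

print(f"Five light runs: {light_total}, one middle run: {middle_total}")
print(f"Break-even after {break_even_runs(LIGHT, MIDDLE)} light runs")
```

Under that assumption, anything beyond three light-tier attempts has already cost more than going to the middle tier once.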

And whatever the model does, the result is still on you, especially at work. Getting a restaurant recommendation wrong is recoverable. Getting the numbers wrong in front of the board is not.

Yes, the enormous data centres being built dwarf any one person's tier choice. That is true. But the day-to-day use of AI is the sum of millions of small decisions made by individuals, small businesses and teams. Nobody's single decision to buy an EV moves the needle on its own. Millions of them do. This is no different.

If you want help thinking through how your team uses AI sensibly, with an eye on both the cost and the footprint, that is part of what we do at Futureformed. Have a look at our work, or drop us a line.

This piece was written by Liam D. at Futureformed. If it sparked a thought, we’d be happy to continue the conversation.

AI transparency: This article was written by Liam. The analysis, views, and conclusions are his own.