The Falling Cost of Inference

Paul Charlesworth

The news is full of price hikes from the frontier labs. It feels like AI costs are soaring. But that’s getting it completely backwards. Per unit of inference, costs are collapsing.

Here are two examples:

Self-hosted models already outperform last year’s state of the art.

ChatGPT 4.1 was released in April 2025. It scored 26 on the Artificial Analysis Intelligence Index. Google Gemma 4 31B came out in April 2026 and scored 39.

ChatGPT 4.1 cost $278 in tokens to run through the index. Gemma 4 31B runs locally on a high-spec Mac mini.

Gemma 4 also beat ChatGPT 4.1 on:

That’s astonishing progress. And it doesn’t stop there.

Medium-effort models already outperform last quarter’s high-effort beasts.

Theo T3 pointed this out in a recent video. I’m just writing it down.

In February 2026, Anthropic released Opus 4.6. In its “Non-reasoning, High effort” setting, Opus 4.6 (High) scored 46 on the Artificial Analysis Intelligence Index. Two months later, OpenAI released ChatGPT 5.5. ChatGPT 5.5 (Medium) and ChatGPT 5.5 (Low) scored 57 and 51 respectively.

They were cheaper too.

To run the index, Opus 4.6 (High) cost $1,746. ChatGPT 5.5 (Medium) cost $1,199, and ChatGPT 5.5 (Low) cost only $501.

Both ChatGPT 5.5 variants also outperformed Opus 4.6 (High) on:

We were besotted with Opus 4.6 (High) when it was released. Only a couple of months later, ChatGPT 5.5 (Low) gives us better performance for less than half the price.
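That claim is easy to check from the index scores and costs quoted above. A minimal sketch, using only those figures; the dictionary layout and the `compare` helper are illustrative, not anything published by Artificial Analysis:

```python
# Figures quoted in this post: cost (USD) to run the
# Artificial Analysis Intelligence Index, and the resulting score.
OPUS_HIGH = {"cost": 1746, "score": 46}
CHALLENGERS = {
    "ChatGPT 5.5 (Medium)": {"cost": 1199, "score": 57},
    "ChatGPT 5.5 (Low)": {"cost": 501, "score": 51},
}


def compare(baseline: dict, challenger: dict) -> tuple[int, float]:
    """Return (score gain in index points, cost as a fraction of the baseline's)."""
    gain = challenger["score"] - baseline["score"]
    cost_ratio = challenger["cost"] / baseline["cost"]
    return gain, cost_ratio


for name, model in CHALLENGERS.items():
    gain, ratio = compare(OPUS_HIGH, model)
    print(f"{name}: +{gain} points at {ratio:.0%} of the cost")
```

This prints a gain of 11 points at 69% of the cost for Medium and 5 points at 29% of the cost for Low, so the “less than half the price” comparison holds for the Low setting.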

Imagine where we’ll be a year from now.

[Figure: Performance versus cost for Opus 4.6 (High), ChatGPT 5.5 (Medium), and ChatGPT 5.5 (Low). A scatter plot of index score against cost to run the index: Opus 4.6 (High) scored 46 at $1,746; ChatGPT 5.5 (Medium) scored 57 at $1,199; ChatGPT 5.5 (Low) scored 51 at $501.]

Artificial Analysis Intelligence Index score compared with the cost to run the index.

You’ll notice that although both stories are positive, there’s an obvious issue: the cost of state-of-the-art models rose roughly sixfold in about a year. Paying $1,746 for Opus 4.6 (High) in 2026 versus $278 for ChatGPT 4.1 in 2025 doesn’t look like progress.
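The sixfold figure comes straight from the two costs above. A minimal sketch that also sets the cost growth against the score growth over the same period, using only numbers quoted in this post (the variable names are illustrative):

```python
# Figures quoted in this post (Artificial Analysis Intelligence Index).
GPT41 = {"cost": 278, "score": 26}    # ChatGPT 4.1, April 2025
OPUS46 = {"cost": 1746, "score": 46}  # Opus 4.6 (High), February 2026

# How many times more the newer model cost to run through the index.
cost_growth = OPUS46["cost"] / GPT41["cost"]
# How many times higher the newer model scored.
score_growth = OPUS46["score"] / GPT41["score"]

print(f"cost: {cost_growth:.1f}x, score: {score_growth:.1f}x")
```

It prints `cost: 6.3x, score: 1.8x`: the headline price grew much faster than the score.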

So what?

To see the progress and the real direction of travel, you have to look at the cost per unit of intelligence (the cost to run the index divided by the score), not the price tag of the latest star. Opus 4.6 (High) cost $38 per index point. ChatGPT 5.5 (Low) costs $10.
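Here is that cost-per-point calculation for every model mentioned in this post. It is a minimal sketch; the `Run` class and its field names are illustrative, and the only inputs are the costs and scores quoted above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    model: str
    cost_usd: float  # cost to run the Artificial Analysis Intelligence Index
    score: int       # resulting index score

    @property
    def cost_per_point(self) -> float:
        # Dollars spent per index point earned.
        return self.cost_usd / self.score


RUNS = [
    Run("ChatGPT 4.1", 278, 26),
    Run("Opus 4.6 (High)", 1746, 46),
    Run("ChatGPT 5.5 (Medium)", 1199, 57),
    Run("ChatGPT 5.5 (Low)", 501, 51),
]

# Cheapest intelligence first.
for run in sorted(RUNS, key=lambda r: r.cost_per_point):
    print(f"{run.model:<22} ${run.cost_per_point:6.2f} per point")
```

Sorted cheapest first, this gives about $9.82 for ChatGPT 5.5 (Low), $10.69 for ChatGPT 4.1, $21.04 for ChatGPT 5.5 (Medium), and $37.96 for Opus 4.6 (High), which round to the $10 and $38 figures above.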

When you do that, you see:

I know this isn’t the whole story. Mac minis aren’t cheap at all, and models running on consumer hardware are much slower than their cloud-hosted competition. But both will improve with time. The trajectory is clear.

AI is getting better and cheaper. Fast.
