The Falling Cost of Inference
The news is full of price hikes from the frontier labs. It feels like AI costs are soaring. But that gets it completely backwards. Per unit of inference, costs are collapsing.
Here are two examples:
Self-hosted models already outperform last year’s state of the art.
ChatGPT 4.1 was released in April 2025. It scored 26 on the Artificial Analysis Intelligence Index. Google Gemma 4 31B came out in April 2026 and scored 39.
ChatGPT 4.1 cost $278 in tokens to run through the index. Gemma 4 31B runs locally on a high-spec Mac Mini, with no per-token bill at all.
Gemma 4 also beat ChatGPT 4.1 on:
- Terminal-Bench Hard (!)
- AA-LCR
- Humanity’s Last Exam
- τ²-Bench Telecom
- GPQA Diamond
- SciCode
- IFBench
- CritPt (although ChatGPT 4.1 didn’t compete), and
- MMMU-Pro
That’s astonishing progress. And it doesn’t stop there.
Medium-effort models already outperform last quarter’s high-effort beasts.
Theo T3 pointed this out in a recent video. I’m just writing it down.
In February 2026, Anthropic released Opus 4.6. In the “Non-reasoning, High effort” setting, Opus 4.6 (High) scored 46 on the Artificial Analysis Intelligence Index. Two months later, OpenAI released ChatGPT 5.5. ChatGPT 5.5 (Medium) and ChatGPT 5.5 (Low) scored 57 and 51, respectively.
They were cheaper, too.
To run the index, Opus 4.6 (High) cost $1,746. ChatGPT 5.5 (Medium) cost $1,199, and ChatGPT 5.5 (Low) cost only $501.
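To put those prices side by side, here is a minimal sketch that expresses each ChatGPT 5.5 run as a fraction of the Opus 4.6 (High) run. The dollar figures are the ones quoted above; the function and variable names are illustrative only.

```python
# Cost (USD) to run the Artificial Analysis Intelligence Index, as quoted in this post.
OPUS_HIGH_COST = 1746

gpt55_costs = {
    "ChatGPT 5.5 (Medium)": 1199,
    "ChatGPT 5.5 (Low)": 501,
}


def fraction_of_baseline(cost: float, baseline: float) -> float:
    """Return a model's index-run cost as a fraction of the baseline model's cost."""
    return cost / baseline


fractions = {
    name: fraction_of_baseline(cost, OPUS_HIGH_COST)
    for name, cost in gpt55_costs.items()
}

for name, frac in fractions.items():
    # Format as a whole-number percentage, e.g. 0.2869 -> "29%".
    print(f"{name}: {frac:.0%} of Opus 4.6 (High)")
```

This prints 69% for ChatGPT 5.5 (Medium) and 29% for ChatGPT 5.5 (Low): the Low variant scores higher than Opus 4.6 (High) for well under half the cost.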
Both ChatGPT 5.5 variants also outperformed Opus 4.6 (High) on:
- Terminal-Bench Hard
- AA-LCR
- AA-Omniscience Accuracy
- Humanity’s Last Exam
- GPQA Diamond
- SciCode
- IFBench
- CritPt, and
- MMMU-Pro
We were besotted with Opus 4.6 (High) when it was released. Only a couple of months later we’re getting better performance for less than half the price.
Imagine where we’ll be a year from now.
Artificial Analysis Intelligence Index score compared with the cost to run the index.
You’ll notice that although both stories are positive, there’s an obvious issue: the cost of SOTA models rose roughly sixfold in about a year. Paying $1,746 to run the index on Opus 4.6 (High) in 2026, versus $278 for ChatGPT 4.1 in 2025, doesn’t look like progress.
So what?
To see the progress and the real direction of travel, you have to look at the cost per unit of intelligence, not the cost of the latest star. Opus 4.6 (High) was $38 per index point. ChatGPT 5.5 (Low) is $10.
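The cost-per-point figure is simply the dollar cost of an index run divided by the index score. A minimal sketch using only the numbers quoted in this post (Gemma 4 31B is left out because no token cost is quoted for it; the names here are illustrative):

```python
# (cost in USD to run the index, Artificial Analysis Intelligence Index score),
# taken from the figures quoted in this post.
results = {
    "ChatGPT 4.1": (278, 26),
    "Opus 4.6 (High)": (1746, 46),
    "ChatGPT 5.5 (Medium)": (1199, 57),
    "ChatGPT 5.5 (Low)": (501, 51),
}


def cost_per_point(cost_usd: float, score: float) -> float:
    """Dollars spent per index point scored."""
    return cost_usd / score


for model, (cost, score) in results.items():
    print(f"{model}: ${cost_per_point(cost, score):.2f} per point")
```

Opus 4.6 (High) comes out at $37.96 per point and ChatGPT 5.5 (Low) at $9.82, which round to the $38 and $10 above.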
When you do that, you see:
- It takes a couple of months for the performance of the latest models to become available at roughly half the price.
- It takes about a year for that performance to become essentially free.
I know this isn’t the whole story. Mac Minis aren’t cheap at all. And models running on consumer hardware are much slower than their cloud-hosted competition. But both will improve; it’s only a matter of time. The trajectory is clear.
AI is getting better and cheaper. Fast.