Blog

GPT 5.5 leads the DeepSWE benchmark
31 May 2026
DeepSWE is a new software engineering benchmark built around fresh, diverse, real-world tasks with handwritten verification.
Composer 2.5 sits with the frontier coding agents
23 May 2026
Cursor Composer 2.5 ranks third on Artificial Analysis' coding agents benchmark, at a fraction of the cost of frontier models.
The Falling Cost of Inference
14 May 2026
The latest models are getting more expensive, but the absolute cost of inference is falling rapidly.
AI Proves You Don't Know What You Want
4 May 2026
Why weak AI prompts often reveal unclear requirements, not weak tools.

GPT 5.5 leads the DeepSWE benchmark