Blog
-
GPT 5.5 leads the DeepSWE benchmark
DeepSWE is a new software engineering benchmark built around fresh, diverse, real-world tasks with handwritten verification.
-
Composer 2.5 sits with the frontier coding agents
Cursor Composer 2.5 ranks third on Artificial Analysis' coding agents benchmark, at a fraction of the cost of frontier models.
-
The Falling Cost of Inference
The latest models are getting more expensive, but the absolute cost of inference is falling rapidly.
-
AI Proves You Don't Know What You Want
Why weak AI prompts often reveal unclear requirements, not weak tools.