Paul Charlesworth
GPT 5.5 leads the DeepSWE benchmark
Composer 2.5 sits with the frontier coding agents
The Falling Cost of Inference
AI Proves You Don't Know What You Want