Your AI vendor won't tell you they're out of compute.

Jonathan Razza
Apr 18
2 min read

Mid-March, Claude users started noticing something was off. Session limits tightened during peak hours. Outputs got shorter, less thorough. Intermittent outages with no explanation. No announcement from Anthropic.

The backstory: in late February, the Pentagon blacklisted Anthropic for refusing to let Claude be used for mass surveillance or autonomous weapons. OpenAI signed the contract Anthropic refused, hours later. The backlash was immediate — ChatGPT uninstalls spiked 295% in a day, Claude hit #1 in the US App Store, and Anthropic's ARR went from $9B to $30B in a single quarter. A user surge they were glad to have, and completely unprepared to serve.

A senior AMD engineer got tired of waiting for answers. She quantified the regression herself — 18,000 thinking blocks across nearly 7,000 Claude Code sessions — and published the data. Three days ago, Anthropic released Opus 4.7 and, for the first time, publicly acknowledged that Opus 4.6 had quietly degraded on harder tasks.

Opus 4.7 is a real upgrade — 12 of 14 benchmarks, same price, strong coding gains, and the 4.6-era issues largely cleared up when it shipped. The tradeoff is how you control it. Fixed-budget thinking is gone. You can no longer set a precise token budget. The replacement is an effort level (visible in Claude Code, available in the API), where you set the dial and Claude decides the actual spend. Less granular, and it lets Anthropic manage serving costs more efficiently across millions of requests.

OpenAI's GPT-4 went through the same cycle in late 2023 — users noticed first, researchers confirmed the benchmark regression, and OpenAI acknowledged it weeks later. Complaints about the GPT-5 series have continued into 2026. Whether Anthropic avoids that same long-term pattern remains to be seen. But 4.7 looks like a genuine course correction and a rational response to compute constraints. The overall quality gains seem to at least partially offset the tradeoff in granular control.

Most companies auditing AI vendors ask about uptime, security, and pricing. Almost no one tracks output quality over time, which means capacity pressure arrives silently.

Performance degradation, response time, quality drift, intermittent availability — real problems, no formal notice, sometimes no admission at all. Most procurement checklists won't catch any of this. But the real answer isn't a better checklist — it's a fallback. If you're running anything mission-critical on a hosted model, you need a baseline, a way to detect drift, and a tested alternative you can route to when the primary goes sideways.

Your AI vendor won't tell you they're out of compute.

Recent Posts

Comments