Developers Won't Work Without AI, and That's Starting to Worry Researchers
METR tried to run a follow-up study on AI coding productivity. Developers refused because they wouldn't code without AI tools. Meanwhile, Uber burned its 2026 AI budget in four months, and Amazon shut down an internal token leaderboard after employees gamed it.
In February 2026, AI research lab METR tried to run a follow-up to its 2025 study on AI coding productivity. The original study had a straightforward design: give open source developers tasks to do with and without AI, then measure the time difference.
The follow-up never happened. Developers wouldn’t join. According to METR, participants declined “because they do not wish to work without AI” even for a limited number of tasks.
That’s a notable finding on its own. The original 2025 study found that developers actually moved slower with AI, not faster. AI generated code quickly, but then developers spent extra time finding and fixing errors, steering the model, and waiting on completions. The subjective feeling of productivity and actual output had diverged. Now the tools are too embedded to test around.
The self-reported numbers look better
In May, METR published a survey of 349 technical workers, including software engineers, researchers, academics, and founders. The survey asked how much more valuable AI had made their work. Respondents estimated that AI had made their work roughly 1.6 to 2.1 times more valuable, and they expected that figure to reach 2.9x by March 2027.
But self-reported surveys measure perception, not output. The 2025 METR study found a gap between how productive developers felt and what they actually produced. That same gap may exist in the new survey data.
Tokenmaxxing isn’t a productivity metric
Companies have been using token consumption as a proxy for AI productivity, a practice TechCrunch calls “tokenmaxxing.” Two prominent examples this week suggest that metric is breaking down.
Amazon built an internal leaderboard called Kirorank to track employee AI usage. The experiment backfired. Employees gamed the leaderboard by running AI agents far more than their work required, driving up costs without producing more work. Amazon shut it down. The Financial Times reported the closure this week.
Uber’s situation had a different shape but the same conclusion. The company burned through its entire 2026 AI budget in the first four months of the year. COO Andrew Macdonald said on a podcast, per Gizmodo, that the spending hadn’t led to a measurable increase in projects completed or developer productivity.
The maintenance problem
Generating code faster doesn’t reduce the work that comes after. Singapore Management University researchers published a report in April warning that AI-generated code introduces long-term maintenance costs into real software projects. Developer James Shore made the same argument in a blog post that went viral on Hacker News, writing that trading speed now for maintenance debt later isn’t obviously a good deal.
Code review company CodeRabbit says its analysis of open source pull requests found AI-produced code contained 1.7 times more problems than human-written code. That stat comes from a company selling a product for catching those problems, so it deserves skepticism, but independent researchers have found similar issues.
The practical takeaway
The METR recruitment failure doesn’t mean AI coding tools aren’t useful. Most developers clearly find them valuable enough to refuse working without them. The harder question is whether that attachment is grounded in actual output gains or whether, as the 2025 study suggested, the feeling of productivity is outrunning the reality.
Cognition CEO Scott Wu, whose company makes Devin, told TechCrunch on May 29 that he currently rates Devin’s skill between a junior and mid-level programmer depending on the task. The SMU researchers suggest a more measured approach: understand what AI does and doesn’t do well, build QA processes designed for AI output, and keep architecture and security decisions in human hands.
That’s less dramatic than the productivity multiplier framing, but it’s what the evidence actually supports.