Anthropic's Claude Opus 4.8 announcement hero image Image: Anthropic
by VibecodedThis

Claude Opus 4.8 Is Out: Dynamic Workflows, Cheaper Fast Mode, and 69.2% on SWE-Bench Pro

Anthropic shipped Opus 4.8 forty-one days after Opus 4.7. The headline numbers are a five-point jump on agentic coding, a fast mode that costs a third of what it used to, and a new Claude Code feature that runs hundreds of subagents in parallel for codebase-scale work.

Share

Anthropic released Claude Opus 4.8 today. The model ships in claude.ai, the Anthropic API, Claude Code, and Cowork, with the API identifier claude-opus-4-8. Pricing matches Opus 4.7 at $5 per million input tokens and $25 per million output tokens. Fast mode, which runs at roughly 2.5x the speed of the standard model, now costs $10 input and $50 output. That is a third of what fast mode cost on previous Opus releases.

The cadence is the story underneath the story. Opus 4.7 landed on April 16. Opus 4.8 landed forty-one days later. Anthropic has compressed the gap between flagship Opus releases from quarters to weeks, which puts real pressure on anyone building on top of these models. If you tuned prompts for 4.7 last month, parts of that work are already stale.

What actually got better

The numbers Anthropic published focus on agentic coding, terminal use, computer use, and a mixed bag of reasoning and finance benchmarks. The standout figure is SWE-Bench Pro, the harder real-world refactor of the original SWE-Bench dataset.

BenchmarkOpus 4.8Opus 4.7GPT-5.5Gemini 3.1 Pro
SWE-Bench Pro (agentic coding)69.2%64.3%58.6%54.2%
Terminal-Bench 2.1 (agentic terminal)74.6%66.1%78.2%70.3%
OSWorld-Verified (computer use)83.4%82.8%78.7%76.2%
GDPval-AA (knowledge work)189017531769n/a
Humanity’s Last Exam (with tools)57.9%54.7%n/an/a
Finance Agent v253.9%n/an/an/a

Two things stand out. First, the jump on SWE-Bench Pro is large for a point release. A 4.9-point gain over 4.7 puts Opus 4.8 more than ten points ahead of GPT-5.5 on this benchmark, which until today was the closest competitor on agentic coding. Second, GPT-5.5 still wins on Terminal-Bench 2.1. If your workload is terminal-heavy, that gap is worth knowing about before assuming Opus 4.8 is the right default for every agent task.

OSWorld is a near-tie with 4.7. Computer use was already strong; Opus 4.8 mostly holds the line.

Dynamic Workflows

The new feature getting the most attention from Anthropic is Dynamic Workflows, available as a research preview inside Claude Code. The pitch: Claude plans the work, then spawns hundreds of parallel subagents in a single session to execute it. Anthropic’s own framing is that this enables “codebase-scale migrations across hundreds of thousands of lines of code from kickoff to merge, with the existing test suite as its bar.”

That is a specific claim, and a useful one to test. The interesting bit is not raw parallelism, which Claude Code has had in various forms since the SDK shipped. It is the orchestration layer Anthropic is willing to commit to: planning, dispatching, and reconciling work across many subagents while a test suite acts as the verification signal. Codebase migrations are exactly the kind of task humans hate and LLMs have historically been mediocre at, because they require both local correctness and global consistency.

The research preview label matters. This is not flipping a switch and walking away. Expect rough edges.

Effort control and Cowork

Alongside Opus 4.8, Anthropic added effort control to claude.ai and Cowork. The control sits next to the model selector and lets users choose how much effort Claude applies to a response. Opus 4.7 introduced an xhigh effort level between high and max for API users. This pushes that knob into the consumer surfaces where most people actually interact with Claude.

For Claude Code users, the practical effect is that you can now ask for a quick scaffolded answer or a deeply researched one without switching models. The model stays the same; the effort budget changes.

Honesty and Mythos

Anthropic is framing Opus 4.8 as its “most honest” model yet. The claim is operationalized two ways. The model is more likely to flag uncertainty about its own work, and less likely to make unsupported claims when asked to act in an agent loop. Both behaviors map to known failure modes in long-horizon coding agents, where the model confidently writes broken code, runs it once, and reports success without checking the output.

The other thing buried in the announcement is Mythos. Anthropic confirmed Mythos is its most advanced model, currently restricted to “a handful of partners due to security concerns.” Opus 4.8 reportedly matches Mythos on prosocial traits like supporting user autonomy and acting in the user’s best interest, even though Mythos itself remains gated. Anthropic says Mythos-class models will reach all customers “in the coming weeks.”

In practical terms: Opus 4.8 is the strongest model you can buy today. There is something stronger behind the curtain, and Anthropic is signaling it is close.

Availability and pricing

Opus 4.8 is live now for Enterprise, Team, and Max plans, on the API at the same price as 4.7, and inside Claude Code and Cowork. The Messages API gained a small but useful change: system entries are now accepted within the messages array, which means you can update instructions mid-conversation without restarting the request. For agent builders that have been hand-rolling this with prompt prepends, this is a cleaner primitive.

Fast mode pricing is the more interesting change for production workloads. At $10 input and $50 output per million tokens, fast mode is now roughly 3x cheaper than it was on Opus 4.7. The 2.5x speed multiplier on top of that makes fast mode the obvious choice for latency-sensitive agent loops where the per-token cost was previously the blocker.

What to actually do about this

Three things, in rough priority order.

One, if you run any agentic coding workload on Opus 4.7, run your eval suite against 4.8 this week. The gap on SWE-Bench Pro is large enough that the price-per-correct-task math has shifted. Don’t assume; measure.

Two, if you are doing terminal-heavy work, check whether GPT-5.5’s lead on Terminal-Bench 2.1 actually shows up in your tasks. The benchmark gap is real but narrow tasks vary.

Three, if you build on Claude Code and you have a migration, refactor, or test-coverage project you have been putting off, the Dynamic Workflows research preview is the experiment to run. It will not be production-ready, but the failure modes will tell you more about what to plan for than reading another release post.

The 41-day cadence is the thing to internalize. Opus 4.9 is probably not far away.

Sources: Anthropic, TechCrunch, Axios, OfficeChai benchmark table, 9to5Mac, Inc.

Share