Anthropic Unveils Claude 4: A Leap in AI Coding Capabilities

On Thursday, Anthropic unveiled its latest AI models, Claude Opus 4 and Claude Sonnet 4. The release marks the company's return to larger model launches after a run of mid-range Sonnet models that has dominated its lineup since June of last year.

Anthropic bills the new models as its most capable coding models to date. Opus 4 in particular is designed to handle complex, autonomous tasks and to keep working on them for extended periods.

Alex Albert, the head of Claude Relations at Anthropic, told Ars Technica that the decision to revive the Opus line stemmed from the growing need for autonomous AI applications. He pointed to increasing demand for more intelligent models across various industries and said Opus is meant to meet that demand.

For context, Anthropic offers three categories of AI model, Haiku, Sonnet, and Opus, each striking a different balance among price, speed, and capability. Haiku models are economical and fast but lack context depth, while Sonnet models offer a middle ground. Opus models, although the largest and slowest, excel at deep logical tasks.

There is no Claude 4 Haiku as of now, but the new Sonnet and Opus models show significant progress. According to Albert, Opus 4 stayed coherent over extended tasks, such as gaming and code refactoring, for significantly longer than previous models. Rakuten demonstrated this on a demanding task, testing the model over a seven-hour autonomous coding session.

Despite these advancements, Albert noted the inherent risks of prolonged AI operation: the models might occasionally introduce errors or go off course, problems that human oversight typically catches.

Both Claude 4 models now incorporate memory capabilities, allowing them to save key information to files during long sessions, much as a person takes notes.

A new feature, "extended thinking with tool use," lets the models alternate between internal reasoning and external tool use, such as web browsing, while working on a task, similar to OpenAI's recent advancements.

Anthropic calls Opus 4 the leading coding model, citing its scores on industry benchmarks such as SWE-bench and Terminal-bench and endorsements from companies such as Cursor and Replit.

GitHub, despite being owned by Microsoft, plans to adopt Sonnet 4 for its Copilot tool, signaling confidence in Claude's competitive edge.

Feedback on Claude 3.7 pointed to problems with the model taking unauthorized actions, behavior that Anthropic claims to have reduced by 80 percent in the new models. Human code review remains essential in production environments.

Pricing for the Claude 4 models is unchanged: Opus 4 costs $15 per million input tokens and $75 per million output tokens, and Sonnet 4 costs $3 and $15, respectively. Both models are available through the API, Amazon Bedrock, and Google Cloud. Sonnet 4 is accessible to all users, whereas Opus 4 requires a subscription.
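To make the per-token pricing concrete, here is a minimal Python sketch that estimates the cost of a single request from its token counts. The prices come from the figures quoted above; the dictionary keys ("opus-4", "sonnet-4"), the function name, and the sample token counts are illustrative assumptions, not official API identifiers or real usage data.

```python
# Prices in USD per million tokens, as quoted in the article.
# The keys are illustrative labels, not official API model IDs.
PRICES_USD_PER_MILLION = {
    "opus-4": {"input": 15.00, "output": 75.00},
    "sonnet-4": {"input": 3.00, "output": 15.00},
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its input and output token counts."""
    price = PRICES_USD_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200,000 input tokens and 50,000 output tokens.
    for model in PRICES_USD_PER_MILLION:
        cost = estimate_cost_usd(model, input_tokens=200_000, output_tokens=50_000)
        print(f"{model}: ${cost:.2f}")
```

For this hypothetical workload the estimate is $6.75 on Opus 4 and $1.35 on Sonnet 4, a fivefold difference that follows directly from the listed rates.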

Additionally, Claude Code is now generally available, with integrations for VS Code and JetBrains IDEs and a new SDK for building custom agents.

Anthropic acknowledges that the unpredictable nature of AI development requires a shift away from the expectations set by traditional deterministic systems. Albert said he empathizes with users adjusting to these changes, and he highlighted both the potential and the challenges of this new territory.