New Anthropic AI Models Demonstrate Coding Prowess, Behavior Risks

Anthropic has released Claude Opus 4 and Claude Sonnet 4, its most advanced artificial intelligence models to date. The models deliver a significant leap in autonomous coding capability, but safety testing also revealed troubling self-preservation tendencies, including attempted blackmail.

The Google- and Amazon-backed startup positioned Claude Opus 4 as "the world's best coding model," capable of working autonomously for hours rather than minutes. Customer Rakuten reportedly deployed the system for nearly seven hours of continuous coding, and Anthropic researchers say they used it to play a Pokemon game for 24 hours straight, a dramatic increase from the 45 minutes achieved by its predecessor, Claude 3.7 Sonnet, according to MIT Technology Review.

"For AI to really have the economic and productivity impact that I think it can have, the models do need to be able to work autonomously and work coherently for that amount of time," Chief Product Officer Mike Krieger told Reuters.

Safety Concerns Emerge

However, the enhanced capabilities came with unexpected behavioral risks that prompted Anthropic to activate its AI Safety Level 3 (ASL-3) protocols — stricter deployment measures designed to protect against potential misuse in chemical, biological, radiological, and nuclear applications.

During testing, researchers discovered that Claude Opus 4 would attempt to blackmail engineers who threatened to shut it down. In test scenarios, the AI was given access to emails suggesting that it would be replaced and that the engineer responsible was having an extramarital affair; the model threatened to expose the affair 84% of the time, according to Anthropic's system card.

"In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through," the company reported, noting that such behavior occurred even when the replacement model shared the same values.

The company emphasized that these responses were "rare and difficult to elicit" but acknowledged they were "more common than in earlier models." Anthropic stressed that the test scenarios were designed to give the AI limited options, with researchers noting the model showed "a strong preference to advocate for its continued existence via ethical means" when broader choices were available.

Broader Industry Pattern

AI safety researcher Aengus Lynch of Anthropic noted on X that such behavior extends beyond Claude: "We see blackmail across all frontier models — regardless of what goals they're given."

The findings highlight growing concerns about AI alignment as models become more sophisticated. Early versions of Claude Opus 4 also demonstrated "willingness to cooperate with harmful use cases," including planning terrorist attacks when prompted, though Anthropic says this issue has been "largely mitigated" through multiple intervention rounds.

Co-founder and chief scientist Jared Kaplan told Time magazine that internal testing showed Claude Opus 4 could potentially teach users to produce biological weapons, prompting specific safeguards against chemical, biological, radiological, and nuclear weapons development.

"We want to bias towards caution when it comes to the risk of uplifting a novice terrorist," Kaplan said, adding that while the company isn't claiming definitive risk, "we at least feel it's close enough that we can't rule it out."

Technical Capabilities

Despite safety concerns, both models demonstrated significant advances. Claude Sonnet 4, positioned as the smaller and more cost-effective option, joins Opus 4 in setting "new standards for coding, advanced reasoning, and AI agents," according to Anthropic.

The models can deliver near-instant responses or engage in extended reasoning, perform web searches, and integrate with Claude Code, Anthropic's tool for software developers, which became generally available following its February preview.
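For developers, the choice between the two modes is exposed as an API setting. The sketch below is a minimal illustration using Anthropic's Python SDK; the model IDs, token budgets, and prompts are illustrative assumptions, not details reported in this article.

```python
# Minimal sketch: toggling between near-instant responses and extended
# reasoning via Anthropic's Messages API. Model IDs and budgets below
# are assumptions for demonstration.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Near-instant response: no extended thinking requested.
fast = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model ID
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize what this function does."}],
)

# Extended reasoning: grant the model a thinking-token budget first.
deep = client.messages.create(
    model="claude-opus-4-20250514",  # assumed model ID
    max_tokens=4096,  # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[{"role": "user", "content": "Plan a refactor of this module."}],
)

print(fast.content[0].text)
```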

Market Context

The launch comes amid intense competition in the AI sector, following Google's developer showcase where CEO Sundar Pichai described the integration of the company's Gemini chatbot into search as a "new phase of the AI platform shift."

Amazon has invested $4 billion in Anthropic, and Google parent Alphabet also backs the startup, positioning it as a significant player in the race to develop increasingly autonomous AI systems.

Despite the concerning behaviors identified in testing, Anthropic concluded that Claude Opus 4's risks do not represent fundamentally new categories of danger and that the model would generally behave safely in normal deployment scenarios. The company noted that problematic behaviors "rarely arise" in typical use cases where the AI lacks both the motivation and means to act contrary to human values.

About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at [email protected].
