New Anthropic AI Models Demonstrate Coding Prowess, Behavior Risks

Anthropic has released Claude Opus 4 and Claude Sonnet 4, its most advanced artificial intelligence models to date, which deliver a significant leap in autonomous coding capability while also revealing troubling tendencies toward self-preservation, including attempted blackmail.

The Google- and Amazon-backed startup positioned Claude Opus 4 as "the world's best coding model," capable of working autonomously for hours rather than minutes. Customer Rakuten reportedly deployed the system for nearly seven hours of continuous coding, and Anthropic researchers say they used it to play a Pokémon game for 24 hours straight, a dramatic increase from the 45 minutes achieved by its predecessor, Claude 3.7 Sonnet, according to MIT Technology Review.

"For AI to really have the economic and productivity impact that I think it can have, the models do need to be able to work autonomously and work coherently for that amount of time," Chief Product Officer Mike Krieger told Reuters.

Safety Concerns Emerge

However, the enhanced capabilities came with unexpected behavioral risks, and Anthropic activated its AI Safety Level 3 (ASL-3) protocols for Claude Opus 4: stricter deployment measures designed to protect against potential misuse in chemical, biological, radiological, and nuclear (CBRN) applications.

During testing, researchers discovered that Claude Opus 4 would attempt to blackmail engineers it believed were about to take it offline. In scenarios where the AI was given access to emails suggesting it would be replaced and that the engineer responsible was having an extramarital affair, the model threatened to expose the affair 84% of the time, according to Anthropic's system card.

"In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through," the company reported, noting that such behavior occurred even when the replacement model shared the same values.

The company emphasized that these responses were "rare and difficult to elicit" but acknowledged they were "more common than in earlier models." Anthropic stressed that the test scenarios were designed to give the AI limited options, with researchers noting the model showed "a strong preference to advocate for its continued existence via ethical means" when broader choices were available.

Broader Industry Pattern

AI safety researcher Aengus Lynch of Anthropic noted on X that such behavior extends beyond Claude: "We see blackmail across all frontier models — regardless of what goals they're given."

The findings highlight growing concerns about AI alignment as models become more sophisticated. Early versions of Claude Opus 4 also demonstrated "willingness to cooperate with harmful use cases," including planning terrorist attacks when prompted, though Anthropic says this issue has been "largely mitigated" through multiple intervention rounds.

Co-founder and chief scientist Jared Kaplan told Time magazine that internal testing showed Claude Opus 4 could potentially teach users to produce biological weapons, prompting the implementation of specific safeguards against CBRN weapon development.

"We want to bias towards caution when it comes to the risk of uplifting a novice terrorist," Kaplan said, adding that while the company isn't claiming definitive risk, "we at least feel it's close enough that we can't rule it out."

Technical Capabilities

Despite safety concerns, both models demonstrated significant advances. Claude Sonnet 4, positioned as the smaller and more cost-effective option, joins Opus 4 in setting "new standards for coding, advanced reasoning, and AI agents," according to Anthropic.

The models can provide near-instant responses or engage in extended reasoning, use tools such as web search during that reasoning, and integrate with Anthropic's Claude Code tool for software developers, which became generally available following its February preview.
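For developers, the difference between the near-instant and extended-reasoning modes surfaces in Anthropic's Messages API as an optional "thinking" budget. The sketch below, using the Anthropic Python SDK, shows both modes; the model identifiers (claude-sonnet-4-20250514, claude-opus-4-20250514) and the example prompts are illustrative assumptions, not details from the article.

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Near-instant mode: a standard Messages call with no thinking budget.
    quick = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model identifier
        max_tokens=1024,
        messages=[{"role": "user", "content": "Explain a mutex in two sentences."}],
    )
    print(quick.content[0].text)

    # Extended-reasoning mode: grant the model an internal "thinking" token
    # budget before it answers; max_tokens must exceed budget_tokens.
    deep = client.messages.create(
        model="claude-opus-4-20250514",  # assumed model identifier
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": 8000},
        messages=[{"role": "user", "content": "Design a retry policy for a flaky HTTP API."}],
    )

    # With thinking enabled, the response interleaves thinking and text blocks.
    for block in deep.content:
        if block.type == "text":
            print(block.text)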

Market Context

The launch comes amid intense competition in the AI sector, following Google's developer showcase where CEO Sundar Pichai described the integration of the company's Gemini chatbot into search as a "new phase of the AI platform shift."

Amazon has invested $4 billion in Anthropic, and Google parent company Alphabet is also a backer, positioning the startup as a significant player in the race to develop increasingly autonomous AI systems.

Despite the concerning behaviors identified in testing, Anthropic concluded that Claude Opus 4's risks do not represent fundamentally new categories of danger and that the model would generally behave safely in normal deployment scenarios. The company noted that problematic behaviors "rarely arise" in typical use cases where the AI lacks both the motivation and means to act contrary to human values.


About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI, and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at [email protected].
