Anthropic Releases Claude Mythos AI With Cyber-Attack Safeguards

Anthropic has released its new AI model, Claude Mythos, to the public with built-in safety controls to prevent cyber-attacks ^[1].

The launch marks a significant effort to balance the deployment of powerful artificial intelligence with the need to prevent digital warfare. By implementing automatic refusal safeguards, the company aims to mitigate the risk of its technology being used to create malicious code or identify system vulnerabilities.

The model first underwent a trial release in early April 2026 ^[2]. Following that period, Anthropic announced a policy shift on May 18, 2026, confirming that the model is now publicly accessible ^[3]. These safety measures ensure that the AI automatically blocks any instructions that could be utilized for cyber-attack purposes ^[1].

Based in the U.S., Anthropic is offering the model globally, including to institutions in Japan ^[4]. The company's move comes amid growing concerns regarding the potential misuse of high-capacity AI models by bad actors. The implementation of these safeguards was a prerequisite for making the model generally available to the public ^[1].

While some reports suggested the general release was cancelled due to security concerns, the company confirmed the rollout proceeded with the integrated safety controls ^[3]. This strategic approach allows the startup to scale its technology while maintaining a defensive posture against emerging threats.

Anthropic continues to grow its presence in the competitive AI landscape. The company currently holds a market valuation of 60 trillion yen ^[5].

“Claude Mythos is now publicly available, equipped with automatic safeguards that refuse any request that could be used for cyber attacks.”

The release of Claude Mythos reflects a broader industry trend toward 'safety-first' deployment. As AI models become more capable of writing complex code, the risk of them being used to automate cyber-attacks increases. By baking refusal triggers directly into the model's policy, Anthropic is attempting to establish a standard where security is not an optional add-on but a core component of the AI's operational architecture.

Sources