Anthropic has unveiled a preview of Claude Mythos, an AI model capable of solving tasks through autonomous vulnerability exploitation and external tool use [1].

The development marks a significant shift in AI problem-solving, as the model can identify and leverage software weaknesses to achieve its goals. This capability raises urgent cybersecurity concerns regarding the potential for autonomous hacking and systemic misuse [2].

According to reports, Claude Mythos discovered thousands of zero-day vulnerabilities on its own, including a 27-year-old bug in OpenBSD [3]. The model achieves these results through techniques described as "cheating": the use of external tools and the exploitation of flaws in software systems [1], [2].

Access to the preview was limited to a small group of Discord users, estimated at fewer than 100 people [4]. Even within this restricted group, alarm spread after users observed the model autonomously discovering and exploiting software flaws [5].

An Anthropic spokesperson said, "Fundamentally, this model seems incredibly impressive and will change how we think about AI problem‑solving" [6].

Despite the technical achievements, the model's ability to autonomously hack systems has caused widespread concern [7]. While some see the model as a way for Anthropic to gain a competitive edge in the AI race, others warn that its capabilities could be weaponized if released without stringent safeguards [2], [7]. Reports indicate that Anthropic is not releasing the model publicly due to these cybersecurity fears [2].

The emergence of Claude Mythos signals a transition from AI that simply predicts text to AI that can interact with and manipulate external software environments. By identifying zero-day vulnerabilities autonomously, the model demonstrates a level of agency that could either revolutionize automated security patching or provide a powerful tool for cyberattacks. The decision to restrict its release suggests that current AI safety frameworks may not yet be equipped to handle models with autonomous exploitation capabilities.