Anthropic released a new AI model called Claude Feibel 5 on Sept. 9, 2024 [1].

The launch marks the first time the U.S.-based company has made one of its high-performance models publicly available. This move allows a wider audience to access advanced AI capabilities while testing the company's ability to manage the risks associated with open access.

According to company data, Claude Feibel 5 provides performance equivalent to the Claude Mythos model [2], which remains unreleased. By bridging this gap, Anthropic aims to provide state-of-the-art utility to the general public without sacrificing the stability of its internal research pipeline.

To prevent the misuse of the technology, Anthropic implemented specific security safeguards [1]. The system is designed to identify queries related to cyber attacks, or the creation of biological and chemical weapons. When the model detects such prompts, it routes the query to a weaker model [1].

This routing mechanism ensures that the most capable version of the AI cannot be leveraged to generate dangerous instructions. By diverting risky requests to a less capable system, the company seeks to mitigate the potential for large-scale harm while maintaining the model's general utility for benign tasks [1].

Anthropic, headquartered in the San Francisco Bay Area, continues to focus on the balance between model transparency and safety [1]. The deployment of Claude Feibel 5 serves as a test case for how the company handles the tension between public availability and the prevention of catastrophic misuse [2].

Claude Feibel 5 provides performance equivalent to the Claude Mythos model

The release of Claude Feibel 5 demonstrates a shift in Anthropic's strategy toward public accessibility. By implementing a 'divert-to-weaker-model' security architecture, the company is attempting to solve the 'dual-use' dilemma—where a tool is useful for productivity but dangerous in the wrong hands—without resorting to a total blackout of the technology.