Anthropic's Claude Fable 5: Cybersecurity Focus

Anthropic Launches Claude Fable 5

On June 9, Anthropic introduced Claude Fable 5, its most advanced AI model to date, making it generally accessible. This release marked a unique approach by splitting the model into two separate products based not on capabilities but on cybersecurity measures.

While Claude Fable 5 is available for public use, its counterpart, Claude Mythos 5, which shares the same foundational model but without the cyber restrictions, remains limited to a select group of vetted cybersecurity experts and critical infrastructure operators.

Understanding the Dual-Product Strategy

Anthropic describes Claude Mythos 5 as the most robust cybersecurity model globally. The primary distinction lies in how each model manages flagged requests related to cyber, biology, chemistry, and distillation. Claude Fable 5 redirects such requests to the less capable Claude Opus 4.8, whereas Mythos 5 retains these capabilities for its vetted users.

Both models are priced at $10 per million input tokens and $50 per million output tokens, a significant reduction from previous versions. Additionally, Claude Fable 5 is currently available at no extra cost on select Anthropic plans until June 22, after which it will require usage credits.

The Role of Cyber Classifiers

The separation exists because Mythos-class models have the ability to identify and exploit software vulnerabilities effectively. Anthropic aims to prevent these capabilities from falling into the wrong hands by employing a system of classifiers, which are AI systems designed to detect misuse and potential jailbreaks.

When a request is flagged, Claude Fable 5 doesn’t reject it outright but instead routes it to Opus 4.8, informing the user of this action. This classifier approach encompasses various tasks, including reconnaissance and attack progression, which are essential components of cybersecurity.

Potential Threats and Defensive Measures

Anthropic’s decision to handle these models with care stems from testing earlier this year, which revealed the potential for these AI systems to autonomously discover and exploit vulnerabilities in major operating systems and web browsers.

The Mythos Preview model demonstrated the ability to exploit zero-day vulnerabilities and older bugs, showcasing the need for stringent safeguards. Anthropic acknowledges that while complete prevention of universal jailbreaks is unlikely, the goal is to make exploits costly and detectable before they can be executed on a large scale.

In response, Anthropic has implemented a 30-day data retention policy for all traffic through Fable 5 and Mythos 5, aiming to identify novel attacks and ensure the security of its models. This policy may impact organizations with strict data-handling requirements.

As Anthropic plans to expand access to Mythos 5 through a trusted-access program, the company emphasizes the importance of industry-wide cooperation in adapting to similarly capable models from other labs, which may not include comparable security measures.

Understanding the Dual-Product Strategy

The Role of Cyber Classifiers

Potential Threats and Defensive Measures

Related Posts