Disclaimer: The opinions expressed here are solely my own and not those of any employer, client, or affiliated organisation.

Anthropic, Fable 5, and why sovereign AI just got real

When Washington pulls the plug: Fable 5, Mythos 5, and why sovereign AI risk just got real.

Photo by Conny Schneider / Unsplash

Many folks in Australia have been talking about sovereign AI and sovereign risk for a while, and now the US government has just made it real.

The US government did something extraordinary: on 12 June 2026 (US time) it effectively forced Anthropic to shut down access to its most powerful AI models, Fable 5 and Mythos 5, by issuing an

"export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees"

The trigger was not a proven harmful incident. It was a narrow “jailbreak” that, by Anthropic’s account, offers no more capability than other widely available models already provide.

For those of us working in AI governance, this is a case study in how not to do frontier model regulation. It is also a very clear demonstration of the risks of relying on AI models under the sovereign control of another nation state, one whose national security imperatives might not align with your own. The geopolitical landscape is very different from the one we were in only a few years ago. AI is now seen through the lens of strategic national advantage, and some folks are clearly playing hardball.

What actually happened?

On 12 June 2026 (US time), Anthropic published a statement explaining that the US government had issued an export control directive covering Fable 5 and Mythos 5. The order bars access by any “foreign national,” inside or outside the United States; in practice, Anthropic says it has disabled both models for all customers to remain compliant.

A few key facts from Anthropic’s statement:

  • The directive arrived at 5:21pm ET with no written technical rationale beyond a national security assertion.
  • Officials indicated they had become aware of a method of bypassing (or “jailbreaking”) Fable 5’s safeguards.
  • The specific example shown involved asking the model to read a particular codebase and fix software flaws, reproducing a “small number of previously known, minor vulnerabilities.”
  • Anthropic says comparable vulnerabilities can also be found by other publicly available models, without any safeguard bypass at all.

Anthropic is complying with the order, but it is clearly pushing back. The company explicitly states that it disagrees that such a narrow potential jailbreak justifies recalling a model already in commercial deployment to “hundreds of millions of people.”

Fable’s safeguards and the jailbreak debate

Recall that Fable 5 was the “safe” Mythos‑class model designed to make high‑end capabilities usable by the public while sharply limiting risky cyber, biosecurity, and other high‑hazard behaviours. Anthropic’s original launch framing for Fable leaned heavily on safety:

  • Strong safeguards to make cyber misuse “very unlikely,” to the point that many users complained they were too restrictive.
  • Extensive pre‑launch red‑teaming with the US government, the UK AI Safety Institute, third‑party organisations, and internal teams “for thousands of hours” before release.
  • Internal testing that suggested Fable’s safeguards were “substantially more effective than those of any previously deployed model.”

In the new statement, Anthropic also makes several important admissions and claims that are worth surfacing for governance discussions:

  • No one has yet found a universal jailbreak for Fable 5 - that is, a technique that broadly removes its safeguards across many cyber‑capability domains.
  • Anthropic does not believe “perfect jailbreak resistance” is possible with today’s techniques; in their view, all deployed models remain vulnerable to non‑universal jailbreaks in some circumstances.
  • Their architecture for Fable 5 was explicitly “defence in depth”: make jailbreaks narrow or expensive, and combine this with strong monitoring and telemetry.

This is why Anthropic introduced a controversial 30‑day data retention requirement for Fable traffic: they wanted to be able to detect and study jailbreak attempts and shut down emergent harms quickly. That trade-off - more logging for more safety - was already raising eyebrows in privacy circles long before this directive.

Crucially, Anthropic says it has not received evidence of a concerning non‑universal jailbreak that has produced a harmful result. The reports they have seen, which they believe underlie the directive, show a level of vulnerability‑hunting capability they say “is widely available from other models (including OpenAI’s GPT‑5.5) and is used every day by defenders who keep systems safe.”

If that’s accurate, the core question becomes: why shut down Fable 5 and Mythos 5, but not every other competitive frontier model?

A governance failure in real time

Anthropic’s statement contains a line that should be pinned to every current debate about AI regulation:

“We believe the government should have the ability to block unsafe deployments, as part of a statutory process that is transparent, fair, clear, and grounded in technical facts. This action does not adhere to those principles.”

This gets to the heart of the governance problem. From a public‑interest perspective, it is entirely reasonable for governments to have last‑resort powers to stop dangerous model deployments, especially when those models enable scalable cyber or bio harms. But those powers must be:

  • Transparent: Stakeholders need to understand the technical basis for a decision, at least in outline. Here, Anthropic reports only “verbal evidence” and no formal disclosure of a harmful jailbreak with concrete impacts.
  • Consistent: If a narrow exploit that other models can replicate triggers a shutdown for Anthropic, we would expect symmetrical treatment for other providers. Anthropic warns that, applied uniformly, this standard would “essentially halt all new model deployments for all frontier model providers.”
  • Legible internationally: Decisions that reach beyond US borders need to be understandable to the people they affect. This was an export‑control style intervention with immediate extra‑territorial effect, including for organisations and users in Australia and other allied countries who were relying on Fable’s “safe” profile for legitimate use cases.

Instead, what we’re seeing looks more like a blunt, opaque national security reflex. From the outside, it is hard not to read this as the US treating frontier‑model control as a matter of strategic advantage, with limited concern for the impacts on allied jurisdictions or the credibility of emerging AI‑safety regimes.

Why this matters from Australia

From an Australian vantage point, several implications stand out.

  1. This is a live demonstration of regulatory dependency. If you build critical workflows on US‑hosted frontier models, you are also implicitly accepting that US export‑control and national security decisions can break those workflows overnight, without consultation or appeal. That is true whether you are a bank, a hospital, or a government agency in Sydney, Delhi, or Berlin.
  2. The case undercuts the narrative that “safety‑first” providers will be rewarded for going slower and investing more in safeguards. Anthropic emphasises that Fable’s safeguards are, in their view, stronger than any prior model deployed at this scale, and that they voluntarily accepted higher data‑retention costs to enable more robust monitoring. Yet they are the first to be hit with a sweeping suspension that does not appear to apply to models with weaker guardrails.
  3. It sharpens the need for transparent, multilateral mechanisms for managing frontier‑model risks. If a narrow exploit exists that can be weaponised at scale, that information should be quickly shared with other providers and with trusted public‑interest actors, so that mitigations can be developed and tested across the ecosystem. At present, we have a situation where:
    • An undisclosed actor demonstrates a jailbreak to US authorities.
    • US authorities issue a broad directive targeting a single provider.
    • The rest of the world is left to infer the threat model from a few lines in a corporate blog post.

That is not a sustainable pattern for a technology that is rapidly being embedded into critical infrastructure.

Where we should go from here

Anthropic ends its statement by apologising to customers and characterising the directive as a “misunderstanding” that it hopes to resolve quickly. That may be optimistic. Once national security bureaucracies have asserted this kind of power, they tend to keep it.

For policymakers and boards thinking about AI governance and strategy, this incident is a timely prompt to:

  • Stress‑test AI‑dependence assumptions: What happens if your primary model provider is abruptly switched off? Do you have viable alternatives, including local options?
  • Push for principled statutory frameworks: Model‑recall powers should exist, but they must come with due process, clear technical thresholds, and some form of independent scrutiny.
  • Treat “AI safety” as geopolitics: Export controls, access restrictions, and selective crackdowns are not just about harm prevention; they are also tools of industrial and strategic policy.

Anthropic is right on at least one point: if the mere existence of narrow, non‑universal jailbreaks becomes the bar for shutting down deployment, very few advanced models will remain online for long. The real governance work lies in building regimes that can live with imperfect safeguards, manage residual risk, and intervene proportionately when systems actually cross red lines.

Until then, those of us outside Washington will remain subject to US national security decisions we cannot see, cannot contest, and cannot predict - and that should worry anyone who cares about both AI safety and democratic accountability.
© 2002-2026 Kate Carruthers