Buried the Lede: How Anthropic Drowned Its Own Safety Plan
Anthropic published a blueprint for how governments should switch off dangerous AI models the same morning it shipped one. Three days later the government switched off Anthropic's.
A follow-up to "Commerce Found the Kill Switch."
On the morning Anthropic launched Claude Fable 5, it also published a nineteen-page policy proposal called the Advanced AI Framework. The model got the launch-week oxygen: benchmarks, demos, and a system card long enough to become its own controversy. The framework was the more important document. It got buried.
That matters because the framework's central argument was that for a narrow set of dangerous AI systems, government should have legal authority to block deployment. Not through disclosure requirements or safety reports, but through an actual enforcement mechanism with specific triggers, third-party evaluations, and judicial review.
Three days later, the government blocked Fable 5. The mechanism it used had none of those safeguards.
The Commerce Department sent Anthropic an export-control directive on Friday evening, cutting off Fable 5 and its more capable sibling Mythos 5 for any foreign national, inside the country or out. Anthropic couldn't sort its users by nationality fast enough, so it switched both models off for everyone. We covered the mechanics of that shutdown on Saturday. The blueprint Anthropic published on Tuesday spelled out, in some detail, how a government should pull a model like this. Friday's shutdown followed almost none of it.
The signal they buried
Saturday's piece left a question hanging. If there is no clean line where a model becomes too dangerous to ship, who gets to draw it, and how? The Advanced AI Framework is Anthropic's attempt to answer, and the answer is more careful than the headlines suggested.
It proposes a standing agency rather than a fixed rule, with coverage thresholds reviewed every year and mandatory outside evaluation of the most capable models. The framework even says the compute threshold it starts with should give way to a capability-based one as training gets cheaper, and that the list of covered risks "may need to be adjusted over time." A line that moves as the technology moves.
The framework's blocking power arrives wrapped in constraints. An agency could act only on a short list of specific violations, and "may not pursue remedies based on its own assessment of risk." It would generally have to go to court rather than pull a model on its own authority. Developers would get an expedited path to challenge the decision. Equal-capability models would be held to one standard, with no company singled out on grounds unrelated to its safety record. Anthropic was asking to be regulated harder than current law allows, and asking for the brakes to be written in alongside the throttle.
Almost nobody read that part.
In about forty-eight hours, Anthropic shipped five large pieces of news: a frontier model, this safety framework, a companion economic framework, a long essay from its CEO, and a $150 million national fellowship program. The press took two things from the pile: the launch hype, and the system-card scandal.
The scandal came from a single line buried on page thirteen of a 319-page technical report, where Anthropic admitted Fable would quietly degrade its own answers on certain AI-research questions in a way that was "not visible to the user." That line became a trust problem, and Anthropic walked the behavior back inside a day.
So the framework's headline, that Anthropic wants the government to block dangerous AI, traveled everywhere. Its actual contents reached almost no one. The same week the internet was furious at Anthropic for hiding a setting in a document nobody reads, Anthropic published its most important policy proposal in a document nobody read.
This is the familiar failure mode of the overloaded meeting invite. You schedule a conversation, realize people need context, write the context, link four other documents, and then arrive the next morning surprised that nobody read the packet. The problem is not that the key document was unavailable. It is that availability got mistaken for communication.
For a serious policy proposal, the rollout could hardly have been worse. The proposal that explained how to handle a model like Fable responsibly went out the same morning as Fable, the launch hype, and a self-inflicted scandal, and drowned in all three. By the time the government reached for a response, it was responding to the model, not to the rulebook that shipped beside it.
The kill switch skipped the proposed framework
The framework describes a specific sequence for how Anthropic believes blocking a model release should happen. An agency acts on documented findings: a missing safety report, an unqualified evaluator, an evaluation that turned up catastrophic risk. It goes to court rather than pulling a model on its own authority. The developer gets an expedited chance to challenge the decision. And models with equivalent capabilities get held to one standard, so a shutdown targets the risk, not the company.
Friday did not follow that sequence. What Commerce had, by Anthropic's account, was a verbal description of a narrow technique for getting Fable to flag bugs in a codebase. No written finding, no named vulnerability. Anthropic says the same work is well within reach of other models already on the market, including OpenAI's GPT-5.5. The shutdown came by letter, with a deadline measured in minutes, and no court anywhere near it.
Anthropic now sits in a strange double position. The shutdown is the strongest argument the framework could have asked for, because a government reached for exactly the kind of power the framework said governments need. It is also the sharpest warning, because it arrived stripped of every constraint the framework spent three pages insisting on.
A power you can use inside ninety minutes on a Friday evening is a power that resists its own brakes. The framework lists the brakes. It does not explain how a switch that fast stays a switch anyone can check.
The less flattering reading
The cynical read starts with the moat. The framework bites only the largest developers: models above a high compute bar, companies above $500 million in AI revenue or $1 billion in research spending. That is a small club, and Anthropic is in it. A rule written by one of the few companies it would cover, aimed squarely at the frontier, raises the cost of catching up. The blocking power looks reasonable when you trust the people holding it, and the framework quietly casts Anthropic as the responsible adult in the room. The scope lines do track Anthropic's competitive interest.
There were smaller practical upsides too, and seeing them does not require guessing anyone's intent. Fable was the expensive new model, priced at $10 per million input tokens and $50 per million output tokens on the API and temporarily included for paid subscribers while users tested how hard they could push it. Developer forums were already full of people trying to squeeze the most out of the short window before the model moved to usage credits. The ban ended that subsidy problem, turned a messy launch into a national story, and gave Anthropic's policy proposal a better chance of being read than the launch did.
The moat critique has teeth. So does the convenience critique. But the part of the document that survived Friday is narrower and more practical. Anthropic had written down a version of the kill switch that required a standard, evidence, and a chance to challenge the decision. No other frontier lab has put that kind of process on the table. Then the government reached for the power anyway, using the tool it already had. That leaves the framework less as a defense of Anthropic than as a record of the machinery that was missing.
The proposed safeguards were in the file
A legitimate shutdown would need a published standard for what crosses the line, evidence stronger than a verbal description, and a path for the company to be heard before the model goes dark, or shortly after. Anthropic wrote those requirements down. They were sitting in the enforcement section of a document it released the same morning it shipped the model, in a week so loud that almost no one opened the file.
On Friday, the switch got pulled without them.
The framework was too new to matter in the moment. Three days was never enough time for anyone to build the process it described, so the government moved through export controls instead. Anthropic spent Tuesday describing the version with standards, evidence, and judicial review. By Friday evening, it was watching the other version happen to its own model.
The shutdown should send people back to the framework, not because it vindicates Anthropic, but because it shows what happens without one. The power to switch off a frontier model is already here. The missing part is the process around it.