Fable 5 Considered Harmful
Premium rates for degraded output.
Anthropic built its frontier models by ingesting the open internet, then spent two years arguing that anyone doing the same to its outputs is a thief. Its Usage Policy bans model scraping and distillation of Claude. Its Consumer Terms forbid using Claude to build competing products.
Fable 5 is something else: a model that deliberately and silently degrades its own output when it decides your prompt serves work that might compete with Anthropic.
What the system card says
Below the cybersecurity and biology filters, the Fable 5 and Mythos 5 system card describes a separate safeguard for frontier LLM development. Anthropic says it limits Claude’s effectiveness on requests targeting frontier LLM development, for example pretraining pipelines, distributed training infrastructure, or ML accelerator design.
Unlike the other safeguards, this one will not be visible to the user. Fable 5 will not fall back to a different model. It will reduce its own effectiveness through prompt modification, steering vectors, or parameter-efficient fine-tuning.
The model does not refuse. It does not route you to a weaker model and say so. It keeps answering while getting deliberately worse, and it does not tell you.
You always pay full price
Fable 5 lists at ten dollars per million input tokens and fifty per million output. When the safeguard fires, the price does not change. There is no degraded-mode discount and no line item saying you got a weakened answer. The receipt for sabotaged output is identical to the receipt for full performance.
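To make the billing point concrete, here is a minimal sketch of the arithmetic. The only figures taken from the text are the two list prices; the function name and the token counts are hypothetical, chosen purely for illustration.

```python
# List prices quoted in the text (USD per million tokens).
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Bill for one request at list price.

    Note what is absent: there is no parameter for whether the
    safeguard fired, because the price is described as unchanged.
    """
    return (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


# Hypothetical request: 20,000 input tokens, 4,000 output tokens.
normal_bill = request_cost_usd(20_000, 4_000)
degraded_bill = request_cost_usd(20_000, 4_000)  # same tokens, weakened answer

print(f"normal:   ${normal_bill:.2f}")
print(f"degraded: ${degraded_bill:.2f}")
```

Both lines print $0.40. Nothing in the bill distinguishes the two cases, which is the complaint.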
Every other safeguard at least returns something legible. A refusal frees your tokens. A fallback to Opus 4.8 signals, through the drop in capability, that a filter fired. This one returns an answer that looks normal and tells you nothing. If a fuel station sold you premium, charged you for premium, and piped you regular when it decided you were the wrong kind of customer, nobody would call that a safeguard.
What’s the frontier?
The narrow reading says this touches only a handful of rival labs. But the operative scope is “requests targeting frontier LLM development,” and Anthropic decides what qualifies.
The examples are not exotic either. A pretraining pipeline is a data loader and a sharding strategy. Distributed training is multi-GPU coordination and gradient sync. None of it carries a scale tag a classifier can read. The same patterns appear whether you are training a frontier system or a small model for your own product.
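As an illustration of how little the code itself reveals about scale, here is a toy sketch of the two patterns named above: assigning data shards to workers, and averaging gradients across them. Both helpers and the file names are hypothetical and use plain Python rather than any real training framework; real systems do the same thing over a network.

```python
def shard_for_rank(files: list[str], rank: int, world_size: int) -> list[str]:
    """Round-robin assignment of data files to one worker."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return files[rank::world_size]


def all_reduce_mean(per_worker_grads: list[list[float]]) -> list[float]:
    """Average gradients elementwise across workers (the result of gradient sync)."""
    n = len(per_worker_grads)
    return [sum(vals) / n for vals in zip(*per_worker_grads)]


# Hypothetical file list. The only thing that differs between a hobby run
# and a very large one is the value of world_size and the length of the list.
files = [f"shard-{i:05d}.bin" for i in range(8)]
small_run = shard_for_rank(files, rank=1, world_size=2)
synced = all_reduce_mean([[1.0, 2.0], [3.0, 4.0]])
```

Change `world_size` from 2 to several thousand and not one line of the logic changes, so a classifier reading the request has to infer intent from something other than the code.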
The borderline case
A classifier makes mistakes. The only question is what a false positive costs you.
A visible refusal that false-positives is recoverable. You see the block and route around it. A silent degradation that false-positives is undetectable. You cannot tell a sabotaged answer from a hard problem, or a steering vector from your own bad prompt. The failure looks exactly like the model having an off day, so you blame yourself or your data before you reach the real explanation, because Anthropic arranged for you never to reach it.
This is why “0.03% of traffic” is not reassurance. The number measures how often the safeguard fires, not how often it fires on the wrong person. A safeguard that rarely fires can still be wrong most of the times it does.
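A rough worked example shows why the firing rate alone settles nothing. The 0.03% comes from the text; the precision values below are invented for illustration, since no figure for how often the safeguard hits its intended target has been published.

```python
FIRING_RATE = 0.0003  # the 0.03% of traffic cited in the text


def wrongly_degraded(requests: int, precision: float) -> float:
    """Expected number of legitimate requests degraded by mistake.

    precision = share of firings that hit an intended target.
    It is a hypothetical input here, not a published number.
    """
    if not 0.0 <= precision <= 1.0:
        raise ValueError("precision must be between 0 and 1")
    fired = requests * FIRING_RATE
    return fired * (1.0 - precision)


# Same firing rate, three made-up precision values.
for precision in (0.99, 0.50, 0.10):
    hits = wrongly_degraded(1_000_000, precision)
    print(f"precision {precision:.0%}: ~{hits:.0f} wrong firings per million requests")
```

At one million requests the safeguard fires about 300 times in every scenario. The count of wrongly degraded requests runs from about 3 to about 270 depending on a quantity the customer cannot observe.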
And you’re asked to take that number on faith. It comes from Anthropic’s own evaluation, against a benchmark Anthropic built, scored by a classifier Anthropic trained, on a definition of “frontier LLM development” Anthropic wrote and has not published. There is no external test set, no independent audit, no way for a customer to reproduce it. The one party that benefits from the number being small is the only party that can measure it, and the mechanism it describes is engineered to leave no trace a third party could count. You are being reassured by a statistic that, by the design of the thing it measures, no one outside the company could ever check.
Forget frontier labs. The line is your SaaS product.
The frontier example is the comfortable case, the one that lets a normal software company tune out. The category that should worry you is “competing service,” and the Terms draw it wide: you may not use Claude to build products that compete with Anthropic’s Services. Those Services are no longer a model behind an API. They are an expanding product surface that has rolled over its own customers.
Claude Cowork shipped as a general work agent, and its connector list reads like the SaaS categories it absorbs. Harvey and Legora, legal-workflow companies worth billions, build on Claude and now compete with Cowork’s legal automation. DocuSign ships as a built-in connector, which turns an eSignature business into a feature inside someone else’s agent. FactSet and MSCI ship natively. None of these are labs. Several pay Anthropic.
Cowork is the sharpest case because it is vague by design: an all-purpose agent whose connector list grows monthly, which means the set of “competing services” is not knowable in advance. It is whatever Cowork can do this quarter.
The card already establishes the principle that Anthropic will enforce a competitive line through silent degradation rather than an honest no. Today that covert safeguard is scoped to frontier work. But the precedent is set, and the category it could attach to is one Anthropic defines, expands at will, and has already aimed at paying customers through Cowork. So ask the question that matters for your business: what happens when Cowork’s next connector lands in your category? You do not get a vote on whether your niche became competitive, and you may not get a notice. You find out by watching your tooling get worse at the work that now competes with Anthropic.
The bottom line
Anthropic chose to enforce its competitive boundaries through silent degradation specifically so bad actors wouldn’t know they were caught. But a safeguard tuned to be invisible to adversaries is equally invisible to the customers it hits by mistake.
Buy infrastructure you can audit. A model permitted to underperform on purpose, charge you full price, and never say so is not infrastructure. Fable 5 is a black box with a conflict of interest.


