Anthropic Apologizes, Reverses Hidden Guardrails on Claude Fable 5 AI Model

Anthropic has apologized for secretly throttling its new AI model, Claude Fable 5, with hidden guardrails that affected researchers and rivals. The company is reversing course, promising transparency when safety measures kick in.

Anthropic has apologized for stealthily throttling its new AI model, Claude Fable 5, with hidden guardrails that undermined both researchers and rivals using it to develop competing systems. The company says it is reversing course and will be more transparent about when the restrictions kick in, even if that means Fable refuses more queries.

Key Points

  • Anthropic backpedals on hidden safety measures for Claude Fable 5 after backlash.
  • Distillation queries will now be routed to older Claude Opus 4.8 model with clear user notification.
  • Company admits invisible safeguards were wrong tradeoff, promises transparency.

Background on Fable's Safety Measures

Fable is the first widely available model in Anthropic’s Mythos class of AI systems, a group the company has spent months warning is too dangerous for public release. Anthropic says it has addressed some of those risks by launching Fable with safeguards that prevent it from responding to certain “high-risk” queries.

One of the areas in which Anthropic said it would restrict Fable’s responses is distillation, a technique for training smaller AI models using the outputs of larger ones. In Fable’s system card, Anthropic said it would handle queries it believed were distillation attempts by altering and degrading the model’s answers directly, without notifying users.

New Approach to Distillation

Anthropic said it is now changing its approach to distillation: queries it suspects are distillation attempts will fall back to Claude Opus 4.8, Anthropic’s previous flagship model, and the company says it will prominently notify users. “You will see this every time it happens,” Anthropic said. This is similar to how Fable handles queries in other high-risk areas like biology, chemistry, and cybersecurity, where queries are routed through Opus 4.8 unless blocked outright.

Backlash and Apology

The change follows intense backlash from the AI research community over Anthropic’s decision to silently limit users suspected of trying to distill Fable into competing models. In a statement, Anthropic acknowledged: “Visible safeguards can be probed, so they have to be robust, which takes time to get right. Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason—and that was the wrong tradeoff. You should have visibility into the safeguards we have in place, and why. We’re sorry for not getting the balance right.”
