Anthropic restricts powerful AI model from answering cancer questions to prevent misuse

Anthropic has introduced a new generation of artificial intelligence tools with unusually strict safeguards, but the decision is already drawing attention after users discovered that the system refuses to answer even basic questions about topics like cancer and cybersecurity.

The issue centres on Anthropic’s latest release, Claude Fable 5, which is built on what the company calls a “Mythos class” model. These models are considered significantly more advanced than previous versions, with enhanced capabilities in scientific reasoning, problem-solving and real-world applications. However, that increased power has also raised concerns about potential misuse, particularly in sensitive areas such as biology and security.

To manage these risks, Anthropic implemented aggressive safety filters that monitor user prompts and block or limit responses when certain topics are detected. In practice, this means that even harmless questions about cancer, such as explanations of different types or how misinformation spreads, can trigger the system’s safeguards.

When this happens, the model either refuses to answer or automatically switches to a less powerful version known as Opus 4.8 before responding. Users are notified when this fallback occurs, with a message explaining that the system has flagged the request under its safety protocols.

Anthropic has defended the approach, saying the restrictions were necessary to make the model available to the public. According to the company, Mythos class systems have reached a level of capability where they could potentially assist in high-risk biological or cybersecurity-related activities if left unchecked.

A spokesperson for the company said, “We believe models now have a greater ability to accomplish real world scientific tasks and for malicious actors to potentially use our models for highly risky biological research.” The company added that it has always used classifiers to block dangerous requests, but that the new model required even stricter controls.

The safeguards focus on three main categories: biology and chemistry, cybersecurity, and attempts to extract or replicate the model’s capabilities. Any prompt that falls within these areas is more likely to be flagged, even if the intent is entirely benign.

Anthropic acknowledged that the system may produce false positives, meaning safe and ordinary queries could be blocked unnecessarily. However, the company said early data suggests that more than 95 percent of interactions with the model do not trigger these restrictions.

The release of Claude Fable 5 comes shortly after Anthropic stated that its Mythos level models were too powerful for general use. Earlier versions were made available only to a limited group of researchers as part of controlled cybersecurity projects. The decision to release a safeguarded version reflects the company’s attempt to balance innovation with risk management.

The move highlights a broader debate within the artificial intelligence industry. As models become more advanced, companies are under increasing pressure to prevent misuse while still delivering useful tools to the public. This has led to a growing emphasis on safety mechanisms, including content filters, usage monitoring and restricted access to certain capabilities.

Experts say Anthropic’s approach represents a cautious strategy, but not one without trade-offs. While the safeguards reduce the risk of harmful use, they also limit the model’s usefulness in legitimate contexts such as education and research.

David Kasten, head of policy at Palisade Research, described the safeguards as a “good faith attempt” to reduce risk but warned that such systems are not foolproof. “It’s always a bit of a cat and mouse game between attacker and defender,” he said, noting that users often find ways to bypass restrictions over time.

He also raised concerns about public perception, suggesting that frequent fallback to less capable models could obscure the true capabilities of advanced AI systems. This could lead policymakers and the public to underestimate both the potential benefits and risks associated with the technology.

Anthropic has indicated that the restrictions are temporary and will be refined over time. The company said it plans to improve the accuracy of its classifiers to reduce unnecessary blocking while maintaining strong safety standards.

Looking ahead, Anthropic also suggested that fully unrestricted versions of Mythos class models could eventually be made available to trusted communities in fields such as life sciences and biomedical research. The goal would be to harness the technology’s potential for innovation while limiting exposure to misuse.

The situation underscores a key reality shaping the future of artificial intelligence. As systems become more powerful, the challenge is no longer just about what they can do, but how they should be controlled. Anthropic’s decision to prioritise safety, even at the cost of usability, signals that the industry is entering a new phase where risk management is becoming as important as capability itself.
