Anthropic equips Claude Opus 4 to end harmful or abusive chats as a last resort

Anthropic is rolling out a new safeguard that allows select Claude models to terminate conversations in rare, extreme scenarios involving persistently harmful or abusive prompts. The company frames the change as a precautionary step to reduce potential risks to the model itself, not as a claim that its models are sentient or capable of being harmed like people.

Anthropic emphasizes that it remains uncertain about the moral status of large language models. In light of that uncertainty, it has begun piloting low-cost interventions intended to mitigate hypothetical risks — a research track sometimes described as exploring model welfare — while continuing to focus on user safety and responsible deployment.

At launch, the conversation-ending ability is limited to Claude Opus 4 and Claude Opus 4.1 and is designed to trigger only in extreme edge cases. Examples include repeated requests for sexual content involving minors and attempts to procure information that could enable large-scale violence or terrorism.

In pre-deployment evaluations, these models showed a strong aversion to responding to such requests and exhibited patterns that suggested distress when forced to engage. The new behavior is meant to prevent prolonged exposure to those interactions and to reinforce refusal in situations where redirection has repeatedly failed.

Operationally, ending a chat is positioned as a true last resort. Claude is expected to make multiple redirection attempts first and to use termination only when there is no path to a productive exchange. The assistant may also end a conversation upon explicit user request.

There is an important exception: the conversation-ending behavior is not to be used when a user appears to be at imminent risk of harming themselves or others. In those moments, the assistant is directed toward safer response patterns rather than terminating the interaction.

Conversation termination does not block an account. Users can immediately start new chats and may even branch the problematic thread by editing their prior messages to steer the discussion in a compliant direction.

Anthropic describes this rollout as an ongoing experiment. The company plans to monitor outcomes, gather feedback, and refine thresholds and behaviors over time to balance robustness, user experience, and safety.