Anthropic gives Claude Opus 4 the power to end harmful or abusive chats

Anthropic has introduced a conversation-ending safety feature for its latest, largest Claude models, designed to terminate chats only in rare, extreme cases of persistently harmful or abusive user behavior. Notably, the company frames this as a measure to protect the model itself, rather than the human on the other side of the screen.

The move is not a claim of sentience. Anthropic says it remains highly uncertain about the moral status of Claude and other large language models, now or in the future, and is taking a cautious, just-in-case approach.

The feature aligns with a broader research effort into "model welfare," where Anthropic is exploring low-cost interventions that could mitigate potential risks to models if such welfare considerations ever become relevant.

For now, the capability is limited to Claude Opus 4 and 4.1 and is intended only for extreme edge cases. Examples include requests for sexual content involving minors or attempts to obtain information that could enable large-scale violence or acts of terror.

In pre-deployment testing, Claude Opus 4 reportedly showed a strong aversion to these categories of requests and exhibited a pattern of apparent distress when compelled to respond, reinforcing the need for a conversation-ending option.

Operationally, the model will attempt multiple redirections before taking this step. Ending a chat is meant to be a true last resort, used only when a productive interaction seems out of reach or when a user explicitly asks Claude to end the conversation.

Anthropic also instructs Claude not to use this ability when users may be at imminent risk of harming themselves or others, indicating that support-oriented engagement should continue in those scenarios.

If a conversation is ended, users can still start new chats from the same account. They can also branch the problematic thread by editing their previous messages to explore safer directions.
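For developers working with Claude through the API rather than the chat interface, the same branching idea can be illustrated in code: instead of continuing a transcript that has gone wrong, resend the history truncated at the edited message. The sketch below uses the Anthropic Python SDK; the model alias and the helper function are illustrative assumptions, not Anthropic's documented handling of the conversation-ending feature.

```python
# Illustrative sketch of "branching" a conversation: keep the turns before an
# edited message, swap in the new wording, drop everything after it, and
# request a fresh reply. Mirrors the chat-UI behavior described above.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

MODEL = "claude-opus-4-1"  # assumed model alias; substitute the ID you actually use


def branch_conversation(history, edit_index, new_user_message):
    """Return (branched_history, response) for a branch that replaces the
    message at edit_index with new_user_message and discards later turns."""
    branched = history[:edit_index] + [{"role": "user", "content": new_user_message}]
    response = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=branched,
    )
    return branched, response


# Example: rewrite the second user turn (index 2) to steer in a safer direction.
history = [
    {"role": "user", "content": "Help me outline a short story."},
    {"role": "assistant", "content": "Sure - what themes are you interested in?"},
    {"role": "user", "content": "A request the model declined to continue..."},
]
branched, reply = branch_conversation(history, 2, "Let's make it a mystery set in a small town.")
print(reply.content[0].text)
```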

Anthropic describes the rollout as an ongoing experiment and plans to refine the approach over time.

Keywords: Claude Opus 4, Claude Opus 4.1, conversation-ending feature, AI safety, model welfare, LLM safety, abusive prompts, harmful requests.