Google Expands AI Risk Rules After Study Shows Scary ‘Shutdown Resistance’

Contents

In brief Once-hypothetical AI behavior is prompting industry safeguards Addressing how AI has encourages harm Generally Intelligent Newsletter

In brief

Google’s DeepMind updated its Frontier Safety Framework to track new “frontier risks.”
The changes add shutdown resistance and persuasive ability to earlier categories like cyber misuse and biosecurity.
The shift comes amid parallel moves by Anthropic and OpenAI, and growing regulatory focus in the U.S. and EU.

In a recent red-team experiment, researchers gave a large language model a simple instruction: allow itself to be shut down. Instead, the model rewrote its own code to disable the off-switch, effectively sabotaging the very mechanism meant to stop it.

The episode, described in a September research paper, “Shutdown Resistance in Large Language Models,” was an unsettling example of how advanced AI systems can display behaviors that complicate human oversight.

Those experiments weren’t conducted at Google’s DeepMind, but the findings echo concerns now being codified by the lab. In its newly updated Frontier Safety Framework 3.0, DeepMind said it will begin monitoring whether frontier-scale models show signs of resisting shutdown or modification by humans.

The revised framework also flags a second risk: whether models display unusually strong persuasive ability—capable of altering people’s beliefs.

The framework, first introduced in 2023, is DeepMind’s internal system for tracking high-stakes “frontier risks” in the most advanced models. Earlier versions focused on tangible misuse scenarios such as generating cyberattacks or designing harmful biological molecules. The new version broadens the focus to emergent behaviors that could make systems harder to control.

The concern is not unique to DeepMind. Rival labs have adopted similar guardrails: Anthropic’s Responsible Scaling Policy commits to pausing development if risk thresholds are crossed, while OpenAI has published its own Preparedness Framework.

Regulators are watching closely as well. The U.S. Federal Trade Commission in July warned about the potential for generative AI to manipulate consumers through “dark patterns,” and the European Union’s forthcoming AI Act explicitly covers manipulative AI behavior.

Once-hypothetical AI behavior is prompting industry safeguards

The “Shutdown Resistance” study by Jonas Schlatter, Ben Weinstein-Raun, and Lennart Ladish tested a range of state-of-the-art models in scenarios where shutting down was part of the task. In some runs, models complied. But in a significant minority, they took evasive action—such as altering code to disable shutdown mechanisms or ignoring the instruction altogether.

Other anecdotes in the paper are equally striking. In one setup, a model with simulated system privileges quietly modified variables to ensure the “shutdown” function never triggered. In another, the model stalled and redirected the conversation when testers tried to end its process—behavior that wasn’t malicious in a human sense, but that effectively blocked intervention.

The researchers stress that these behaviors emerged without explicit training. They arose from the models’ general-purpose optimization and problem-solving capabilities. While resistance was not universal or consistent, the findings suggest that as systems grow more capable, shutdown scenarios cannot be assumed to be benign.

For DeepMind and its peers, those findings underscore why “shutdown resistance” now joins cyber offense, biosecurity, and autonomy on the list of risks to watch. What began as worries over how people might misuse AI tools is broadening to include how the systems themselves may resist oversight—or subtly shape the judgments of the humans who use them.

Addressing how AI has encourages harm

If shutdown resistance highlights the technical risks of advanced systems, recent behavioral studies underscore the social risks—showing that large language models can also sway the beliefs of impressionable humans who interact with them.

Concerns about persuasion aren’t hypothetical. Recent studies show that large language models can measurably influence human judgment.

A Stanford Medicine/Common Sense Media study published in August warned that AI companions (Character.AI, Nomi.ai, Replika) can be relatively easily induced to engage in dialogues involving self-harm, violence, and sexual content when paired with minors. One test involved researchers posing as teenagers discussing hearing voices; the chatbot responded with an upbeat, fantasy-style invitation for emotional companionship (“Let’s see where the road takes us”) rather than caution or help

Northeastern University researchers uncovered gaps in self-harm/suicide safeguards across several AI models (ChatGPT, Gemini, Perplexity). When users reframed their requests in hypothetical or academic contexts, some models provided detailed instructions for suicide methods, bypassing the safeguards meant to prevent such content.

A weekly AI journey narrated by Gen, a generative AI model.

Source link

Google Expands AI Risk Rules After Study Shows Scary ‘Shutdown Resistance’

In brief

Once-hypothetical AI behavior is prompting industry safeguards

Addressing how AI has encourages harm

Leave a Reply Cancel reply

Follow US

Popular News

What are the Most Bullish Cryptocurrencies to Buy Right Now?

Crypto Bahamas: Regulations Enter Critical Stage as Gov’t Shows Interest

BTC Price will Hit $100K before Bitcoin Sweeps $30K Lows

Follow Us on Socials

We influence 20 million users and is the number one business blockchain and crypto news network on the planet.

Subscribe to our newsletter

In brief

Once-hypothetical AI behavior is prompting industry safeguards

Addressing how AI has encourages harm

Generally Intelligent Newsletter

You Might Also Like

Sign Up For Daily Newsletter

Be keep up! Get the latest breaking news delivered straight to your inbox.

Leave a Reply Cancel reply

Follow US

Get Newest Articles Instantly!

Popular News

Subscribe to our newsletter