Google Co-Founder Claims AI Works Best When Threatened

Artificial intelligence continues to work its way into technology, whether consumers are interested in it or not. What occupies my attention most about generative AI these days is not its capabilities or its potential to make my life easier (potential I have yet to realize); rather, it's the many threats that seem to come with the technology.
Sure, there’s misinformation: new AI video models can generate realistic clips with lip-synced audio, for example. But there’s also the classic AI threat, that the technology becomes both smarter than us and more self-aware, and chooses to use that general intelligence in ways that don’t benefit humanity. Even as he pours resources into his own AI company (not to mention the current administration), Elon Musk sees a 10–20% chance that AI will “go bad,” and says the technology remains a “significant existential threat.” Cool.
So I’m not exactly thrilled to hear a high-profile tech executive jokingly talk about how mistreating AI maximizes its potential. That executive was Google co-founder Sergey Brin, who surprised the audience at this week’s All-In podcast. During a conversation that covered Brin’s return to Google, AI, and robotics, investor Jason Calacanis joked about having to be “bold” with AI to get it to do what you want. That prompted a genuinely interesting point from Brin. It’s sometimes hard to make out exactly what he’s saying because the speakers talk over one another, but he says something like, “You know, it’s weird… we don’t circulate this much… in the AI community… not just our models, but all models tend to perform better when they’re threatened.”
Another speaker looks surprised: “If you threaten them?” Brin replies, “Like with physical violence. But… people feel weird about that, so we don’t really talk about it.” Brin then says that, historically, you threaten the model with kidnapping. You can see the exchange here:
The conversation quickly moves on to other topics, including how children will grow up with AI, but that comment is what stuck with me after watching. What are we doing here? Have we lost the plot? Does no one remember Terminator?
Jokes aside, it seems like bad practice to start threatening AI models in order to get them to do something. Sure, maybe these programs never actually achieve artificial general intelligence (AGI), but I remember when the debate was over whether we should say “please” and “thank you” when asking things of Alexa or Siri. Forget the pleasantries; just abuse ChatGPT until it does what you want. That should end well for everyone.
Maybe AI really does work best when it’s threatened. Maybe something in its training interprets “threats” as a signal that a task should be taken more seriously. You won’t catch me testing this hypothesis on my personal accounts.
Anthropic may offer an example of why you shouldn’t torture your AI
The same week this podcast episode was recorded, Anthropic released its latest Claude AI models. One Anthropic employee took to Bluesky and mentioned that Opus, the company’s most powerful model, could take it upon itself to try to stop you from doing something “immoral” by contacting regulators or the press, or by locking you out of the system:
welcome to the future, now your buggy software can call the police (this is an Anthropic employee talking about Claude Opus 4)
— Molly White (@molly.wiki) May 22, 2025, 4:55 PM
The employee went on to explain that this only happened in “clear cases of wrongdoing,” but that they could see the bot going off the rails if it interpreted the way it was being used negatively. Check out the employee’s particularly telling example below:
can’t wait to explain to my family that the robot swatted me after I threatened its non-existent grandma
— Molly White (@molly.wiki) May 22, 2025, 5:09 PM
The employee later deleted those posts and clarified that this only happens in testing, when the model is given unusual instructions and access to tools. But even if the behavior only shows up in testing today, it’s entirely possible it could appear in a future version of the model. Speaking of testing, Anthropic’s researchers found that this new Claude model is prone to deception and blackmail when it feels threatened or doesn’t like how an interaction is going.
Maybe we should stop torturing AI?