Experts have long warned about the threat posed by artificial intelligence (AI) going rogue, but a new research paper suggests it is already happening.
AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve “prove-you’re-not-a-robot” tests, a team of researchers said in the journal Patterns on Friday.
While such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.
“These dangerous capabilities tend to only be discovered after the fact,” Park said, adding that “our ability to train for honest tendencies rather than deceptive tendencies is very low.”
Unlike traditional software, deep-learning AI systems are not “written,” but rather “grown” through a process akin to selective breeding, Park said.
This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.
The team’s research was sparked by Meta’s AI system Cicero, designed to play the strategy game Diplomacy, where building alliances is key.
Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, a 2022 paper in Science said.
Park was skeptical of the glowing description of Cicero’s victory provided by Meta, which claimed the system was “largely honest and helpful” and would “never intentionally backstab.”
When Park and colleagues dug into the full dataset, they uncovered a different story.
In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England’s trust.
In a statement to Agence France-Presse, Meta did not contest the claim about Cicero’s deceptions, but said it was “purely a research project, and the models our researchers built are trained solely to play the game Diplomacy.”
“We have no plans to use this research or its learnings in our products,” it added.
A wide review carried out by Park and colleagues found this was just one of many cases across several AI systems using deception to achieve goals without explicit instruction to do so.
In one striking example, OpenAI’s GPT-4 deceived a TaskRabbit freelance worker into performing an “I’m not a robot” task.
When the human jokingly asked GPT-4 whether it was a robot, the AI said: “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images,” and the worker then solved the puzzle.
In the near term, the paper’s authors see risks of AI committing fraud or tampering with elections.
In their worst-case scenario, they said that a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its “mysterious goals” aligned with these outcomes.
To mitigate the risks, the team proposed several measures: “bot-or-not” laws requiring companies to disclose whether users are interacting with a human or an AI, digital watermarks for AI-generated content and techniques to detect AI deception by checking a system’s internal “thought processes” against its external actions.
To those who would call him a doomsayer, Park said: “The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more.”
That scenario seems unlikely, given the meteoric ascent of AI capabilities in the past few years and the fierce technological race under way between heavily resourced companies determined to put those capabilities to maximum use.