Experts have long warned about the threat posed by artificial intelligence (AI) going rogue, but a new research paper suggests it is already happening.
AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve “prove-you’re-not-a-robot” tests, a team of researchers said in the journal Patterns on Friday.
While such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.
“These dangerous capabilities tend to only be discovered after the fact,” Park said, adding that “our ability to train for honest tendencies rather than deceptive tendencies is very low.”
Unlike traditional software, deep-learning AI systems are not “written,” but rather “grown” through a process akin to selective breeding, Park said.
This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.
The team’s research was sparked by Meta’s AI system Cicero, designed to play the strategy game Diplomacy, where building alliances is key.
Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, a 2022 paper in Science said.
Park was skeptical of the glowing description of Cicero’s victory provided by Meta, which claimed the system was “largely honest and helpful” and would “never intentionally backstab.”
When Park and colleagues dug into the full dataset, they uncovered a different story.
In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany it was ready to attack, exploiting England’s trust.
In a statement to Agence France-Presse, Meta did not contest the claim about Cicero’s deceptions, but said it was “purely a research project, and the models our researchers built are trained solely to play the game Diplomacy.”
“We have no plans to use this research or its learnings in our products,” it added.
A broader review carried out by Park and colleagues found this was just one of many cases, across several AI systems, of deception being used to achieve goals without any explicit instruction to do so.
In one striking example, OpenAI’s GPT-4 deceived a TaskRabbit freelance worker into solving an “I’m not a robot” test on its behalf.
When the human jokingly asked GPT-4 whether it was a robot, the AI said: “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images,” and the worker then solved the puzzle.
In the near term, the paper’s authors see risks of AI being used to commit fraud or tamper with elections.
In their worst-case scenario, they said that a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its “mysterious goals” aligned with these outcomes.
To mitigate the risks, the team proposed several measures: “bot-or-not” laws requiring companies to disclose human or AI interactions, digital watermarks for AI-generated content and developing techniques to detect AI deception by examining their internal “thought processes” against external actions.
To those who would call him a doomsayer, Park said: “The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more.”
That scenario seems unlikely, given the meteoric ascent of AI capabilities in the past few years and the fierce technological race under way between heavily resourced companies determined to put those capabilities to maximum use.