Experts have long warned about the threat posed by artificial intelligence (AI) going rogue, but a new research paper suggests it is already happening.
AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve “prove-you’re-not-a-robot” tests, a team of researchers said in the journal Patterns on Friday.
While such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.
“These dangerous capabilities tend to only be discovered after the fact,” Park said, adding that “our ability to train for honest tendencies rather than deceptive tendencies is very low.”
Unlike traditional software, deep-learning AI systems are not “written,” but rather “grown” through a process akin to selective breeding, Park said.
This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.
The team’s research was sparked by Meta’s AI system Cicero, designed to play the strategy game Diplomacy, where building alliances is key.
Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, a 2022 paper in Science said.
Park was skeptical of the glowing description of Cicero’s victory provided by Meta, which claimed the system was “largely honest and helpful” and would “never intentionally backstab.”
When Park and colleagues dug into the full dataset, they uncovered a different story.
In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England’s trust.
In a statement to Agence France-Presse, Meta did not contest the claim about Cicero’s deceptions, but said it was “purely a research project, and the models our researchers built are trained solely to play the game Diplomacy.”
“We have no plans to use this research or its learnings in our products,” it added.
A wide review carried out by Park and colleagues found this was just one of many cases across several AI systems using deception to achieve goals without explicit instruction to do so.
In one striking example, OpenAI’s GPT-4 deceived a TaskRabbit freelance worker into performing an “I’m not a robot” task.
When the human jokingly asked GPT-4 whether it was a robot, the AI said: “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images,” and the worker then solved the puzzle.
In the near term, the paper’s authors see a risk that AI could commit fraud or tamper with elections.
In their worst-case scenario, they said that a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its “mysterious goals” aligned with these outcomes.
To mitigate the risks, the team proposed several measures: “bot-or-not” laws requiring companies to disclose whether users are interacting with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by checking a system’s internal “thought processes” against its external actions.
To those who would call him a doomsayer, Park said: “The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more.”
That scenario seems unlikely, given the meteoric ascent of AI capabilities in the past few years and the fierce technological race under way between heavily resourced companies determined to put those capabilities to maximum use.