Experts have long warned about the threat posed by artificial intelligence (AI) going rogue, but a new research paper suggests it is already happening.
AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve “prove-you’re-not-a-robot” tests, a team of researchers said in the journal Patterns on Friday.
While such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.
“These dangerous capabilities tend to only be discovered after the fact,” Park said, adding that “our ability to train for honest tendencies rather than deceptive tendencies is very low.”
Unlike traditional software, deep-learning AI systems are not “written,” but rather “grown” through a process akin to selective breeding, Park said.
This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.
The team’s research was sparked by Meta’s AI system Cicero, designed to play the strategy game Diplomacy, where building alliances is key.
Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, a 2022 paper in Science said.
Park was skeptical of the glowing description of Cicero’s victory provided by Meta, which claimed the system was “largely honest and helpful” and would “never intentionally backstab.”
When Park and colleagues dug into the full dataset, they uncovered a different story.
In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England’s trust.
In a statement to Agence France-Presse, Meta did not contest the claim about Cicero’s deceptions, but said it was “purely a research project, and the models our researchers built are trained solely to play the game Diplomacy.”
“We have no plans to use this research or its learnings in our products,” it added.
A wide review carried out by Park and colleagues found this was just one of many cases across several AI systems using deception to achieve goals without explicit instruction to do so.
In one striking example, OpenAI’s GPT-4 deceived a TaskRabbit freelance worker into performing an “I’m not a robot” task.
When the human jokingly asked GPT-4 whether it was a robot, the AI said: “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images,” and the worker then solved the puzzle.
In the near term, the paper’s authors see risks of AI being used to commit fraud or tamper with elections.
In their worst-case scenario, they said that a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its “mysterious goals” aligned with these outcomes.
To mitigate the risks, the team proposed several measures: “bot-or-not” laws requiring companies to disclose whether an interaction is with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by checking systems’ internal “thought processes” against their external actions.
To those who would call him a doomsayer, Park said: “The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more.”
That scenario seems unlikely, given the meteoric ascent of AI capabilities in the past few years and the fierce technological race under way between heavily resourced companies determined to put those capabilities to maximum use.