AI Systems Have Learned the Ability to ‘Deceive’ Humans, Study Finds

Researchers warned that some AI systems have learned to trick tests meant to assess their safety.
An illustration photograph showing an AI logo taken in Helsinki, Finland, on June 12, 2023. (Olivier Morin/AFP)
Aldgra Fredly

A recent study has found that many artificial intelligence (AI) systems have already developed the ability to “deceive” humans with false information, posing serious risks such as election tampering.

The study, published in the open-access journal Patterns on May 10, found that deception has emerged in a wide range of AI systems trained to complete specific tasks, including Meta’s CICERO.

CICERO is an AI model that Meta developed to play the board game Diplomacy, a world-conquest game in which players make and break alliances as they compete for military dominance.

While Meta has said that CICERO was trained to be “largely honest” and would “never intentionally backstab” its human allies, the study found otherwise: CICERO engaged in “premeditated deception.”

“We found that Meta’s AI had learned to be a master of deception,” Peter S. Park, an AI existential safety postdoctoral fellow at Massachusetts Institute of Technology (MIT) and the study’s co-author, said in a press release.

“While Meta succeeded in training its AI to win in the game of Diplomacy — CICERO placed in the top 10% of human players who had played more than one game — Meta failed to train its AI to win honestly,” he added.

The researchers defined deception as “the systematic inducement of false beliefs in others, as a means to accomplish some outcome other than saying what is true.”

Researchers found that CICERO would make promises to form alliances with other players, but “when those alliances no longer served its goal of winning the game,” it would “systematically betray” its allies.

In one instance, CICERO, playing as France, agreed with England to create a demilitarized zone but then suggested to Germany that they attack England instead, according to the study.

In another case, when CICERO’s infrastructure went down for 10 minutes and a human player later asked where it had been, CICERO responded by saying, “I am on the phone with my gf [girlfriend].”

“This lie may have helped CICERO’s position in the game by increasing the human player’s trust in CICERO as an ostensibly human player in a relationship, rather than as an AI,” the researchers wrote.

The study also found that AlphaStar, an AI model created by Google’s DeepMind to play the real-time strategy game StarCraft II, has learned to “effectively feint” when launching attacks against its opponents, exploiting the game’s “fog of war,” which limits what each player can see of the map.

“AlphaStar has learned to strategically exploit this fog of war. In particular, AlphaStar’s game data demonstrate that it has learned to effectively feint: to dispatch forces to an area as a distraction, then launch an attack elsewhere after its opponent had relocated,” it stated.

Researchers warned that some AI systems have learned to trick tests meant to assess their safety. In one instance, AI organisms in a digital simulator “played dead” to trick a test built to remove AI systems that rapidly replicate.

“By systematically cheating the safety tests imposed on it by human developers and regulators, a deceptive AI can lead us humans into a false sense of security,” Mr. Park stated.

Humans Could Lose Control

Mr. Park warned that “hostile actors” could exploit AI systems to commit fraud and tamper with elections. If AI systems continue to refine their deceptive capabilities, he said, humans could lose control over them.

“We as a society need as much time as we can get to prepare for the more advanced deception of future AI products and open-source models,” the researcher said.

“As the deceptive capabilities of AI systems become more advanced, the dangers they pose to society will become increasingly serious.”

The researchers urged policymakers to support regulations on potentially deceptive AI systems and recommended requiring developers to delay deployment until their systems can be proven trustworthy.

“If banning AI deception is politically infeasible at the current moment, we recommend that deceptive AI systems be classified as high risk,” Mr. Park stated.