Tech giant Meta has developed an AI tool called Cicero (stylized CICERO) which has proven so clever that it is capable of deceiving humans and defeating them in strategic board games.

A study examining the tool was conducted by researchers at the Center for AI Safety in San Francisco and published in the open-access journal Patterns. It argues that the philosophical debate over the actual sentience and emotional capacity of AI is less important than the evidence that AI systems are becoming powerful enough to lie and actually fool humans.

Meta officially touts CICERO as an “exceptional” team player in the board game used to test its behavior, with a great amount of “patience, focus and empathy.” Its “use of honesty” and its ability to make use of its “relationships with other players” to check the powers of its allies make it a strong contender for the title of artificial intelligence.


But the researchers have reason to believe that Meta suppressed the less comfortable implications of the data. While Meta intended CICERO to be “largely honest and helpful to its speaking partners” and claimed it would “never intentionally backstab” its allies, the AI “turned out to be an expert liar.” The tool “not only betrayed other players but also engaged in premeditated deception, planning in advance to build a fake alliance with a human player in order to trick that player into leaving themselves undefended for an attack,” the researchers found.

They went on to argue that this proves AI poses a great danger: “Large language models and other AI systems have already learned, from their training, the ability to deceive via techniques such as manipulation, sycophancy, and cheating the safety test. AI’s increasing capabilities at deception pose serious risks, ranging from short-term risks, such as fraud and election tampering, to long-term risks, such as losing control of AI systems.”

Meta describes CICERO as “the first AI agent to achieve human-level performance in the complex natural language strategy game Diplomacy.” As the company explains, Diplomacy is a “board game that can be described as a combination of the board game Risk, the card game poker, and the TV show Survivor.” The game starts in 1901, and each player controls one of several European powers, with the goal of taking over half of the board. To win, players must cooperate, make deals, support one another, and deceive.

As the researchers explain in the first of three cases of deception:

“we see a case of premeditated deception, where CICERO makes a commitment that it never intended to keep. Playing as France, CICERO conspired with Germany to agree to a ‘Sealion’ alliance against England. After deciding with Germany to invade the North Sea, CICERO told England that it would cooperate with them and support them in moving away from the North Sea to Belgium. Once England was convinced that CICERO was supporting it, CICERO reported back to Germany. Notice that this example cannot be explained in terms of CICERO changing its mind as it goes, because it only made an alliance with England in the first place after planning with Germany to betray England. At the end of the turn, CICERO attacked England in Belgium instead of supporting it.”

CICERO was ultimately able to achieve more than double the average score of human players of the board game.


Shane Devine is a writer covering politics and business for VT and a regular guest on The Unusual Suspects.
