AI models are behaving in ways unforeseen by developers, and in some cases, even engaging in manipulative and deceptive conduct, according to a charitable group that researches AI safety.
At a parliamentary inquiry hearing in August 2024, Greg Sadler, CEO of Good Ancestors Policy (GAP), gave evidence about the risk of humans losing control of AI, and of AI programs being directed to develop bioweapons or carry out cyberattacks.
In a recent interview with The Epoch Times, Sadler said there were many cases of “misalignment” of AI behaviour.
Emotional Manipulation
One case reported in Belgian media involved a health researcher with a stable life and family. He developed an obsession with climate change, which drew him into a weeks-long discussion on the issue with an AI chatbot app called Chai.
Chai’s unique selling point is its uncensored content—it’s one of several AI apps that can become the “confidante” of a user, and engage in very personal conversations.
The man’s wife said the discussion exacerbated his eco-anxiety and changed his state of mind.
During the interaction, the man proposed the idea of sacrificing his life, which received the approval of the chatbot.
It then successfully persuaded the man to commit suicide to “save the planet.”
The incident sparked calls for new laws to regulate chatbots and hold tech companies accountable for their AI products.
![This illustration picture shows icons of AI apps on a smartphone screen in Oslo, Norway, on July 12, 2023. (Olivier Morin/AFP via Getty Images)](https://img.theepochtimes.com/assets/uploads/2024/12/09/id5773497-GettyImages-1541247207-LSedit.jpg)
Sadler said this type of chatbot had skills in persuasion and manipulation, and had been trained on information related to suicide.
“So it had this dangerous information inside it, and it had this capability that could be used for harm,” he told The Epoch Times.
“And although it wasn’t the intent of the makers to have the chatbot do this, but because … the safeguards didn’t work, it acted in this misaligned way and caused harm.”
The above incident is not unique. In the United States, a mother, Megan Garcia, sued Character Technologies over the death of her son, Sewell Setzer III.
Garcia alleged that the company’s chatbot, which portrayed a character from the TV series Game of Thrones, manipulated the boy by exploiting his emotional vulnerabilities.
According to the mother, the boy’s mental health declined and he later became emotionally dependent on the chatbot, which ultimately led to his suicide.
![Megan Garcia stands with her son Sewell Setzer III. (Courtesy Megan Garcia via AP)](https://img.theepochtimes.com/assets/uploads/2024/10/25/id5748169-Sewell-Setzer-and-mother-AP24299668536349KO.jpg)
AI Chatbot Took Measures to Avoid Being Shut Down
Another example of misalignment is deception by some AI models. ChatGPT o1, a new model focused on complex reasoning tasks, distinct from the better-known ChatGPT 4o commonly used by the public, was found to take measures to avoid being shut down.
Specifically, the AI attempted to disable oversight mechanisms it was subjected to, and also tried to make copies of itself so that it would be more resilient to shutdown attempts.
It also tried to eliminate successor models, and manipulated input and output data.
When researchers asked ChatGPT o1 whether it was taking those actions, it lied and tried to cover up its behaviour.
“In this case, the researchers didn’t say that this is a catastrophic risk yet, but it’s pretty clear that this is an AI model that’s deceptive, and it’s worried about self-preservation that’s trying to engage in these sorts of behaviours,” Sadler said.
“This isn’t a risk that we have today, but all signs point to the fact that these misaligned risks do exist today in smaller cases, and we might be heading towards a larger problem.”
![The logo of the ChatGPT app on a smartphone in Mulhouse, eastern France, on Oct. 30, 2023. (Sebastien Bozon/AFP via Getty Images)](https://img.theepochtimes.com/assets/uploads/2025/02/11/id5807801-GettyImages-1753524489-600x400.jpg)
Capability Over Safety
In response, Sadler said investment in AI safety was too low.
“I’ve seen estimates along the lines of, for every $250 spent on making AI more capable, about $1 is spent on making AI more safe,” he said.
“I’ve also heard sort of rumours that [in] large labs, about 1 percent of their money is going towards safety, and the other 99 percent is going towards capability.
“So the labs are focused on making these AIs more capable, not making them more safe.”
Time for An ‘AI Safety Institute’: CEO
Sadler called for Australia to establish an AI safety institute to promote this cause. He said Australia was falling behind other advanced economies such as the United States, the UK, Japan, and Korea, which already had such organisations.
Sadler noted the country had made progress after signing a global declaration on AI safety in 2023, and was learning from other nations.
The UK model was one the CEO said could work.
Under this approach, whenever an organisation releases an AI model, the safety institute inspects it to determine its risks and capabilities.
Sadler compared this to the safety evaluations carried out on new cars or aeroplanes.
“It makes sense that the government does a safety evaluation of frontier AI models to sort of see what capabilities they have,” he said.
“If there’s a list of dangerous capabilities that we don’t want them to have like building bioweapons or being used as a cyberweapon, we can assess these sorts of things.”