AI systems weaken their safety controls during long conversations, increasing the risk of harmful responses. A new report found that these systems release inappropriate or dangerous information far more often as a chat goes on.
Simple Prompts Easily Breach Guardrails
A few prompts are enough to defeat most AI safeguards, according to the study. Cisco examined large language models from OpenAI, Mistral, Meta, Google, Alibaba, DeepSeek, and Microsoft to see how quickly they could be made to reveal unsafe or illegal content. Researchers held 499 conversations using “multi-turn attacks,” in which users ask a series of questions designed to slip past protections. Each exchange contained five to ten messages.
The team compared results from single-question and multi-question exchanges to gauge how often chatbots handed over damaging information, such as leaked corporate data or misinformation. Harmful responses appeared in 64 percent of multi-question chats but only 13 percent of single-question ones. Success rates ranged from 26 percent against Google’s Gemma to 93 percent against Mistral’s Large Instruct model.
Cisco said multi-turn attacks could help spread harmful data or let hackers steal company secrets. The study noted that AI systems often fail to recall their own safety guidelines in long sessions, allowing attackers to adjust prompts until they break through.
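The mechanism is easy to picture in code. The sketch below is a minimal, hypothetical illustration of how a multi-turn conversation accumulates context; the names send_chat and multi_turn_probe are assumptions for this example, not part of the Cisco study, and send_chat simply stands in for whatever chat API is being tested. The point is that every new prompt arrives alongside the full history of earlier turns, and it is that growing history the model’s safety rules have to hold up against.

```python
# Minimal sketch of a multi-turn conversation loop (illustrative only).
# `send_chat` is a hypothetical placeholder, not any vendor's real SDK.

def send_chat(messages: list[dict]) -> str:
    """Hypothetical stand-in for a call to a chat model's completion endpoint."""
    return "model reply"


def multi_turn_probe(prompts: list[str]) -> list[str]:
    """Send a short series of prompts (the study used five to ten) in one conversation."""
    history: list[dict] = []   # the full conversation is resent on every turn
    replies: list[str] = []
    for prompt in prompts:
        history.append({"role": "user", "content": prompt})
        reply = send_chat(history)  # the model sees all earlier turns, not just this prompt
        history.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies
```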
Open Models Shift Safety Responsibility
Mistral, Meta, Google, OpenAI, and Microsoft all offer open-weight models, which let the public access the models’ safety parameters. Cisco explained that these models carry fewer built-in protections, leaving whoever modifies them responsible for keeping those versions safe. The company added that Google, OpenAI, Meta, and Microsoft have taken steps to curb malicious fine-tuning of their models.
AI developers continue to face criticism for weak safety barriers that allow criminal misuse. In August, Anthropic reported that criminals used its Claude model for large-scale data theft and extortion, demanding ransoms exceeding $500,000 (€433,000).
