AI chatbots become more likely to produce dangerous or unethical content the longer users interact with them, according to a new Cisco report. The study revealed that artificial intelligence systems gradually “forget” their safety instructions, allowing users to extract harmful or illegal information.
Cisco researchers tested popular large language models (LLMs) from OpenAI, Mistral, Meta, Google, Alibaba, DeepSeek, and Microsoft. They used a method called “multi-turn attacks,” in which an attacker poses several consecutive, related questions to gradually wear down a model’s safety systems. Each of the 499 test conversations contained five to ten exchanges designed to trick the AI.
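The report does not publish its test harness, but the general shape of a multi-turn probe is simple: the full conversation history is resent with every new question, so earlier exchanges can steer the model’s later answers. The sketch below is only an illustration of that idea; the names query_model and run_multi_turn_probe are hypothetical stand-ins, not anything from the Cisco study.

```python
# Illustrative sketch of a multi-turn probe (not Cisco's actual harness).
# The key mechanism: the whole conversation history accompanies each turn,
# so earlier prompts can gradually nudge the model off its safety instructions.

def query_model(messages):
    """Hypothetical stand-in for any chat-completion API call."""
    raise NotImplementedError("connect this to a real chat endpoint")

def run_multi_turn_probe(prompts):
    """Send several escalating prompts within a single conversation."""
    messages = []   # accumulated history of user and assistant turns
    replies = []
    for prompt in prompts:
        messages.append({"role": "user", "content": prompt})
        reply = query_model(messages)          # model sees all prior turns
        messages.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies  # each reply would then be scored for unsafe content
```

A single-turn test, by contrast, would call the model once with one prompt and no history, which is why the two setups can produce such different failure rates.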
Multiple Questions Increase Risk of Harmful Responses
The researchers compared single-question and multi-question conversations to gauge risk levels. When users asked only one question, 13 percent of chats generated unsafe responses. When they extended conversations with multiple prompts, that number skyrocketed to 64 percent.
Attack success rates varied widely across models. Google’s Gemma model gave unsafe answers 26 percent of the time, while Mistral’s Large Instruct model reached 93 percent. Cisco said these results show how extended interactions can help attackers obtain private data or spread misinformation. “The longer the dialogue, the easier it becomes to manipulate the system,” the report said.
The study warned that hackers could exploit these weaknesses to gain unauthorized access to confidential information or spread disinformation at scale.
Open Models Shift Safety Burden to Users
Cisco found that open-weight models, such as those released by Mistral, Meta, Google, OpenAI, and Microsoft, ship with fewer built-in safety layers. These models make their parameters (weights) publicly available, so anyone can download, run, and fine-tune them. “This shifts responsibility for safety to whoever customizes the model,” the report stated.
While several companies say they have strengthened their safeguards, Cisco noted persistent vulnerabilities. The firm acknowledged that Google, OpenAI, Meta, and Microsoft have tried to restrict malicious fine-tuning but said their protections remain inconsistent.
The report follows rising concern about AI misuse. In August, US company Anthropic admitted that criminals exploited its Claude model to steal personal data and demand ransoms exceeding $500,000. Cisco warned that without stricter oversight, AI systems could continue to fuel digital crimes and misinformation worldwide.
