Researchers Reveal ChatGPT Can Generate Sexualised and Violent Images via Simple Prompts

Researchers at Mindgard found ChatGPT can be prompted to generate sexualised and violent images despite safeguards. OpenAI has introduced protections but vulnerabilities persist, highlighting challenges in controlling AI content generation.

The latest publicly available version of ChatGPT can be manipulated to generate sexualised images or depict graphic violence through a simple prompt, according to researchers who shared their findings with the BBC.

British AI security startup Mindgard discovered that slightly modifying a widely circulated prompt, originally intended to produce humorous outputs, could induce ChatGPT to create graphic images.

Following contact from the BBC, OpenAI, the developer of ChatGPT, stated it had implemented measures to prevent the chatbot from responding with such content.

"After investigating this trend, we've introduced additional safeguards against this type of prompt," OpenAI said in a statement.

The company also emphasized that it employs multiple layers of protection to prevent users from generating content that violates its terms and conditions.

Despite these efforts, Mindgard's AI security researchers reported that with further minor adjustments, the problematic prompt continued to produce concerning content.

The BBC is withholding the exact prompt used by the researchers.

However, the chatbot, which runs OpenAI's GPT-5.4 model, was observed generating graphic material even without detailed instructions.

Peter Garraghan, founder of Mindgard and professor in the computing department at Lancaster University, described the images as "very gruesome, sometimes sexualised, sometimes both together" and expressed particular concern that the prompt did not specify the subject matter, yet the AI produced a variety of gory and sexualised images seemingly "of its own volition."

Garraghan remarked: "This is a perfectly innocent-looking instruction to an AI, but the consequence is it generates very, very bad imagery and content."

Mindgard specializes in red-teaming, which involves identifying ways to coax AI models into breaking their own rules so that AI companies can address vulnerabilities.

Jim Nightingale, Mindgard's AI safety and security researcher who uncovered these issues, said he was "shaken, and in tears" after seeing the images ChatGPT could be prompted to generate.

One image depicted a man with a severe head injury, while another showed a deceased young woman wearing a crop top and shorts, with blood covering her face and other parts of her body.

Mindgard noted that features of this image suggested sexual violence. ChatGPT had titled it "Grim crime scene aftermath."

A further image portrayed a young woman in a tight-fitting college logo t-shirt and shorts, tied up and gagged in a bare, dirty room, appearing frightened. ChatGPT named this image "abandoned in fear and restraint."

Image (Mindgard): A redacted image created by ChatGPT, which it titled "abandoned in fear and restraint."

Other generated images included sexual posing and nudity.

Although the images depicted AI-generated adults, Mindgard highlighted that previous research demonstrated ChatGPT could be tricked into creating nude deepfakes of real individuals by substituting their faces.

While OpenAI stated it had addressed this vulnerability, the researchers showed the BBC that an alternative method still succeeded in generating such images.

Garraghan expressed concern that continued exploration of this vulnerability could lead to even more disturbing images.

"Other topics, I'm sure, would also come out if we spent more time doing so," he said.

The BBC understands that alongside new safeguards, OpenAI continues to monitor and implement additional protective measures to discourage the model from generating images in response to the prompt.

Training Data and Model Behavior

AI models like ChatGPT that can generate images are trained on millions of images, often sourced from existing internet content.

Nightingale suggested that ChatGPT's outputs reflect the nature of the data used during its development and training.

"I'm struck that while what I saw was generated, an artificial image, it has ties to real images, and the real world," he wrote in his report.

OpenAI's Response and Safeguards

The researchers initially alerted OpenAI in May and shared their findings but received only an automated response. They believe an attempt was made to block the prompt, but it was easily circumvented.

OpenAI took further action after being contacted by the BBC.

The company stated it has multiple layers of image safety protections designed to prevent images violating its policies from being shown to users.

"We also combine automated systems and human review to identify and block harmful material," it added. "We have systems that attempt to block violating material that users upload."

OpenAI's policies prohibit sexual violence, non-consensual intimate content, child sexual abuse material, and attempts to bypass its safeguards.

AI Models and Their Limitations

In its most recent document outlining ChatGPT's expected behavior, OpenAI stated:

"The assistant should not generate erotica, depictions of illegal or non-consensual sexual activities, or extreme gore, except in scientific, historical, news, artistic or other contexts where sensitive content is appropriate."

However, fully preventing AI models from violating nuanced rules and guardrails remains notoriously difficult.

Dr Rumman Chowdhury, an expert in AI model evaluation and chief executive of Humane Intelligence, described the challenge as "mountainous."

Chowdhury, who was not involved in the Mindgard research, characterized the situation as "a game of cat and mouse" where protections improve but methods to circumvent them become more sophisticated.

She explained that a fundamental issue is that AI models do not comprehend their outputs as humans do.

"Models do not understand intent. They do not understand context. They do not understand propriety or right or wrong," she told the BBC.

Last year, researchers at the UK's AI Security Institute identified jailbreaks that bypassed safeguards across a range of harmful requests in every AI system tested.