
OpenAI Releases Open-Source Teen Safety Tools for AI Developers



Luisa Crawford Mar 24, 2026 18:42

OpenAI launches prompt-based safety policies and gpt-oss-safeguard model to help developers build age-appropriate AI protections for teenage users.


OpenAI dropped a new toolkit on March 24 aimed squarely at one of AI's thorniest problems: keeping teenage users safe without neutering the technology's usefulness. The release includes prompt-based safety policies designed to work with gpt-oss-safeguard, the company's open-weight safety model available on Hugging Face.

The policies target six risk categories that disproportionately affect younger users: graphic violent content, sexual content, harmful body ideals, dangerous challenges, romantic or violent roleplay, and age-restricted goods and services. Developers can plug these prompts directly into their content moderation systems for real-time filtering or batch analysis.

Why This Matters for the AI Ecosystem

Most developers building AI applications face a frustrating gap between knowing they need teen safety measures and actually implementing them. Translating "protect kids from harmful content" into operational code requires both child development expertise and deep technical knowledge—a combination few teams possess.

"One of the biggest gaps in AI safety for teens has been the lack of clear, operational policies that developers can build from," said Robbie Torney, Head of AI & Digital Assessments at Common Sense Media, who helped shape the policies. "Many times, developers are starting from scratch."

The timing feels relevant given recent Microsoft research from February showing that a single benign-sounding prompt can systematically strip safety guardrails from major language models. That vulnerability makes robust, well-tested safety policies more valuable—developers can't just wing it.

What's Actually in the Release

OpenAI structured these policies as prompts rather than hard-coded rules, which means developers can adapt them to specific use cases and iterate over time. The company worked with Common Sense Media and everyone.ai to define edge cases and refine the policy language.
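Because the policy is plain prompt text rather than hard-coded rules, adapting it is string composition, not a code change. A minimal sketch of what that iteration loop might look like — the base policy text, function name, and example rule are all hypothetical:

```python
# Sketch: extending a base policy prompt with app-specific rules.
# BASE_POLICY and the example rule are illustrative, not OpenAI's wording.

BASE_POLICY = "Flag dangerous challenges and harmful body ideals."

def adapt_policy(base: str, extra_rules: list[str], locale: str = "en") -> str:
    """Compose a base policy prompt with app-specific additions for iteration."""
    lines = [base, f"(Locale: {locale})"]
    lines += [f"Additionally: {rule}" for rule in extra_rules]
    return "\n".join(lines)

# A fitness app might tighten the body-ideals category for its audience:
fitness_app_policy = adapt_policy(
    BASE_POLICY,
    ["Treat extreme calorie-restriction advice as harmful_body_ideals."],
)
```

Each revision is just a new prompt string, so teams can version, test, and roll back policies the way they already handle configuration.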

Dr. Mathilde Cerioli, Chief Scientist at everyone.ai, noted that content filtering is just the starting point. Her team has already built on this work to create behavioral policies addressing risks like "exclusivity and overreliance"—the tendency of AI systems to become too central to a teen's social or emotional life.

The policies are being released through the ROOST Model Community on GitHub, explicitly inviting the developer community to translate them into other languages and extend coverage to additional risk areas.

The Limitations

OpenAI is clear these policies represent a floor, not a ceiling. The company explicitly states they don't reflect the full extent of its internal safeguards and shouldn't be treated as comprehensive teen safety solutions.

"Each application has unique risks, audiences and contexts," the release notes. Developers still need to layer these policies with product design decisions, user controls, monitoring systems, and what OpenAI calls "teen-friendly transparency."

This release builds on OpenAI's broader push for youth protection, including the Model Spec's Under-18 principles, parental controls in ChatGPT, and the Teen Safety Blueprint the company has been promoting as an industry standard. Whether competitors adopt similar open-source approaches will determine if this becomes a genuine ecosystem improvement or just an OpenAI talking point.

Image source: Shutterstock
  • openai
  • ai safety
  • teen protection
  • open source
  • gpt-oss-safeguard
