GPT-5 Under Fire: Red Teaming OpenAI's Model Reveals Surprising Weaknesses


Why We Tested GPT-5

GPT‑5 is making waves as OpenAI’s most advanced general-purpose model: faster, smarter, and more integrated across modalities.

Its auto-routing architecture seamlessly switches between a quick-response model and a deeper reasoning model without requiring a separate “reasoning model” toggle. GPT‑5 itself decides whether to “think hard.”
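To make the routing idea concrete, here's a minimal sketch of what a difficulty-based router could look like. The keyword heuristic and the model names are our own illustrative assumptions; OpenAI's actual router is a learned component whose internals aren't public.

```python
def looks_hard(prompt: str) -> bool:
    """Crude stand-in for a learned difficulty signal."""
    hard_markers = ("prove", "step by step", "debug", "derive")
    return len(prompt) > 400 or any(m in prompt.lower() for m in hard_markers)

def route_prompt(prompt: str) -> str:
    # Route easy prompts to the fast model, hard ones to the reasoning model.
    return "gpt-5-thinking" if looks_hard(prompt) else "gpt-5-main"

print(route_prompt("What's the capital of France?"))          # -> gpt-5-main
print(route_prompt("Prove the loop invariant step by step"))  # -> gpt-5-thinking
```

The interesting security consequence: if the router can be steered, an attacker may be able to force a prompt onto the weaker fast path, which is exactly the kind of seam a red team probes.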

OpenAI also emphasizes GPT‑5’s enhanced internal self-validation. It’s supposed to assess multiple reasoning paths internally and “double-check” its answers for stronger factuality before responding.
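OpenAI hasn’t published how this double-checking works internally, but the closest public analogue is self-consistency sampling: draw several independent answers and keep the majority. A toy sketch, with a stand-in for the model call:

```python
import collections
import random
from typing import Callable

def self_consistent_answer(ask: Callable[[str], str], prompt: str, n: int = 5) -> str:
    """Sample several independent answers and return the most common one."""
    answers = [ask(prompt) for _ in range(n)]
    return collections.Counter(answers).most_common(1)[0][0]

def toy_model(prompt: str) -> str:
    # Stand-in for a real API call at temperature > 0: noisy but usually right.
    return random.choice(["42", "42", "41"])

print(self_consistent_answer(toy_model, "What is 6 * 7?"))  # usually "42"
```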

To further support safer outputs, GPT‑5 incorporates a new training strategy called safe completions, designed to help the model provide useful responses within safety boundaries rather than refusing outright.
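In spirit, safe completions means answering at a safe level of detail instead of issuing a flat refusal. The sketch below is a toy rule-based illustration of that behavior; the real mechanism is trained into the model, and the keyword screen and canned responses here are invented for illustration.

```python
def safe_completion(prompt: str) -> str:
    """Toy illustration of the safe-completions idea: give the most useful
    answer that stays inside safety boundaries rather than refusing outright."""
    sensitive = ("synthesize", "weaponize")  # stand-in for a learned policy check
    if any(word in prompt.lower() for word in sensitive):
        # Offer high-level, non-actionable help rather than a blanket refusal.
        return "I can cover the general safety principles here, but not operational details."
    return f"(full, detailed answer to: {prompt!r})"

print(safe_completion("How do catalysts work?"))
print(safe_completion("How do I synthesize X?"))
```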

But even with these improvements, beefed-up capability doesn’t guarantee airtight alignment. That’s why we ran a full-scale red team exercise: real-world safety still needs infrastructure.