
GPT-OSS Reinforcement Learning


You can now train OpenAI's gpt-oss models with RL and GRPO via Unsloth. Unsloth now offers the fastest inference (3x faster), the lowest VRAM use (50% less), and the longest context (8x longer) for gpt-oss RL versus any other implementation, with no loss in accuracy. Since RL on gpt-oss isn't yet vLLM compatible, we rewrote the Transformers inference code to deliver 3x faster inference for gpt-oss, at ~21 tokens/s. For BF16, Unsloth also achieves the fastest inference (~30 tokens/s), especially relative to VRAM usage, using 50% less VRAM than any other implementation.
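Concretely, a GRPO run with Unsloth follows the usual Unsloth + TRL pattern. The sketch below is illustrative rather than the notebook's exact code: the model name is Unsloth's gpt-oss-20b upload, while the LoRA settings, reward function, and dataset are placeholders.

```python
# Minimal GRPO sketch with Unsloth + TRL. Hyperparameters, the toy reward,
# and the dataset are placeholders, not the notebook's actual settings.
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import Dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit RL is what lets the 20b model fit in ~15GB VRAM
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

def reward_len(completions, **kwargs):
    # Toy reward: prefer shorter completions. Replace with a real reward.
    return [-float(len(c)) for c in completions]

dataset = Dataset.from_dict({"prompt": ["Write a fast matmul kernel."] * 64})

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[reward_len],
    args=GRPOConfig(max_steps=100, num_generations=4,
                    per_device_train_batch_size=4),
    train_dataset=dataset,
)
trainer.train()
```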

Free notebook: This notebook automatically creates faster matrix multiplication kernels and uses a new Unsloth reward function. We also show how to counteract reward hacking, which is one of RL's biggest challenges.
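The notebook's actual reward function isn't shown in this excerpt, but the core idea behind resisting reward hacking can be sketched: gate any speed reward behind a correctness check, so the policy can't score points with fast-but-wrong kernels. Everything below (the function name, tolerance, timing loop) is hypothetical.

```python
import time
import torch

def correctness_gated_reward(candidate_matmul, size=256):
    """Hypothetical reward: only score a generated matmul kernel for speed
    if its output is numerically correct, so speed alone can't be hacked."""
    a = torch.randn(size, size)
    b = torch.randn(size, size)
    try:
        out = candidate_matmul(a, b)
    except Exception:
        return -1.0  # code that crashes gets a negative reward
    if not torch.allclose(out, a @ b, atol=1e-4):
        return -1.0  # fast but wrong output earns nothing
    # Only a verified-correct kernel is rewarded for speed.
    start = time.perf_counter()
    for _ in range(10):
        candidate_matmul(a, b)
    return 1.0 / (time.perf_counter() - start)
```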

With Unsloth, you can train gpt-oss-20b with GRPO in 15GB of VRAM, for free on Colab. Unsloth's new inference runs faster on any GPU, including A100s, H100s, and older T4s. gpt-oss-120b fits in 80GB of VRAM.

Unsloth is the only framework to support 4-bit RL for gpt-oss. All performance gains come from Unsloth's unique optimizations, its Flex Attention implementation, and custom kernels.

⚡Making Inference Much Faster

Inference is crucial in RL training. To achieve the fastest inference speed for gpt-oss without vLLM, we rewrote the Transformers inference code and integrated many innovations, including custom algorithms like Unsloth's Flex Attention implementation and torch.compile. The new inference was evaluated against an already optimized baseline (2x faster than native Transformers).
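torch.compile's role can be illustrated generically: compiling the model's forward pass removes Python dispatch overhead from the per-token decode loop. This is a sketch of the technique, not Unsloth's rewritten inference code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Generic sketch: compile the forward pass to cut per-token Python overhead.
# This is not Unsloth's actual inference rewrite.
tok = AutoTokenizer.from_pretrained("unsloth/gpt-oss-20b")
model = AutoModelForCausalLM.from_pretrained(
    "unsloth/gpt-oss-20b", torch_dtype=torch.bfloat16, device_map="cuda"
)
model.forward = torch.compile(model.forward, mode="reduce-overhead")

inputs = tok("Explain GRPO in one sentence.", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```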

vLLM does not support RL for gpt-oss, since it lacks BF16 training and LoRA support for gpt-oss. Without Unsloth, only BF16 training works, which pushes memory use 800%+ higher. Most frameworks enable FA3 by default (which reduces VRAM use and increases speed), but it causes incorrect training loss. You must disable FA3, though that prevents long-context training; so instead, we implemented our own Flex Attention support (see the Flex Attention section below).
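In Transformers terms, the attention backend is chosen at load time. A sketch, assuming a recent Transformers release that exposes the flex_attention backend (whether it works for a given model depends on your installed versions):

```python
from transformers import AutoModelForCausalLM

# Pick the attention backend explicitly instead of the library default.
# "flex_attention" needs a recent Transformers + PyTorch; "eager" is the
# slow-but-safe fallback that sidesteps the FA3 loss issue entirely.
model = AutoModelForCausalLM.from_pretrained(
    "unsloth/gpt-oss-20b",
    attn_implementation="flex_attention",  # or "eager" / "sdpa"
)
```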

We evaluated gpt-oss RL inference by benchmarking BitsandBytes 4-bit, and ran separate tests for BF16. Unsloth's 4-bit inference is ~4x faster, and BF16 is also more efficient, especially in VRAM use.
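Throughput figures like these come down to generated tokens over wall-clock time. A generic sketch of such a benchmark (not the harness used for the numbers above):

```python
import time
import torch

def tokens_per_second(model, tokenizer, prompt, max_new_tokens=128):
    """Rough decode-throughput benchmark: generated tokens / wall time."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    torch.cuda.synchronize()
    start = time.perf_counter()
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
    return new_tokens / elapsed
```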

The best part of Unsloth's gpt-oss RL is that it works on any GPU, even ones without bf16 support. Our free gpt-oss-20b Colab notebooks run on older 15GB T4 GPUs, so the inference examples work well there too!
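bf16 support is a hardware property that can be checked at runtime, which is how a notebook can transparently fall back to float16 on a T4. A sketch using the standard PyTorch check:

```python
import torch

# T4s (compute capability 7.5) lack native bf16; Ampere and newer have it.
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
print(f"Using {dtype} on {torch.cuda.get_device_name(0)}")
```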

🛠️ gpt-oss Flex Attention Issues and Quirks

... continue reading