You can now train OpenAI gpt-oss with RL and GRPO via Unsloth. Unsloth now offers the fastest inference (3x faster), lowest VRAM use (50% less) and longest context (8x longer) for gpt-oss RL vs. any other implementation - with no accuracy loss. Since RL on gpt-oss isn't yet vLLM compatible, we rewrote the Transformers inference code to deliver 3x faster inference for gpt-oss at ~21 tokens/s. For BF16, Unsloth also achieves the fastest inference (~30 tokens/s) while using 50% less VRAM than any other implementation.
Free notebook: This notebook automatically creates faster matrix multiplication kernels and uses a new Unsloth reward function. We also show how to counteract reward hacking, which is one of RL's biggest challenges.
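As an illustration of what a custom reward looks like (this is not the notebook's actual reward function), TRL's GRPOTrainer accepts plain Python callables that take the generated completions and return one score per completion. The hypothetical example below rewards completions that end with a number and lightly penalizes overly long outputs:

```python
# Illustrative GRPO reward function (hypothetical, not Unsloth's notebook reward).
# TRL's GRPOTrainer accepts callables of the form f(completions, **kwargs) -> list[float].
import re

def concise_numeric_reward(completions, **kwargs):
    """Score each completion: +1.0 if it ends with a number, minus a small length penalty."""
    scores = []
    for completion in completions:
        # Completions are strings for standard datasets, or message lists for chat datasets.
        text = completion if isinstance(completion, str) else completion[0]["content"]
        has_number = bool(re.search(r"-?\d+(\.\d+)?\s*$", text.strip()))
        length_penalty = min(len(text) / 4000.0, 1.0)  # discourage rambling outputs
        scores.append((1.0 if has_number else 0.0) - 0.1 * length_penalty)
    return scores
```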
With Unsloth, you can train gpt-oss-20b with GRPO on 15GB of VRAM, free on Colab. Unsloth's new inference runs faster on any GPU, including A100, H100 and older T4s. gpt-oss-120b fits on 80GB of VRAM.
Unsloth is the only framework to support 4-bit RL for gpt-oss. All performance gains come from Unsloth's unique optimizations, including Unsloth Flex Attention and custom kernels.
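A minimal sketch of the setup, assuming the Unsloth-hosted checkpoint name unsloth/gpt-oss-20b and illustrative hyperparameters (the free notebook is the authoritative reference); it reuses the example reward above:

```python
# Minimal sketch: load gpt-oss-20b in 4-bit with Unsloth and attach a LoRA adapter for GRPO.
# Model name, dataset and hyperparameters are illustrative assumptions, not the notebook's config.
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import Dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",  # assumed checkpoint name
    max_seq_length=1024,
    load_in_4bit=True,                 # 4-bit RL; set False for BF16 on larger GPUs
)

# LoRA keeps VRAM low enough for a 15GB T4 / free Colab.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Tiny placeholder prompt dataset; GRPO expects a "prompt" column.
dataset = Dataset.from_dict({"prompt": ["Solve: 2 + 2 = ?", "Solve: 17 * 3 = ?"]})

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[concise_numeric_reward],  # the illustrative reward from above
    args=GRPOConfig(per_device_train_batch_size=1, num_generations=4, max_steps=50),
    train_dataset=dataset,
)
trainer.train()
```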
⚡Making Inference Much Faster
Inference is crucial in RL training. To achieve the fastest inference speed for gpt-oss without vLLM, we rewrote the Transformers inference code and integrated many innovations, including custom algorithms like Unsloth Flex Attention and torch.compile. The new inference was evaluated against an already optimized baseline (itself 2x faster than native Transformers).
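To get a rough sense of tokens/s on your own hardware, one simple (and unofficial) way is to time a single greedy generation with the model and tokenizer from the sketch above; this is not Unsloth's benchmark harness, just a quick check:

```python
# Rough tokens/s measurement for the loaded gpt-oss model (illustrative only).
import time
import torch

FastLanguageModel.for_inference(model)  # switch Unsloth into its fast inference mode

inputs = tokenizer("Explain GRPO in one paragraph.", return_tensors="pt").to(model.device)
with torch.no_grad():
    start = time.time()
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    elapsed = time.time() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/s")
```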
vLLM does not support RL for gpt-oss since it lacks BF16 training and LoRA support for the model. Without Unsloth, only BF16 training works, which pushes memory use 800%+ higher. Most frameworks enable FA3 by default (which reduces VRAM use and increases speed), but this causes incorrect training loss. You must disable FA3, though that prevents long-context training, so instead we implemented Unsloth Flex Attention.
We evaluated gpt-oss RL inference by benchmarking BitsandBytes 4-bit and also did separate tests for BF16. Unsloth’s 4-bit inference is ~4x faster, and BF16 is also more efficient, especially in VRAM use.
The best part about Unsloth's gpt-oss RL is that it can work on any GPU, even those that do not support BF16. Our free gpt-oss-20b Colab notebooks use older 15GB T4 GPUs, so the inference examples work well!
🛠️ gpt-oss Flex Attention Issues and Quirks