INTELLECT-2 Release: The First 32B Model Trained Through Globally Distributed RL

Published on: 2025-07-15 20:46:57

We're excited to release INTELLECT-2, the first 32B-parameter model trained via globally distributed reinforcement learning. Unlike traditional centralized training efforts, INTELLECT-2 trains a reasoning language model using fully asynchronous RL across a dynamic, heterogeneous swarm of permissionless compute contributors.

To enable a training run on this unique infrastructure, we built various components from scratch: we introduce PRIME-RL, our training framework purpose-built for distributed asynchronous reinforcement learning, built on top of novel components such as TOPLOC, which verifies rollouts from untrusted inference workers, and SHARDCAST, which efficiently broadcasts policy weights from training nodes to inference workers. Beyond infrastructure components, we propose modifications to the standard GRPO training recipe and data filtering techniques that were cru ...
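To make the asynchronous loop concrete, here is a minimal sketch of the control flow described above: a trainer broadcasts policy versions to an inference worker, the worker returns rollouts tagged with a commitment, and the trainer checks each rollout before learning from it. All names are illustrative stand-ins, not the actual PRIME-RL, TOPLOC, or SHARDCAST APIs; the hash-based check merely stands in for TOPLOC-style verification of untrusted workers.

```python
import hashlib
import queue
import threading

def checksum(policy_version: int, rollout: str) -> str:
    """Commitment a worker attaches so the trainer can detect tampering.
    (Stand-in for TOPLOC-style rollout verification.)"""
    return hashlib.sha256(f"{policy_version}:{rollout}".encode()).hexdigest()

def worker(policy_box: dict, rollouts: queue.Queue, n: int) -> None:
    """Untrusted inference worker: generates rollouts against whatever
    policy version has most recently been broadcast."""
    for i in range(n):
        version = policy_box["version"]        # latest broadcast weights
        rollout = f"trajectory-{version}-{i}"  # stand-in for model generation
        rollouts.put((version, rollout, checksum(version, rollout)))

def train(n_rollouts: int = 4) -> list[str]:
    policy_box = {"version": 0}  # stand-in for SHARDCAST weight broadcast
    rollouts: queue.Queue = queue.Queue()
    t = threading.Thread(target=worker, args=(policy_box, rollouts, n_rollouts))
    t.start()
    accepted = []
    for _ in range(n_rollouts):
        version, rollout, sig = rollouts.get()
        if sig == checksum(version, rollout):  # verify before learning from it
            accepted.append(rollout)
            policy_box["version"] += 1  # async: update without blocking workers
    t.join()
    return accepted

print(len(train()))
```

The key property the sketch illustrates is asynchrony: the trainer never waits for all workers to finish a step before updating the policy, and workers may generate rollouts against a slightly stale policy version, which is why each rollout carries the version it was produced under.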