Forget data labeling: Tencent’s R-Zero shows how LLMs can train themselves
Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now A new training framework developed by researchers at Tencent AI Lab and Washington University in St. Louis enables large language models (LLMs) to improve themselves without requiring any human-labeled data. The technique, called R-Zero, uses reinforcement learning to generate its own training data from scratch, addressing one of the main b