TAO: Using test-time compute to train efficient LLMs without labeled data

Published on: 2025-05-28 17:38:35

Large language models are challenging to adapt to new enterprise tasks. Prompting is error-prone and achieves limited quality gains, while fine-tuning requires large amounts of human-labeled data that is not available for most enterprise tasks. Today, we’re introducing a new model tuning method that requires only unlabeled usage data, letting enterprises improve the quality and cost of their AI using just the data they already have. Our method, Test-time Adaptive Optimization (TAO), leverages test-time compute (as popularized by o1 and R1) and reinforcement learning (RL) to teach a model to do a task better based on past input examples alone, meaning that it scales with an adjustable tuning compute budget rather than human labeling effort. Crucially, although TAO uses test-time compute, it uses it as part of the process to train a model; that model then executes the task directly with low inference costs (i.e., not requiring additional compute at inference time). Surprisingly, even without labeled data ...
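
To make the pattern described above concrete, here is a toy, self-contained sketch: for each unlabeled input, sample several candidate responses (the test-time compute, spent at training time), score them without labels, and reinforce the higher-scoring ones. The `ToyPolicy` class, the keyword-based `reward` function, and the REINFORCE-style update are illustrative stand-ins, not the actual TAO implementation, which the article does not detail at this level.

```python
# Toy illustration of the TAO training pattern: sample candidates for
# an unlabeled input, score them, and reinforce high-reward outputs.
# The policy, reward, and update rule here are hypothetical stand-ins
# for an LLM, a learned reward model, and a production RL algorithm.

import math
import random

RESPONSES = ["I don't know.", "42", "The answer is 42.", "Ask later."]

class ToyPolicy:
    """Softmax policy over a fixed response set (stand-in for an LLM)."""
    def __init__(self):
        self.logits = [0.0] * len(RESPONSES)

    def probs(self):
        m = max(self.logits)
        exps = [math.exp(l - m) for l in self.logits]
        z = sum(exps)
        return [e / z for e in exps]

    def sample(self):
        return random.choices(range(len(RESPONSES)), weights=self.probs())[0]

def reward(prompt, response):
    # Stand-in for a reward model: scores (input, candidate) pairs
    # without needing any human-labeled ground truth.
    return 1.0 if "42" in response else 0.0

def tao_step(policy, prompt, n_candidates=8, lr=0.5):
    # 1. Spend "test-time" compute during training: sample many
    #    candidate responses for the same unlabeled input.
    idxs = [policy.sample() for _ in range(n_candidates)]
    rewards = [reward(prompt, RESPONSES[i]) for i in idxs]
    baseline = sum(rewards) / len(rewards)
    probs = policy.probs()
    # 2. REINFORCE-style update: shift probability toward candidates
    #    that scored above the batch average (d log p_i / d logit_j
    #    for a softmax is 1[i == j] - p_j).
    for i, r in zip(idxs, rewards):
        adv = r - baseline
        for j in range(len(RESPONSES)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            policy.logits[j] += lr * adv * grad

policy = ToyPolicy()
for _ in range(200):
    tao_step(policy, "What is the answer to everything?")
# After tuning, inference is a single cheap sample -- no extra
# test-time compute is needed to get a high-reward response.
print(RESPONSES[policy.sample()])
```

The key property the sketch captures is the one the article emphasizes: the extra compute (sampling many candidates) is spent only during tuning, so the resulting model serves requests at ordinary inference cost.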