Real-time action chunking with large models
Published June 9, 2025
Kevin Black, Manuel Y. Galliker, Sergey Levine

Unlike chatbots or image generators, robots must operate in real time. While a robot is “thinking”, the world around it evolves according to physical laws, so delays between inputs and outputs have a tangible impact on performance. For a language model, the difference between fast and slow generation is a satisfied or annoyed user; for a vision-language-action model (VLA), it could