Published December 22, 2025 Physical Intelligence
When a computer defeated the world champion at chess in 1996, it could select the best moves but needed a person to move the pieces. Twenty years later, when AlphaGo defeated the world champion in Go, it still could not move the pieces on its own. Today, LLMs can solve gold medal IMO problems, but can't write down the answer with a pencil. This mismatch between how hard we expect a task to be, based on how hard it is for us, and how hard it actually is for machines is called Moravec's paradox. Seemingly hard problems like playing chess, solving math problems, or planning routes through congested streets to minimize travel time are "easy" for machines, whereas seemingly easy problems like picking up a chess piece, writing a note, making a peanut butter sandwich, or washing the dishes present exceptionally difficult challenges.
Underscoring this paradox, Benjie Holson proposed a set of "Robot Olympics" challenge tasks in a recent blog post, with seemingly simple everyday behaviors like spreading peanut butter, washing a greasy pan, putting a key in a lock, and turning socks inside-out. These challenge tasks might not seem as cognitively demanding as math olympiad problems, but robotics experts believe they present exceptional challenges for autonomous robots.
We wanted to see how many of these tasks we could tackle just by fine-tuning our latest model, based on π0.6. This is a good test of generalist capability: the tasks were not selected by us, they test a variety of manipulation capabilities, and they have not been demonstrated with previous robotic systems. We've been able to demonstrate initial solutions for "gold medal" tasks in 3 out of 5 proposed categories, with "silver medal" tasks for the other 2. The two gold medal tasks that we did not solve were physically impossible for our robot, though one of them could be solved with a small modification (using a metal tool). We did all this simply by fine-tuning our latest model.
This was not a focused research project, and most of the work consisted of collecting data for each task (under 9 hours for most tasks).
The Olympics
Benjie Holson's originally proposed tasks are separated into categories, with "bronze," "silver," and "gold" tasks within each category. We did not do everything possible to maximize the success rate (as discussed, e.g., in our recent work on using RL for optimizing reliability and speed), and the policies for these tasks are often not consistent, though on average they have a success rate of 52% and a task progress of 72%. We also ran a baseline that fine-tuned a standard VLM, without using our π0.6 model, to test the importance of robotic foundation model pre-training. This baseline did not succeed on any of the tasks and had an average task progress of 9%, indicating that large-scale robot pre-training is essential for this result. Whenever possible, we tried to set up the tasks to match the original blog post. For some of the tasks we used a fixed (non-mobile) robot even though the original tasks are intended for mobile robots, but we don't expect that a mobile base would make these static tasks any harder.
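To make the two averages quoted above concrete, here is a minimal sketch of how a success rate and a task progress score could be computed from evaluation rollouts. The post does not define task progress, so this assumes it is the fraction of a task's subtasks completed in a rollout, averaged per task and then across tasks. The `Trial` record, the `summarize` helper, and the rollout data are all hypothetical and are not results from our evaluation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Trial:
    """One evaluation rollout (hypothetical record format, not from the post)."""
    task: str
    steps_completed: int  # subtasks finished before the rollout ended
    steps_total: int      # subtasks needed to finish the whole task

    @property
    def progress(self) -> float:
        # Assumed definition: fraction of subtasks completed.
        return self.steps_completed / self.steps_total

    @property
    def success(self) -> bool:
        # A rollout counts as a success only if every subtask was completed.
        return self.steps_completed == self.steps_total


def summarize(trials):
    """Return (success rate, task progress), averaged per task, then across tasks."""
    by_task = {}
    for trial in trials:
        by_task.setdefault(trial.task, []).append(trial)
    success = mean([mean([float(t.success) for t in ts]) for ts in by_task.values()])
    progress = mean([mean([t.progress for t in ts]) for ts in by_task.values()])
    return success, progress


# Made-up rollouts for illustration only.
trials = [
    Trial("key", 4, 4),
    Trial("key", 2, 4),
    Trial("sock", 3, 3),
    Trial("sock", 0, 3),
]
success_rate, task_progress = summarize(trials)
print(f"success rate: {success_rate:.3f}, task progress: {task_progress:.3f}")
```

In this toy data, task progress comes out higher than the success rate because failed rollouts still complete some subtasks. The same effect would explain a gap like the one between the 52% and 72% figures.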
🥇 Event 1: full body (a.k.a. door). The gold-medal task in this category is to open and go through a self-closing lever-handle door. This is hard because the robot has to keep the door open as it goes through it.
🥈 Event 2: laundry. The gold-medal task is to hang an inside-out dress shirt after turning it right-side-out. We do not believe our current robot is physically able to do this, because the gripper is too wide to fit inside the sleeve (something we should fix in the next hardware revision!). We therefore tackled the silver-medal task, which is to turn a sock inside-out. This task is quite difficult due to the shape of the robot's gripper, but our policy was able to learn it with about 8 hours of data.
We also trained a policy for the bronze-medal task, folding an inside-out t-shirt.
🥇 Event 3: basic tool use. We tested all three (bronze, silver, gold) tasks in this category. The gold-medal task is to use a key. This is hard because it requires fine manipulation, and the robot has to reorient the key with its grippers without putting it down. While the original task shows a person handing the key to the robot, we had the robot pick it up off the table.