Physical Intelligence, a hot robotics startup, says its new robot brain can figure out tasks it was never taught

Physical Intelligence, the two-year-old, San Francisco-based robotics startup that has quietly become one of the most closely watched AI companies in the Bay Area, published new research Thursday showing that its latest model can direct robots to perform tasks they were never explicitly trained on — a capability the company’s own researchers say caught them off guard.

The new model, called π0.7, represents what the company describes as an early but meaningful step toward the long-sought goal of a general-purpose robot brain: one that can be pointed at an unfamiliar task, coached through it in plain language, and actually pull it off. If the findings hold up to scrutiny, they suggest that robotic AI may be approaching an inflection point similar to what the field saw with large language models, where capabilities begin compounding in ways that outpace what the underlying data would seem to predict.

The core claim in the paper is compositional generalization: the ability to combine skills learned in different contexts to solve problems the model has never encountered. Until now, the standard approach to robot training has been essentially rote memorization: collect data on a specific task, train a specialist model on that data, then repeat for every new task. π0.7, Physical Intelligence says, breaks that pattern.

“Once it crosses that threshold where it goes from only doing exactly the stuff that you collect the data for to actually remixing things in new ways,” says Sergey Levine, a co-founder of Physical Intelligence and a UC Berkeley professor focused on AI for robotics, “the capabilities are going up more than linearly with the amount of data. That much more favorable scaling property is something we’ve seen in other domains, like language and vision.”

The paper’s most striking demonstration involves an air fryer the model had essentially never seen in training. When the research team investigated, they found only two relevant episodes in the entire training dataset: one where a different robot merely pushed the air fryer closed, and one from an open-source dataset where yet another robot placed a plastic bottle inside one on someone’s instructions. The model had somehow synthesized those fragments, plus broader web-based pretraining data, into a functional understanding of how the appliance works.

“It’s very hard to track down where the knowledge is coming from, or where it will succeed or fail,” says Ashwin Balakrishna, a research scientist at Physical Intelligence and a Stanford computer science PhD student. Still, with zero coaching, the model made a passable attempt at using the appliance to cook a sweet potato. With step-by-step verbal instructions — essentially, a human walking the robot through the task the way you might explain something to a new employee — it performed successfully.

That coaching capability matters because it suggests robots could be deployed in new environments and improved in real time without additional data collection or model retraining.

The researchers are candid about the model’s limitations and careful not to get ahead of themselves. In at least one case, they point the finger squarely at their own team.

“Sometimes the failure mode is not on the robot or on the model,” Balakrishna says. “It’s on us. Not being good at prompt engineering.” He describes an early air fryer experiment that produced a 5% success rate. After the team spent about half an hour refining how the task was explained to the model, the success rate jumped to 95%, he says.
