We accidentally solved robotics by watching 1M hours of YouTube

29 Jun, 2025

the existential crisis we all share

imagine this: you've just spent $640 billion training the chonkiest language model known to humanity (lol) and decide to call it "Behemoth". it can annoy you on whatsapp, try to solve calculus, and argue with you about anything with the sophistication of a philosophy PhD.

but ask it to grab a coffee mug from your kitchen counter? ngmi

turns out scaling LLMs forever still leaves robots just as clueless. internet-scale language misses the fundamental physics of stuff actually moving around in 3D space. and no amount of "think step by step" or CoT prompting will teach your chatterbox where the trash is in the kitchen

but what if i told you that the solution was hiding in plain sight? what if the secret sauce wasn't more tokens, but more... videos?

the "why didn't we think of this sooner" moment

here's the thing everyone forgot while we were busy making ai agents book flight tickets: robots need to understand physics, not language.

so enter V-JEPA 2, which basically said "hey, what if we fed a neural network 1 million hours of youtube and taught it to predict what happens next?" except instead of predicting the next word, it predicts the next moment in reality.
