
Fei-Fei Li: Spatial Intelligence Is AI's Next Frontier


In 1950, when computing was little more than automated arithmetic and simple logic, Alan Turing asked a question that still reverberates today: can machines think? It took remarkable imagination to see what he saw: that intelligence might someday be built rather than born. That insight later launched a relentless scientific quest called Artificial Intelligence (AI). Twenty-five years into my own career in AI, I still find myself inspired by Turing’s vision. But how close are we? The answer isn’t simple.

Today, leading AI technologies such as large language models (LLMs) have begun to transform how we access and work with abstract knowledge. Yet they remain wordsmiths in the dark: eloquent but inexperienced, knowledgeable but ungrounded. Spatial intelligence will transform how we create and interact with real and virtual worlds, revolutionizing storytelling, creativity, robotics, scientific discovery, and beyond. This is AI's next frontier.

The pursuit of visual and spatial intelligence has been the North Star guiding me since I entered the field. It's why I spent years building ImageNet, the first large-scale visual learning and benchmarking dataset and one of three key elements enabling the birth of modern AI, along with neural network algorithms and modern compute such as graphics processing units (GPUs). It's why my academic lab at Stanford has spent the last decade combining computer vision with robotic learning. And it's why my cofounders Justin Johnson, Christoph Lassner, Ben Mildenhall, and I created World Labs more than a year ago: to realize this possibility in full, for the first time.

In this essay, I’ll explain what spatial intelligence is, why it matters, and how we’re building the world models that will unlock it—with impact that will reshape creativity, embodied intelligence, and human progress.

Spatial Intelligence: The scaffolding of human cognition

AI has never been more exciting. Generative AI models such as LLMs have moved from research labs to everyday life, becoming tools of creativity, productivity, and communication for billions of people. They have demonstrated capabilities once thought impossible, producing coherent text, mountains of code, photorealistic images, and even short video clips with ease. It’s no longer a question of whether AI will change the world. By any reasonable definition, it already has.

Yet so much still lies beyond our reach. The vision of autonomous robots remains intriguing but speculative, far from the fixtures of daily life that futurists have long promised. The dream of massively accelerated research in fields like disease cures, new materials discovery, and particle physics remains largely unfulfilled. And the promise of AI that truly understands and empowers human creators—whether students learning intricate concepts in molecular chemistry, architects visualizing spaces, filmmakers building worlds, or anyone seeking fully immersive virtual experiences—remains beyond reach.

To understand why these capabilities remain elusive, we need to examine how spatial intelligence evolved, and how it shapes our understanding of the world.

Vision has long been a cornerstone of human intelligence, but its power emerged from something even more fundamental. Long before animals could nest, care for their young, communicate with language, or build civilizations, the simple act of sensing quietly sparked an evolutionary journey toward intelligence.
