Marble: A Multimodal World Model

November 12, 2025

Marble, our frontier multimodal world model, is available to everyone starting today

Spatial intelligence is the next frontier in AI, demanding powerful world models to realize its full potential. World models should reconstruct, generate, and simulate 3D worlds, and allow both humans and agents to interact with them. Spatially intelligent world models will transform a wide variety of industries over the coming years.

Two months ago we shared a preview of Marble, our world model that creates 3D worlds from image or text prompts. Since then, an early set of beta users has been using Marble to create 3D worlds of their own.

Today we are making Marble, a first-in-class generative multimodal world model, generally available for anyone to use. We have also drastically expanded Marble's capabilities, and are excited to highlight them here:

Multimodal Marble: Marble is now massively multimodal. Marble can create 3D worlds from text, images, video, or coarse 3D layouts; Marble also lets you interactively edit, expand, and combine worlds. Once generated, 3D worlds can be exported as Gaussian splats, meshes, or videos. These new capabilities let users create and edit worlds with fine-grained control, and make those worlds more useful than ever before.

Marble Labs: We are launching Marble Labs, a creative hub where imagination meets experimentation. It is where artists, engineers, and designers push the boundaries of world models, showcasing bold ideas, real-world workflows, and new possibilities across gaming, VFX, design, robotics, and beyond. Marble Labs is also home to in-depth case studies, tutorials, and documentation that give anyone the tools to learn, build, and share their own 3D worlds.

The Marble World Model

Our human experience of the world is inherently multimodal: we use all of our senses to make sense of the world around us. We integrate sight, sound, touch, and language to build up a mental model of the outside world; these different representations work together, enriching and reinforcing each other to let us reason about the world and act within it.

World models should work similarly. They should be massively multimodal, able to lift whatever input signals are available into a full 3D world, and they should be able to iteratively update their understanding of the world as new information becomes available.
