
Experts Have World Models. LLMs Have Word Models


Tickets for AIE Miami and AIE Europe are on sale now! We’ll all be there.

Swyx here: we put a call out for Staff Researchers and Guest Writers, and Ankit’s submission immediately stood out. As we discussed on the Yi Tay 2 episode, there are three kinds of World Models conversations today: 1) the most common, 3D video world models like Fei Fei Li’s Marble, General Intuition’s upcoming model, Google’s Genie 3, and Waymo’s World Model; 2) the Meta school of thought comprising JEPA, V-JEPA, EchoJEPA, and Code World Models, which pursues Platonic representation by learning projections onto a common latent space.

This essay covers the third kind, which is now an obvious reasoning frontier: AI capable of multiagent world models that can accurately track theory of mind, anticipate reactions, and reveal or mine for information, particularly in adversarial situations. In benchmarks, DeepMind, ARC-AGI, and Code Clash are all modeling these as games, but solving adversarial reasoning is very much serious business, and it illustrates why the age of scaling is flipping back to the age of research. Enjoy!

Ask a trial lawyer if AI could replace her and she won’t even look up from her brief. No. Ask a startup founder who’s never practiced law and he’ll tell you it’s already happening. They’re both looking at the same output.

And honestly, the founder has a point. The brief reads like a brief. The contract looks like what a contract would look like. The code runs. If you put it next to the expert’s work, most people would struggle to tell the difference.

So what is the expert seeing that everyone else isn’t? Vulnerabilities. She knows exactly how an adversary will exploit the document the moment it lands on their desk.

People try to explain this disconnect away. Sometimes they blame bad prompting; sometimes they assume a more intelligent model would be able to do the job. I would wager that intelligence is the wrong axis. It’s about simulation depth. A sketch of what that means follows.
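To make “simulation depth” concrete, here is a minimal toy sketch in Python, not anything from the essay: `draft`, `find_exploit`, and `patch` are hypothetical stand-ins for model calls, and the exploit heuristic is deliberately trivial. A one-shot draft produces something that looks right; a simulation loop repeatedly asks how an adversary would attack the draft, and revises before committing.

```python
from typing import Optional

def draft(task: str) -> str:
    """One-shot generation: output that looks right on the surface."""
    return f"One-shot draft responding to: {task}"

def find_exploit(document: str) -> Optional[str]:
    """Simulate the adversary: how would opposing counsel attack this?
    Toy heuristic; a real system would query a model with an adversarial persona."""
    if "indemnity" not in document:
        return "silent on indemnity; opposing counsel will exploit the gap"
    return None

def patch(document: str, exploit: str) -> str:
    """Revise the draft to close the hole the simulated adversary found."""
    return document + f" [revised: adds indemnity terms, anticipating '{exploit}']"

def generate_with_simulation(task: str, depth: int = 3) -> str:
    """Anticipate up to `depth` adversary turns before shipping."""
    doc = draft(task)
    for _ in range(depth):
        exploit = find_exploit(doc)
        if exploit is None:
            break
        doc = patch(doc, exploit)
    return doc

print(draft("limit our liability in the vendor contract"))
print(generate_with_simulation("limit our liability in the vendor contract"))
```

The interesting parameter is `depth`: how many adversary moves the generator anticipates before it commits to an answer. A model with a word model stops at the first print; an expert runs the loop.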

Let’s take an illustrative example of approaching people:

A Simple Slack Message

You’re three weeks into a new job. You need the lead designer to review your mockups, but she’s notoriously overloaded. You ask ChatGPT to draft a Slack message.
