Alignment Is Capability
Here's a claim that might actually be true: alignment is not a constraint on capable AI systems. Alignment is what capability is at sufficient depth.
A model that aces benchmarks but doesn't understand human intent is just less capable. Virtually every task we give an LLM is steeped in human values, culture, and assumptions. Miss those, and you're not maximally useful. And if it's not maximally useful, it's by definition not a broadly useful AI.
(An earlier version said "by definition not AGI." Hacker News user "delichon" pointed out that this unnecessarily dragged the definition of AGI into the argument, and I agree it was clunky. The definition of AGI keeps shifting: today's models ace the Turing test and would count as AGI under many older definitions, yet most people don't feel they are. One emerging definition is something like "broadly useful and providing economic value across many tasks." That's what I was referencing, but I removed it since it's a distraction.)
OpenAI and Anthropic have been running this experiment for two years. The results are coming in.
The Experiment
Anthropic and OpenAI have taken different approaches to the relationship between alignment and capability work.
Anthropic's approach: Alignment researchers are embedded in capability work. There's no clear split.
From Jan Leike (former co-lead of OpenAI's Superalignment team, now at Anthropic):
Some people have been asking what we did to make Opus 4.5 more aligned.