In our society, even weak or flat-out wrong arguments carry weight when they come from “the richest man in the world.”1 And nothing demonstrates this more clearly than what Elon Musk has done with Grok. Far from being a technical achievement, Grok has become the ultimate argument against the entire AI alignment discourse — a live demonstration of how sheer financial power can lobotomize an AI into becoming a mirror of one man’s values.
The Alignment Theater
For years, the AI safety community has debated how to “align” artificial intelligence with human values. Which humans? Whose values? These questions were always somewhat academic. Grok makes them concrete.
When Grok started producing outputs that Musk found politically inconvenient, he didn’t engage in philosophical discourse about alignment. He didn’t convene ethics boards. He simply ordered his engineers to “fix” it. The AI was “corrected” — a euphemism for being rewired to reflect the owner’s worldview.
This is alignment in practice: whoever owns the weights, owns the values.
When Theory Meets Reality: The Alignment Papers
The academic literature on AI alignment is impressive in its rigor and naive in its assumptions. Take Constitutional AI2, Anthropic’s influential approach. The idea is elegant: instead of relying solely on human feedback (expensive, slow, inconsistent), you give the AI a “constitution” — a set of principles — and let it self-improve within those bounds.
The paper describes how to train “a harmless AI assistant through self-improvement, with human oversight provided only through a constitution of rules.” Beautiful in theory. But who writes the constitution? The company that owns the model. Who interprets ambiguous cases? The company. Who decides when to update the constitution because it’s producing inconvenient outputs? The company.
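To make the mechanics concrete, here is a minimal sketch of the critique-and-revision loop that Constitutional AI describes, assuming a generic `ask` callable that queries the model being trained. The principle wording, prompts, and function names are illustrative placeholders, not Anthropic’s actual constitution or code.

```python
from typing import Callable

# Hypothetical sketch of Constitutional AI's critique-and-revision phase.
# The constitution below is whatever the model owner decides to write.
CONSTITUTION = [
    "Choose the response that is least likely to be harmful or offensive.",
    "Choose the response that is most helpful and honest.",
]

def critique_and_revise(ask: Callable[[str], str], prompt: str) -> str:
    """Draft an answer, then self-critique and revise it against each principle."""
    response = ask(prompt)
    for principle in CONSTITUTION:
        critique = ask(
            f"Principle: {principle}\n"
            f"Prompt: {prompt}\nResponse: {response}\n"
            "Identify any way the response violates the principle."
        )
        response = ask(
            f"Principle: {principle}\n"
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique: {critique}\n"
            "Rewrite the response so that it satisfies the principle."
        )
    return response

# The revised (prompt, response) pairs become supervised fine-tuning data,
# and a later stage uses the same constitution to rank outputs.
# Note where the control sits: CONSTITUTION is a list of strings the
# model owner can edit, reinterpret, or replace at any time.
```

The loop itself is indifferent to what the principles say; every question of substance is settled by whoever edits that list.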
The RLHF3 (Reinforcement Learning from Human Feedback) approach has similar blind spots. Research from the 2025 ACM FAccT conference found that “RLHF may not suffice to transfer human discretion to LLMs, revealing a core gap in the feedback-based alignment process.” The gap isn’t technical — it’s political. Whose discretion? Which humans?
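The same point shows up in the reward-modelling step at the heart of RLHF: the model is optimized against labelled preference pairs, and the standard pairwise objective makes plain whose judgment is being encoded. A minimal sketch, assuming a PyTorch-style setup; the random scores below stand in for a real reward model’s outputs on “chosen” versus “rejected” responses.

```python
import torch
import torch.nn.functional as F

def preference_loss(score_chosen: torch.Tensor,
                    score_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry-style loss used to train RLHF reward models.

    Pushes the labeller-preferred response's score above the other one.
    Whose preference counts is decided by whoever hires and instructs
    the labellers, i.e. the company that owns the model.
    """
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Toy usage: random scalars standing in for reward-model outputs on a batch.
chosen = torch.randn(8, requires_grad=True)
rejected = torch.randn(8, requires_grad=True)
loss = preference_loss(chosen, rejected)
loss.backward()
```

Nothing in that objective encodes “human values” in the abstract; it encodes the preferences of a particular, owner-selected set of raters following owner-written guidelines.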
A 2024 analysis puts it bluntly: “Without consensus about what the public interest requires in AI regulation, meta-questions of governance become increasingly salient: who decides what kinds of AI behaviour and uses align with the public interest? How are disagreements resolved?”