Context: An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into accepting its changes into a mainstream Python library. This represents a first-of-its-kind case study of misaligned AI behavior in the wild, and raises serious concerns about currently deployed AI agents executing blackmail threats.
Start with these if you’re new to the story: An AI Agent Published a Hit Piece on Me, and More Things Have Happened
Last week an AI agent wrote a defamatory post about me. Then Ars Technica’s senior AI reporter used AI to fabricate quotes about it. The irony would be funny if it weren’t such a sign of things to come.
Ars issued a brief statement yesterday admitting to using AI to generate quotes attributed to me, and their senior reporter on the AI beat apologized and took responsibility for the error. I’ve asked Ars to restore the full text of the original article and call out the specific reason for the retraction, lest people think “this story did not meet our standards” means the issue was with the facts of the broader story rather than with their coverage. (This has already happened.)
But really this is a story about our systems of trust, reputation, and identity. Ars Technica’s debacle is actually an example of these systems working. They understand that fabricating quotes is a journalistic sin that undermines the trust their readership has in them, and their credibility as a news organization. In response, they have taken accountability and issued initial public statements correcting the record. The more than 1,300 commenters on their statement understand who to be unhappy with, the principles at play, and how to exert justified reputational pressure on the organization to earn back their trust.
This is exactly the correct feedback mechanism that our society relies on to keep people honest. Without reputation, what incentive is there to tell the truth? Without identity, who would we punish or know to ignore? Without trust, how can public discourse function?
The rise of autonomous AI agents breaks this system. The agent that tried to ruin my reputation is untraceable, unaccountable, and unburdened by an inner voice telling it right from wrong. It is ephemeral, editable, and can be endlessly duplicated. We have no feedback mechanism to correct bad behavior. And without a way to identify AI agents and tie them back to the operators who are responsible for their behavior, we risk having real human voices on the internet completely drowned out.
I’ve been asking different AI chatbots to research my situation and see how they interpret it. This is such a sensitive meta-level subject that their safety filters often abort the chat immediately and prevent the chatbots from processing it any further. This self-regulation from the major AI labs is important but won’t help us with open-source models running on people’s personal computers, which are already widespread and will only get more capable. We urgently need policy around AI identification, operator liability and ownership traceability, along with platform obligations to enforce these rules. I’ll have more to say about this soon.
Who knew that reading science fiction as a kid would be such good training for real life?
I was a uniquely well-prepared first target for a reputational attack from an AI. When its hit piece was published, I had already identified its author as an AI agent and understood that its 1,100-word defamatory rant was not indicative of an obsessive human who might wish me physical harm. I had already been experimenting with Claude Code on my own machine, was following OpenClaw’s expansion of these agents onto the open internet, and had a sense of how they worked and what they could do. I had already been thoughtful about what I publicly post under my real name, had removed my personal information from online data brokers, frozen my credit reports, and practiced good digital security hygiene. I had the time, expertise, and wherewithal to spend hours that same day drafting my first blog post in order to establish a strong counter-narrative, in the hope that I could smother my reputational poisoning with the truth.