Claude Mythos: The System Card

Claude Mythos is different.

This is the first model other than GPT-2 that, at least at first, is not being released for public use at all.

With GPT-2 the delay was due to a general precautionary principle. OpenAI did not know what they had, or what effect on-demand text would have on various systems. It sounds funny now, since GPT-2 was harmless, but at the time the concern was highly reasonable.

The decision not to release Claude Mythos is not about an amorphous fear. If given to anyone with a credit card, Claude Mythos would give attackers a cornucopia of zero-day exploits for essentially all the software on Earth, including every major operating system and browser. It would be chaos.

Or, in theory, Anthropic could have chosen to use those exploits itself. Great power was on offer, and that power was refused. This does not happen often.

Instead Anthropic has created Project Glasswing. Mythos is being given only to cybersecurity firms, so they can patch the world’s most important software. Based on how that goes, we can then decide if and when it will become reasonable to give access to a broader range of people.

Who counts as this ‘we’ is suddenly quite the interesting question. The government picked quite the month to try to disentangle itself from all Anthropic products. Anthropic says it is attempting to work with the government, so that the government too can fix its own systems before it is too late. Hopefully that can happen. I also hope that the government does not attempt to hijack these capabilities and use them in an offensive capacity. That would be a very serious mistake.

Am I taking Anthropic’s word for all this? Yes, I am taking Anthropic’s word for all of this. They’ve given us sufficient public demonstrations, identifying numerous bugs, and they’ve gotten the cooperation of the world’s biggest tech and cybersecurity firms. If it weren’t real, the whole thing would quickly and obviously backfire. I think it is safe to assume that all of this is legitimate.

I will address the ‘is Anthropic lying?’ arguments in another post, along with Project Glasswing and all the Cyber capabilities and political implications.

Indeed, I’m going to skip over the Cyber section of the model card entirely, because it simply isn’t the right place to look into exactly what Mythos can do in that area. The model card evaluations can be approximated with ‘yes.’
