The competition runners have taken my criticism and my advice (1) (2) (3). They’re looking for other ideas on how to incentivize open benchmarking of quantum cryptanalysis. I want open benchmarking too, and I generally agree with Project11’s mission and advocacy when it’s done well. I don’t have an idea for how to make open benchmarking a thing… but for the near term, a blameless post-mortem of the competition could be a constructive next step.
On May 20th of last year, I received an email asking me to make a submission to the “QDay Prize”. It was a competition where whoever managed to solve the biggest problem using Shor’s algorithm on current quantum computers would receive a prize of 1 bitcoin (around 77 thousand USD as of this writing). Despite the large prize, I declined to make a submission. I thought it was a terrible idea for a competition, with two showstopping issues in the basic premise.
The first showstopper is that Shor’s algorithm requires error correction. Current quantum computers experience on the order of one error per thousand gates, but cryptographically relevant instances of Shor’s algorithm require billions of gates. The only known way to cross this chasm is quantum error correction. There are promising quantum error correction experiments being done, but ultimately quantum error correction is still a work in progress. Participants in the competition would inevitably end up using non-error-corrected circuits, which have completely different costs and challenges and scaling properties. In other words, the competition would be measuring something irrelevant.
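To put rough numbers on that chasm: a toy calculation, assuming independent errors per gate (a simplification real noise models don’t strictly satisfy), gives an astronomically small chance of a billion-gate circuit running cleanly at current error rates.

```python
import math

# Toy estimate: probability a circuit runs with zero errors, assuming
# independent errors per gate (a simplification; real noise is messier).
error_per_gate = 1e-3   # order of magnitude for current hardware
gate_count = 1e9        # order of magnitude for cryptographically
                        # relevant instances of Shor's algorithm

# P(zero errors) = (1 - p)^n, computed in log space to avoid underflow.
log10_success = gate_count * math.log1p(-error_per_gate) / math.log(10)
print(f"P(zero errors) ~ 10^{log10_success:.0f}")
```

The exact exponent doesn’t matter; what matters is that it’s hundreds of thousands of orders of magnitude away from “retry until lucky” territory, which is why error correction isn’t optional here.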
The second showstopper is that it’s too easy for Shor’s algorithm to solve small problems by accident. On this point, I was somewhat assuaged by the email mentioning:
[…] We understand this is not totally representative of how larger keys get broken, as there’s no error correction yet, and we’ll only award the prize if quantum computers are used legitimately - i.e. no Falling With Style-style tricks.
The “Falling With Style-style tricks” thing is a reference to my April Fools paper in SIGBOVIK 2025. In that paper, I jokingly claimed that I factored all numbers up to 255 with a quantum computer. The joke is that it worked just as quickly when I replaced the quantum computer with a random number generator. Basically, for small problems, Shor’s algorithm succeeds regardless of how well your quantum computer works. The computer working well only matters for big problems. This makes judging a Shor’s-algorithm-applied-to-small-problems competition extremely difficult. In fact, as part of declining to participate, I emphasized that this was a key risk. I said:
For the near future, the contribution of luck is going to massively outweigh any legitimate contribution of the quantum computer. So I suspect the winner in 2026 will be whoever did the best job at obfuscating how they made themselves unavoidably lucky. You’re going to find yourself in a philosophical debate, with 100K$ on the line, over where exactly the line for a quantum computer “really” breaking a key is.
Anyways, a year went by, the competition ended, a winner was chosen, and… the winner’s code is a Falling With Style-style trick. GitHub user @yuvadm (“Yuval Adam”) checked what happens when the quantum calls in the prize submission are replaced with random calls, and the random results are indistinguishable from the quantum results.
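To see why that check is so damning, here’s a hypothetical toy (not @yuvadm’s code, not the submission’s code): Shor’s classical post-processing wrapped around a “period finder” that just returns random numbers. For small numbers, it still factors almost immediately.

```python
import math
import random

def fake_period_finder(a, N):
    # Stand-in for the quantum subroutine: just guess a random "period".
    # For small N this is good enough, which is the whole problem.
    return random.randrange(2, N)

def shor_postprocess(N, attempts=100_000):
    # Shor's classical post-processing, wrapped around the fake
    # "quantum" call above. (A hypothetical sketch for illustration.)
    for _ in range(attempts):
        a = random.randrange(2, N)
        if math.gcd(a, N) != 1:
            continue  # skip trivially lucky gcd hits
        r = fake_period_finder(a, N)  # <-- would be the quantum computer
        if r % 2:
            continue  # post-processing needs an even "period"
        y = pow(a, r // 2, N)
        for candidate in (math.gcd(y - 1, N), math.gcd(y + 1, N)):
            if 1 < candidate < N:
                return candidate  # a nontrivial factor, found by luck
    return None

print(shor_postprocess(15))  # prints 3 or 5 essentially every time
```

The post-processing only checks whether the final answer is a valid factor; it never checks whether the “period” actually came from anywhere meaningful. That’s exactly why swapping the quantum calls for random calls is the right test.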
Something I want to note in this otherwise very negative post: I looked over the submission’s code, and the circuit construction looks fine. They’re implementing the ECDLP circuit described in Roetteler et al. 2017. That’s a weird choice, because there have been better papers since (and before), but it’s a valid choice. The choice to use Draper-style phase adders instead of cheaper ripple-carry adders is similarly weird but valid.
Anyways, the fact that the circuit construction looks correct speaks to the insidiousness of the falling-with-style issue. You make a correct circuit, you get the expected result, you celebrate… but you got the right answer for the wrong reason. This is a fear that every competent experimentalist knows in their bones. It’s why they don’t just check that something works when it should work; they also check that it breaks when it should break. Failing to do that is arguably the most common failure of reasoning in humans, so if you’re running a competition where this is a known possibility then YOU SHOULD BE CHECKING FOR IT.