XOR'ing a register with itself is the idiom for zeroing it out. Why not sub?

Matt Godbolt, probably best known for being the proprietor of Compiler Explorer, wrote a brief article on why x86 compilers love the xor eax, eax instruction.

The answer is that it is the most compact way to set a register to zero on x86. In particular, it is three bytes shorter than the more obvious mov eax, 0 (two bytes versus five) since it avoids having to encode the four-byte constant. The x86 architecture does not have a dedicated zero register, so if you need to zero out a register, you’ll have to do it ab initio.
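To make the size difference concrete, here is a small Python sketch that lists the machine-code bytes for each form. The bytes are hand-assembled from the standard x86 opcode tables (the dictionary name ENCODINGS is just for illustration), and sub eax, eax is included for comparison.

```python
# Machine-code bytes for three ways to zero EAX in x86 code.
# Hand-assembled from the opcode tables:
#   mov eax, imm32  -> B8+rd id  (opcode followed by a four-byte immediate)
#   xor r/m32, r32  -> 31 /r     (ModRM byte C0 selects eax, eax)
#   sub r/m32, r32  -> 29 /r     (ModRM byte C0 selects eax, eax)
# An equivalent two-byte form of xor eax, eax (33 C0) also exists.
ENCODINGS = {
    "mov eax, 0": bytes([0xB8, 0x00, 0x00, 0x00, 0x00]),
    "xor eax, eax": bytes([0x31, 0xC0]),
    "sub eax, eax": bytes([0x29, 0xC0]),
}

for insn, code in ENCODINGS.items():
    # bytes.hex(sep) requires Python 3.8 or later.
    print(f"{insn:<14} {code.hex(' ').upper():<16} {len(code)} bytes")
```

The mov form needs five bytes, while the xor and sub forms need two each.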

But Matt doesn’t explain why everyone chooses xor over some other mathematical operation that is guaranteed to produce zero. In particular, what’s wrong with sub eax, eax? It encodes to the same number of bytes and executes in the same number of cycles. And its behavior with respect to flags is even better:

Flag   xor eax, eax   sub eax, eax
OF     clear          clear
SF     clear          clear
ZF     set            set
AF     undefined      clear
PF     set            set
CF     clear          clear

Observe that xor eax, eax leaves the AF flag undefined, whereas sub eax, eax clears it.
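The table can be reproduced with a small model of the flag rules. The following Python sketch is an approximation written from the documented flag definitions, not an emulator; the helper names xor_flags and sub_flags are hypothetical, and None stands in for the AF flag that xor leaves undefined.

```python
MASK32 = 0xFFFFFFFF


def _parity_flag(result: int) -> int:
    # PF looks only at the low byte: set when it has an even number of 1 bits.
    return int(bin(result & 0xFF).count("1") % 2 == 0)


def _result_flags(result: int) -> dict:
    # SF, ZF, and PF depend only on the 32-bit result.
    return {
        "SF": (result >> 31) & 1,
        "ZF": int(result == 0),
        "PF": _parity_flag(result),
    }


def xor_flags(a: int, b: int) -> dict:
    result = (a ^ b) & MASK32
    flags = _result_flags(result)
    # XOR always clears OF and CF; AF is undefined (modeled as None).
    flags.update(OF=0, CF=0, AF=None)
    return flags


def sub_flags(a: int, b: int) -> dict:
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    flags = _result_flags(result)
    flags.update(
        CF=int(a < b),                            # unsigned borrow out of bit 31
        AF=((a ^ b ^ result) >> 4) & 1,           # borrow out of bit 3
        OF=(((a ^ b) & (a ^ result)) >> 31) & 1,  # signed overflow
    )
    return flags


for x in (0, 1, 0x7FFFFFFF, 0xDEADBEEF):
    print(hex(x), "xor:", xor_flags(x, x), "sub:", sub_flags(x, x))
```

For any x, xor_flags(x, x) and sub_flags(x, x) agree on every flag except AF, which is exactly the difference the table shows.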

I don’t know why xor won the battle, but I suspect it was just a case of swarming.

In my hypothetical history, xor and sub started out with roughly similar popularity, but xor took a slight lead due to some fluke, perhaps because it felt more “clever”.

When early compilers used xor to zero out a register, this started the snowball, because people would see the compiler generate xor and think, “Well, those compiler writers are smart, they must know something I don’t. Since I was on the fence between xor and sub, this tiny data point is enough to tip it toward xor.”

The predominance of these idioms as a way to zero out a register led Intel to add special xor r, r-detection and sub r, r-detection in the instruction decoding front-end and rename the destination to an internal zero register, bypassing the execution of the instruction entirely. You can imagine that the instruction, in some sense, “takes zero cycles to execute”. The front-end detection also breaks dependency chains: Normally, the output of an xor or sub is dependent on its inputs, but in this special case of xor’ing or sub’ing a register with itself, we know that the output is zero, independent of input.
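The dependency-breaking effect can be illustrated with a toy register renamer. This Python sketch is a hypothetical model (Insn, rename, and the ZERO marker are invented for illustration), not a description of Intel’s actual hardware: it records which earlier instruction each register value comes from, and when detect_zero_idioms is on, it points the destination of xor r, r or sub r, r at the zero register instead of at the instruction’s inputs.

```python
from dataclasses import dataclass

# Hypothetical marker meaning "this register currently maps to the internal zero register".
ZERO = "zero-reg"


@dataclass
class Insn:
    text: str     # assembly text, used only to find the mnemonic
    dest: str     # architectural destination register
    srcs: tuple   # architectural source registers


def is_zero_idiom(insn: Insn) -> bool:
    # xor r, r and sub r, r always produce 0, whatever r held before.
    mnemonic = insn.text.split()[0]
    return mnemonic in ("xor", "sub") and insn.srcs == (insn.dest, insn.dest)


def rename(program, detect_zero_idioms=True):
    """Return (deps, executed) for each instruction in program.

    deps[i] is the set of earlier instruction indices that instruction i must
    wait for; executed[i] says whether it needs an execution unit at all.
    """
    producer = {}   # architectural register -> index of last writer, or ZERO
    deps, executed = [], []
    for i, insn in enumerate(program):
        if detect_zero_idioms and is_zero_idiom(insn):
            # Rename the destination to the zero register: no inputs, no execution.
            producer[insn.dest] = ZERO
            deps.append(set())
            executed.append(False)
            continue
        # Values coming from the zero register are ready immediately.
        waits = {producer[r] for r in insn.srcs
                 if r in producer and producer[r] != ZERO}
        deps.append(waits)
        executed.append(True)
        producer[insn.dest] = i
    return deps, executed


program = [
    Insn("imul eax, ecx", "eax", ("eax", "ecx")),  # 0: slow multiply writes eax
    Insn("xor eax, eax", "eax", ("eax", "eax")),   # 1: zero idiom
    Insn("add eax, edx", "eax", ("eax", "edx")),   # 2: must it wait for the imul?
]

for detect in (True, False):
    deps, executed = rename(program, detect_zero_idioms=detect)
    print(f"detect_zero_idioms={detect}: deps={deps} executed={executed}")
```

With detection enabled, the add no longer waits on the slow imul and the xor never occupies an execution unit; with detection disabled, the add waits on the xor, which in turn waits on the imul.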

Even though Intel added support for both xor-detection and sub-detection, Stack Overflow worries that other CPU manufacturers may have special-cased xor but not sub, so that makes xor the winner in this ultimately meaningless battle.
