A co-worker of mine was looking at some disassembled ARM code the other day and discovered something weird: lots of d4d4 instructions, scattered about. LLVM’s objdump says this is a relative branch with an offset of -0x58. The weird part is that they were always unreachable.
Experiments

Here’s an example in a minimal reproducer I wrote:

    00020100 <one>:
       20100: 4770    bx lr
       20102: d4d4    bmi 0x200ae <__dso_handle+0x100ae> @ imm = #-0x58

That bx lr right before the d4d4 branches to the link register. In other words, it returns. Here’s the C code that goes with this function:

    #include "mod.h"

    static void one(void)
    {
        return;
    }

    int main(void)
    {
        void *fn;

        fn = one;
        use_ptr(fn);
        return 0;
    }

The use_ptr function is declared in mod.h (defined in mod.c), and what it does with the pointer is not important. You can see that there’s a function called one, and that function just returns, which is why bx lr is its only instruction. But why is there an extra d4d4 after it in the disassembled object code?

My first thought was that it was there for alignment. Thumb instructions are 16 bits, and maybe functions need to be 32-bit aligned. It’s weird that it would use a branch to a real relative address instead of a nop or something that would cause a fault, but let’s try expanding the experiment. The code:

    static void one(void)
    {
        return;
    }

    static void two(void)
    {
        return;
    }

    int main(void)
    {
        void *fn;

        fn = one;
        use_ptr(fn);
        fn = two;
        use_ptr(fn);
        return 0;
    }

And the disassembly:

    000200f4 <main>:
       200f4: b580    push {r7, lr}
       200f6: 466f    mov r7, sp
       200f8: 4803    ldr r0, [pc, #0xc] @ 0x20108
Conclusion

So now we know. LLD is inserting the weird d4d4 instructions, and it’s doing it to align across object file boundaries. Why did they pick such a weird constant, though? GNU ld went with zeroes, which seems benign.
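Here’s a rough sketch of what that padding amounts to. This is not LLD’s code: the helper names align_up and fill_gap are invented for illustration, and the 4-byte alignment used in the note below is an assumption taken from the earlier guess that functions need to be 32-bit aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round offset up to the next multiple of align.
 * align must be a non-zero power of two (hypothetical helper). */
static size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Fill buf[start..end) with the byte 0xd4 and return how many
 * padding bytes were written (hypothetical helper). */
static size_t fill_gap(uint8_t *buf, size_t start, size_t end)
{
    for (size_t i = start; i < end; i++)
        buf[i] = 0xd4;
    return end - start;
}
```

For the 2-byte function one (bytes 70 47 for bx lr), align_up(2, 4) is 4, so fill_gap(buf, 2, 4) writes two 0xd4 bytes. A Thumb disassembler then reads those two bytes as a single d4d4 halfword.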
Research

A little bit of checking out the code later, and we find this in ARM.cpp:

    trapInstr = {0xd4, 0xd4, 0xd4, 0xd4};

That was actually way easier than I was expecting. The git blame path meanders a little before getting to this commit, where Rui Ueyama explains:

    Add trap instructions for ARM and MIPS.

    This patch fills holes in executable sections with 0xd4 (ARM) or
    0xef (MIPS). These trap instructions were suggested by Theo de Raadt.

    llvm-svn: 306322

This appears to have been precipitated by this message, also from Rui Ueyama, to the llvm-bugs mailing list. In it, the question is asked whether LLD should use a trap instruction for ARM/AArch64 like x86 and x86-64’s 0xCC. I didn’t find any replies, though.

I couldn’t find any messages on the mailing list about this from Theo de Raadt, so I guess we just have to live with Ueyama’s testimony that he thought 0xd4 would be a good byte to repeat as a trap instruction. But a trap instruction is supposed to halt the processor, so what’s with the disassembler saying it’s a branch?
RTFM

Let’s take a look at the ARMv7-M Architecture Reference Manual, the ARM. First of all, it says that we’re using the Thumb instruction set, and most instructions are 16 bits. Any halfword whose top five bits are 0b11101, 0b11110, or 0b11111 is the first half of a 32-bit instruction; everything else is a 16-bit instruction. Since d4 is 0b11010100, we can safely assume that the instruction decoder will always treat it as a 16-bit instruction.

Next up, we have table A5-1, showing how 16-bit instructions are encoded. The first 6 bits are the opcode, followed by 10 bits of other stuff. For 0xd4d4, those 6 bits are 0b110101. That matches the 0b1101xx row, conditional branch and supervisor call. Well, so far it’s still looking like a branch.

Page A5-134 leads us to the statement that the encoding here is 0b1101 followed by a 4-bit opcode. 0xd is the 0b1101 and the next 4 bits are the number 4: 0b0100. This doesn’t match UDF, which would seem like the reasonable choice for a trap. Instead, according to table A5-8, any opcode not matching 111x is a conditional branch, explained under B on page A7-207.

That page tells us that the B instruction has several encodings, but only one that begins with 0xd: T1. With 0xd4d4, cond in this table is 0x4 or 0b0100. That doesn’t match UDF (again) or SVC. InITBlock() just checks if we’re within 4 instructions of an IT. IT is a weird instruction, but not important for our purposes.

What’s important is imm32. The least significant 8 bits, 0xd4, sign-extend to 0xffffffd4, or -44. That’s -0x2c in hex, not the -0x58 from objdump. However, the immediate counts halfwords, not bytes: the manual builds imm32 by appending a zero bit to imm8 before sign-extending. So the byte offset is -88, or -0x58.

The other field, cond, is 0b0100, which means some bits in the condition flags have to be set in order for the branch to be taken. Which bits don’t particularly matter, since this code is supposed to be unreachable, but for completeness, 0b0100 is MI (minus), which is where objdump’s bmi comes from: the branch is taken when APSR.N == '1'.
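To double-check the arithmetic, here’s a minimal C sketch of the decoding walked through above. It only handles the 16-bit conditional branch (encoding T1). The struct and function names are made up for this example and come from neither the manual nor LLVM. It also relies on the Thumb rule that PC reads as the instruction’s address plus 4 when the branch target is computed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded fields of a 16-bit Thumb conditional branch (encoding T1).
 * Hypothetical type, for illustration only. */
struct thumb_bcond {
    uint8_t  cond;   /* bits 11..8 of the halfword */
    int32_t  offset; /* byte offset: sign-extended imm8, times 2 */
    uint32_t target; /* address the branch would jump to */
};

/* True if hw is the first half of a 32-bit Thumb instruction. */
static bool thumb_is_32bit(uint16_t hw)
{
    unsigned top5 = hw >> 11;
    return top5 == 0x1d || top5 == 0x1e || top5 == 0x1f;
}

/* Mnemonic suffix for cond values 0x0..0xd; NULL for 0b111x (UDF/SVC). */
static const char *thumb_cond_name(uint8_t cond)
{
    static const char *const names[14] = {
        "eq", "ne", "cs", "cc", "mi", "pl", "vs",
        "vc", "hi", "ls", "ge", "lt", "gt", "le",
    };
    return cond < 14 ? names[cond] : NULL;
}

/* Decode the halfword hw located at addr. Returns false if hw is not a
 * T1 conditional branch (32-bit prefix, other opcode group, UDF or SVC). */
static bool decode_thumb_bcond(uint16_t hw, uint32_t addr,
                               struct thumb_bcond *out)
{
    if (thumb_is_32bit(hw) || (hw >> 12) != 0xd)
        return false;

    uint8_t cond = (hw >> 8) & 0xf;
    if (cond >= 0xe)            /* 0b1110 is UDF, 0b1111 is SVC */
        return false;

    int32_t imm8 = hw & 0xff;
    if (imm8 & 0x80)            /* sign-extend the 8-bit immediate */
        imm8 -= 0x100;

    out->cond = cond;
    out->offset = imm8 * 2;     /* the immediate counts halfwords */
    out->target = addr + 4 + (uint32_t)out->offset; /* PC is addr + 4 */
    return true;
}
```

Feeding it the halfword 0xd4d4 at address 0x20102 from the first disassembly should give cond 0x4 (mi), an offset of -0x58, and a target of 0x200ae, matching what objdump printed.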