D4d4 - GoKawiil

A co-worker of mine was looking at some disassembled ARM code the other day, and discovered something weird. Lots of d4d4 instructions, scattered about. LLVM’s objdump says this is a relative branch to -0x58 . The weird part is that they were always unreachable.

Experiments¶ Here’s an example in a minimal reproducer I wrote: 00020100 < one >: 20100: 4770 bx lr 20102: d4d4 bmi 0x200ae <__dso_handle+0x100ae> @ imm = #-0x58 That bx lr right before the d4d4 branches to the link register. In other words, it returns. Here’s the C code that goes with this function: #include "mod.h" static void one ( void ) { return ; } int main ( void ) { void * fn ; fn = one ; use_ptr ( fn ); return 0 ; } The use_ptr function is declared in mod.h (defined in mod.c ), and what it does with the pointer is not important. You can see that there’s a function called one , and that function just returns. Thus bx lr being the only thing. But why is there an extra d4d4 after it in the disassembled object code? My first thought was that it was there for alignment. Of course, Thumb instructions are 16 bits and maybe functions need to be 32-bit aligned. Weird that it would use a branch to a real relative address instead of a nop or something that would cause a fault, but let’s try expanding the experiment. code: static void one ( void ) { return ; } static void two ( void ) { return ; } int main ( void ) { void * fn ; fn = one ; use_ptr ( fn ); fn = two ; use_ptr ( fn ); return 0 ; } And the disassembly: 000200f4 < main >: 200f4: b580 push {r7, lr} 200f6: 466f mov r7, sp 200f8: 4803 ldr r0, [pc, #0xc] @ 0x20108 200fa: f000 f80b bl 0x20114 @ imm = #0x16 200fe: 4803 ldr r0, [pc, #0xc] @ 0x2010c 20100: f000 f808 bl 0x20114 @ imm = #0x10 20104: 2000 movs r0, #0x0 20106: bd80 pop {r7, pc} 20108: 11 01 02 00 .word 0x00020111 2010c: 13 01 02 00 .word 0x00020113 00020110 < one >: 20110: 4770 bx lr 00020112 < two >: 20112: 4770 bx lr 00020114 < use_ptr >: 20114: 4770 bx lr Not only does the compiler not feel the need to align functions to 32-bit boundaries, but adding a second single-instruction function actually got rid of the d4d4 entirely. And note how use_ptr (I made it just return also) doesn’t have a d4d4 in it. Curious. What happens if I go to 3 functions? code: 00020124 < one >: 20124: 4770 bx lr 00020126 < two >: 20126: 4770 bx lr 00020128 < three >: 20128: 4770 bx lr 2012a: d4d4 bmi 0x200d6 <__dso_handle+0x100d6> @ imm = #-0x58 0002012c < use_ptr >: 2012c: 4770 bx lr It’s back! But now only once. It seems like it aligns the end of main.o so that mod.o can start on a 32-bit boundary. Ok, maybe this makes sense. So let’s take a look at the compiler’s output directly. Here’s main.o : 00000028 < one >: 28: 4770 bx lr 0000002a < two >: 2a: 4770 bx lr 0000002c < three >: 2c: 4770 bx lr Oh, the compiler didn’t put that in at all, it must have been the linker! So lld is rounding the end of an object file up to 32-bit alignment using this d4d4 instruction. If that’s true, then linking mod.o before main.o ought to move where the d4d4 is. 000200e4 < use_ptr >: 200e4: 4770 bx lr 200e6: d4d4 bmi 0x20092 <__dso_handle+0x10092> @ imm = #-0x58 000200e8 < main >: 200e8: b5d0 push {r4, r6, r7, lr} 200ea: af02 add r7, sp, #0x8 200ec: 4804 ldr r0, [pc, #0x10] @ 0x20100 200ee: 4c05 ldr r4, [pc, #0x14] @ 0x20104 200f0: 47a0 blx r4 200f2: 4805 ldr r0, [pc, #0x14] @ 0x20108 200f4: 47a0 blx r4 200f6: 4805 ldr r0, [pc, #0x14] @ 0x2010c 200f8: 47a0 blx r4 200fa: 2000 movs r0, #0x0 200fc: bdd0 pop {r4, r6, r7, pc} 200fe: bf00 nop 20100: 11 01 02 00 .word 0x00020111 20104: e5 00 02 00 .word 0x000200e5 20108: 13 01 02 00 .word 0x00020113 2010c: 15 01 02 00 .word 0x00020115 00020110 < one >: 20110: 4770 bx lr 00020112 < two >: 20112: 4770 bx lr 00020114 < three >: 20114: 4770 bx lr It did! The extra instruction got moved up to align the beginning of the object code that contains main! One more check: if I use the GNU linker, does it do the same thing? 00008000 < use_ptr >: 8000: 4770 bx lr 8002: 0000 movs r0, r0 No! GNU ld (2.44) inserts zeroes to align files. It’s not nop , though it may as well be. ARMv7-M actually has a nop instruction: bf00 or f3af8000 depending on encoding. You can see it in main, inserted by the compiler between the end of the function and its constants.

Conclusion¶ So now we know. LLD is inserting the weird d4d4 instructions, and it’s doing it to align across object file boundaries. Why did they pick such a weird constant, though? GNU ld went with zeroes, which seems benign.

Research¶ A little bit of checking out the code later, and we find this in ARM.cpp: trapInstr = { 0xd4 , 0xd4 , 0xd4 , 0xd4 }; That was actually way easier than I was expecting. The git blame path meanders a little before getting to this commit where Rui Ueyama explains: Add trap instructions for ARM and MIPS. This patch fills holes in executable sections with 0xd4 (ARM) or 0xef (MIPS). These trap instructions were suggested by Theo de Raadt. llvm-svn: 306322 This appears to have been precipitated by this message, also from Rui Ueyama, to the llvm-bugs mailing list. In it, the question is asked whether LLD should use a trap instruction for ARM/AArch64 like x86 and x86-64’s 0xCC . I didn’t find any replies, though. I couldn’t find any messages on the mailing list about this from Theo de Raadt, so I guess we just have to live with Ueyama’s testimony that he thought 0xd4 would be a good byte to repeat as a trap instruction. But a trap instruction is supposed to halt the processor, so what’s with the disassembler saying it’s a branch?

RTFM¶ Let’s take a look at the ARMv7-M Architecture Reference Manual, the ARM. First of all, it says that we’re using the Thumb instruction set, and most instructions are 16 bits. Any instructions that begin with 0b11101 , 0b11110 , or 0b11111 are the beginnings of 32-bit instructions, but all the rest are 16 bits. Since d4 is 0b11010100 , we can safely assume that the instruction decoder will always treat it like a 16-bit instruction. Next up, we have table A5-1, showing how 16-bit instructions are encoded. The first 6 bits are the opcode, followed by 10 bits of other stuff. As we established earlier, the bits we’re looking for are 0b110101 . That matches conditional branch and supervisor call’s 0b1101xx . Well, so far it’s still looking like a branch.. Page A5-134 leads us to the statement that the encoding here is 0b1101 followed by a 4 bit opcode. 0xd is the 0b1101 and the next 4 bits are the number 4: 0b0100 . This doesn’t match UDF , which seems like a reasonable choice for this purpose. Instead, any opcode not matching 111x is a conditional branch, explained under B on page A7-207, according to table A5-8. This page tells us that the B instruction has several encodings, but only one that begins with a 0xd : T1. With 0xd4d4 , cond in this table is 0x4 or 0b0100 . That doesn’t match UDF (again) or SVC . InITBlock() just checks if we’re within 4 instructions of an IT . IT is a weird instruction, but not important for our purposes. What’s important is that imm32 is now those least significant 8 bits, sign extended. That’s 0xffffffd4 , or -44 . That’s -0x2c in hex, not the -0x58 from objdump. However, immediate offsets count half-words, not bytes. So the offset is -88 , or -0x58 . The other field, cond , is 0b0100 , which means some bits in the condition registers have to be set in order for the branch to be taken. Which bits don’t particularly matter, since this code is supposed to be unreachable, but for completeness, it’s (ASPR.C == '1') && (ASPR.Z == '0') .