
Copy-and-Patch: A Copy-and-Patch Tutorial



Copy-and-patch compilation is a fascinating way of constructing a baseline JIT[1]. It permits incredibly fast runtime compilation of code fragments in an easy-to-maintain fashion, requires barely any understanding of assembly code, and produces native code of sufficient quality to be within the same range as traditional, hand-written baseline JITs.

[1]: Baseline JIT, as in a JIT whose goal is primarily to generate code quickly and gain performance by removing interpretation overhead, rather than by generating well-optimized code itself. Baseline JITs can be paired with optimizing JITs, like V8’s Liftoff baseline JIT for WASM, which allows tiering up into V8’s TurboFan optimizing JIT.

Copy-and-patch works by writing stencils: minimal C functions that implement the desired individual operations such that they compile to concatenatable native code fragments. At JIT compile time, one can copy the pre-compiled fragment for each operation back-to-back, patching them to change embedded constants or addresses as needed.

As an adventure into understanding how copy-and-patch works, our goal will be to create the function

    int add_a_b(int a, int b) {
      return a + b;
    }

but specialized at runtime to compute 1 + 2. We’ll be doing this by first breaking it down into some bytecode-sized operations:

    const_int_reg1: a = 1;
    const_int_reg2: b = 2;
    add_int1_int2:  c = a + b;
    return_int1:    return c;

And to define our copy-and-patch JIT, we’ll take each of these and:

1. Implement the operation in C, with relocation holes to be patched later, to form our stencil.
2. Compile the stencil into native code.
3. Copy-paste the native code back into a C file with functions to emit it to a buffer and patch any relocations.

Then we can write our little JIT compilation engine to concatenate our stencils and execute the generated function. Let’s get started!
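To make "removing interpretation overhead" concrete, here is a minimal sketch of the kind of switch-based interpreter a baseline JIT would replace, running the four operations listed above. The enum names, the `struct insn` layout, and the `interpret` function are all hypothetical and invented for this illustration; they are not part of the tutorial's code.

```c
#include <stddef.h>

/* Hypothetical opcodes mirroring the four bytecode-sized operations. */
enum op { CONST_INT_REG1, CONST_INT_REG2, ADD_INT1_INT2, RETURN_INT1 };

struct insn {
    enum op op;
    int operand; /* only meaningful for the CONST_* operations */
};

/* Every instruction pays for a fetch and a switch dispatch; this is the
 * per-operation overhead that emitting native code removes. */
int interpret(const struct insn *program) {
    int reg1 = 0, reg2 = 0;
    for (size_t pc = 0;; pc++) {
        switch (program[pc].op) {
        case CONST_INT_REG1: reg1 = program[pc].operand; break;
        case CONST_INT_REG2: reg2 = program[pc].operand; break;
        case ADD_INT1_INT2:  reg1 = reg1 + reg2;         break;
        case RETURN_INT1:    return reg1;
        }
    }
}

/* The "1 + 2" program from the text, expressed as interpreter bytecode. */
static const struct insn add_1_2[] = {
    {CONST_INT_REG1, 1},
    {CONST_INT_REG2, 2},
    {ADD_INT1_INT2, 0},
    {RETURN_INT1, 0},
};
```

Calling `interpret(add_1_2)` walks the four instructions and returns their sum. A copy-and-patch JIT aims to produce straight-line native code with the same effect and no dispatch loop.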

Stencils

Our first step is to define our stencils:

stencils.c

    #include <stdint.h>

    // preserve_none passes arguments in registers and preserves as few
    // registers as possible, so stencils can hand values to each other.
    #define STENCIL_FUNCTION __attribute__((preserve_none))

    // Never defined: taking their addresses makes the compiler emit
    // relocations, which mark the holes to patch at JIT time.
    extern char cnp_value_hole[65536];
    extern void cnp_func_hole(void) STENCIL_FUNCTION;

    #define STENCIL_HOLE(type) \
      (type)((uintptr_t)&cnp_value_hole)
    #define DECLARE_STENCIL_OUTPUT(...) \
      typedef void(*stencil_output_fn)(__VA_ARGS__) STENCIL_FUNCTION; \
      stencil_output_fn stencil_output = (stencil_output_fn)&cnp_func_hole;

    STENCIL_FUNCTION void load_int_reg1() {
      int a = STENCIL_HOLE(int);
      DECLARE_STENCIL_OUTPUT(int);
      stencil_output(a);
    }

    STENCIL_FUNCTION void load_int_reg2(int a) {
      int b = STENCIL_HOLE(int);
      DECLARE_STENCIL_OUTPUT(int, int);
      stencil_output(a, b);
    }

    STENCIL_FUNCTION void add_int1_int2(int a, int b) {
      int c = a + b;
      DECLARE_STENCIL_OUTPUT(int);
      stencil_output(c);
    }

    STENCIL_FUNCTION int return_int1(int a) {
      return a;
    }

We compile this with clang -O3 -mcmodel=medium -c stencils.c, and examine the generated code via objdump -d -Mintel,x86-64 --disassemble --reloc stencils.o. This yields:

    0000000000000000 <load_int_reg1>:
       0:  41 bc 00 00 00 00       mov    r12d,0x0
                   2: R_X86_64_32     cnp_value_hole
       6:  e9 00 00 00 00          jmp    b <load_int_reg1+0xb>
                   7: R_X86_64_PLT32  cnp_func_hole-0x4
       b:  0f 1f 44 00 00          nop    DWORD PTR [rax+rax*1+0x0]

    0000000000000010 <load_int_reg2>:
      10:  41 bd 00 00 00 00       mov    r13d,0x0
                   12: R_X86_64_32    cnp_value_hole
      16:  e9 00 00 00 00          jmp    1b <load_int_reg2+0xb>
                   17: R_X86_64_PLT32 cnp_func_hole-0x4
      1b:  0f 1f 44 00 00          nop    DWORD PTR [rax+rax*1+0x0]

    0000000000000020 <add_int1_int2>:
      20:  45 01 ec                add    r12d,r13d
      23:  e9 00 00 00 00          jmp    28 <add_int1_int2+0x8>
                   24: R_X86_64_PLT32 cnp_func_hole-0x4
      28:  0f 1f 84 00 00 00 00    nop    DWORD PTR [rax+rax*1+0x0]
      2f:  00

    0000000000000030 <return_int1>:
      30:  44 89 e0                mov    eax,r12d
      33:  c3                      ret

(The NOPs aren’t actually part of the functions; they’re just padding added so that each function starts on a 16-byte boundary.)

For each of these stencils, we fill in a template to form our stencil generation library to use during JITing.

    uint8_t cnp_stencil_<OP>_code[] = {
      // Copy the bytes from the top of the function until the jmp.
    };

    uint8_t* cnp_copy_<OP>(uint8_t* stencil_start) {
      const size_t stencil_size = sizeof(cnp_stencil_<OP>_code);
      memcpy(stencil_start, cnp_stencil_<OP>_code, stencil_size);
      return stencil_start + stencil_size;
    }

    // If any relocations exist for the stencil, fill in the values.
    // If not, just skip writing this function.
    void cnp_patch_<OP>(uint8_t* stencil_start, /* ... */) {
      memcpy(stencil_start + /*relocation_offset*/, &value, /* relocation_size */);
    }
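As a concrete illustration, the template might be filled in as follows for load_int_reg1 (one relocation) and add_int1_int2 (none). The byte values and the relocation offset are read off the objdump listing above. The `int value` parameter and the `static const` qualifiers are choices made for this sketch, not something the template prescribes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* load_int_reg1, from its first byte up to (not including) the jmp:
 * mov r12d, imm32 with a zeroed 4-byte immediate. */
static const uint8_t cnp_stencil_load_int_reg1_code[] = {
    0x41, 0xbc, 0x00, 0x00, 0x00, 0x00,
};

uint8_t *cnp_copy_load_int_reg1(uint8_t *stencil_start) {
    const size_t stencil_size = sizeof(cnp_stencil_load_int_reg1_code);
    memcpy(stencil_start, cnp_stencil_load_int_reg1_code, stencil_size);
    return stencil_start + stencil_size;
}

/* The listing shows R_X86_64_32 at section offset 2. The function starts
 * at offset 0, so the hole sits 2 bytes into the stencil and is 4 bytes
 * wide. (For load_int_reg2 the same subtraction gives 0x12 - 0x10 = 2.) */
void cnp_patch_load_int_reg1(uint8_t *stencil_start, int value) {
    memcpy(stencil_start + 2, &value, sizeof(value));
}

/* add_int1_int2 up to the jmp: add r12d, r13d. The only relocation is on
 * the dropped jmp, so no patch function is needed. */
static const uint8_t cnp_stencil_add_int1_int2_code[] = {
    0x45, 0x01, 0xec,
};

uint8_t *cnp_copy_add_int1_int2(uint8_t *stencil_start) {
    const size_t stencil_size = sizeof(cnp_stencil_add_int1_int2_code);
    memcpy(stencil_start, cnp_stencil_add_int1_int2_code, stencil_size);
    return stencil_start + stencil_size;
}
```

Each copy function returns the address just past what it wrote, so calls can be chained to lay stencils out back-to-back. Dropping the trailing jmp works because its target, cnp_func_hole, stands in for the next stencil, which now follows immediately in memory.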
