Load-time relocation of shared libraries (2011)

This article's aim is to explain how a modern operating system makes it possible to use shared libraries with load-time relocation. It focuses on the Linux OS running on 32-bit x86, but the general principles apply to other OSes and CPUs as well.

Note that shared libraries have many names - shared libraries, shared objects, dynamic shared objects (DSOs), dynamically linked libraries (DLLs - if you're coming from a Windows background). For the sake of consistency, I will try to just use the name "shared library" throughout this article.

Loading executables

Linux, similarly to other OSes with virtual memory support, loads executables to a fixed memory address. If we examine the ELF header of some random executable, we'll see an Entry point address:

    $ readelf -h /usr/bin/uptime
    ELF Header:
      Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF32
      [...] some header fields
      Entry point address:               0x8048470
      [...] some header fields

This is placed by the linker to tell the OS where to start executing the executable's code. And indeed if we then load the executable with GDB and examine the address 0x8048470, we'll see the first instructions of the executable's .text segment there.

What this means is that the linker, when linking the executable, can fully resolve all internal symbol references (to functions and data) to fixed and final locations. The linker does some relocations of its own, but eventually the output it produces contains no additional relocations.

Or does it? Note that I emphasized the word "internal" in the previous paragraph. As long as the executable needs no shared libraries, it needs no relocations. But if it does use shared libraries (as do the vast majority of Linux applications), symbols taken from these shared libraries need to be relocated, because of how shared libraries are loaded.
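To make the Entry point address field a bit more concrete, here is a minimal sketch of how it could be pulled out of the raw header bytes. It assumes a little-endian ELF32 file (as in the readelf output above); the function names elf32_entry_point and read_le32 are made up for this illustration and are not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from a byte buffer. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Extract e_entry from the first bytes of a little-endian ELF32 file.
 * Returns 0 and stores the entry point in *entry on success;
 * returns -1 if the buffer does not look like such a header.
 * (Hypothetical helper, written for this article only.)
 */
static int elf32_entry_point(const unsigned char *buf, size_t len,
                             uint32_t *entry)
{
    assert(buf != NULL && entry != NULL);

    if (len < 52)                       /* an ELF32 header is 52 bytes */
        return -1;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;                      /* magic: 7f 45 4c 46 */
    if (buf[4] != 1 || buf[5] != 1)     /* ELFCLASS32, little-endian data */
        return -1;

    /* e_entry follows e_ident (16), e_type (2), e_machine (2), e_version (4). */
    *entry = read_le32(buf + 24);
    return 0;
}
```

Feeding it the first 52 bytes of an executable like the one above should yield the same value readelf prints as the entry point.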

Loading shared libraries

Unlike executables, when shared libraries are being built, the linker can't assume a known load address for their code. The reason for this is simple. Each program can use any number of shared libraries, and there's simply no way to know in advance where any given shared library will be loaded in the process's virtual memory. Many solutions were invented for this problem over the years, but in this article I will just focus on the ones currently used by Linux.

But first, let's briefly examine the problem. Here's some sample C code which I compile into a shared library:

    int myglob = 42;

    int ml_func(int a, int b)
    {
        myglob += a;
        return b + myglob;
    }

Note how ml_func references myglob a few times. When translated to x86 assembly, this will involve a mov instruction to pull the value of myglob from its location in memory into a register. mov requires an absolute address - so how does the linker know which address to place in it? The answer is - it doesn't. As I mentioned above, shared libraries have no pre-defined load address - it will be decided at runtime.

In Linux, the dynamic loader is a piece of code responsible for preparing programs for running. One of its tasks is to load shared libraries from disk into memory, when the running executable requests them. When a shared library is loaded into memory, it is then adjusted for its newly determined load location. It is the job of the dynamic loader to solve the problem presented in the previous paragraph.

There are two main approaches to solve this problem in Linux ELF shared libraries:

1. Load-time relocation
2. Position independent code (PIC)

Although PIC is the more common and nowadays-recommended solution, in this article I will focus on load-time relocation. Eventually I plan to cover both approaches and write a separate article on PIC, and I think starting with load-time relocation will make PIC easier to explain later. (Update 03.11.2011: the article about PIC was published)

Linking the shared library for load-time relocation

To create a shared library that has to be relocated at load-time, I'll compile it without the -fPIC flag (which would otherwise trigger PIC generation):

    gcc -g -c ml_main.c -o ml_mainreloc.o
    gcc -shared -o libmlreloc.so ml_mainreloc.o

The first interesting thing to see is the entry point of libmlreloc.so:

    $ readelf -h libmlreloc.so
    ELF Header:
      Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF32
      [...] some header fields
      Entry point address:               0x3b0
      [...] some header fields

For simplicity, the linker just links the shared object for address 0x0 (the .text section starting at 0x3b0), knowing that the loader will move it anyway. Keep this fact in mind - it will be useful later in the article.

Now let's look at the disassembly of the shared library, focusing on ml_func:

    $ objdump -d -Mintel libmlreloc.so

    libmlreloc.so:     file format elf32-i386

    [...] skipping stuff

    0000046c <ml_func>:
     46c:   55                      push   ebp
     46d:   89 e5                   mov    ebp,esp
     46f:   a1 00 00 00 00          mov    eax,ds:0x0
     474:   03 45 08                add    eax,DWORD PTR [ebp+0x8]
     477:   a3 00 00 00 00          mov    ds:0x0,eax
     47c:   a1 00 00 00 00          mov    eax,ds:0x0
     481:   03 45 0c                add    eax,DWORD PTR [ebp+0xc]
     484:   5d                      pop    ebp
     485:   c3                      ret

    [...] skipping stuff

After the first two instructions, which are part of the prologue, we see the compiled version of myglob += a. The value of myglob is taken from memory into eax, incremented by a (which is at ebp+0x8) and then placed back into memory.

But wait, the mov is supposed to take the address of myglob. Why does its actual operand appear to be just 0x0? What gives? This is how relocations work. The linker places some provisional pre-defined value (0x0 in this case) into the instruction stream, and then creates a special relocation entry pointing to this place. Let's examine the relocation entries for this shared library:

    $ readelf -r libmlreloc.so

    Relocation section '.rel.dyn' at offset 0x2fc contains 7 entries:
     Offset     Info    Type            Sym.Value  Sym. Name
    00002008  00000008 R_386_RELATIVE
    00000470  00000401 R_386_32          0000200C   myglob
    00000478  00000401 R_386_32          0000200C   myglob
    0000047d  00000401 R_386_32          0000200C   myglob
    [...] skipping stuff

The .rel.dyn section of ELF is reserved for dynamic (load-time) relocations, to be consumed by the dynamic loader. There are 3 relocation entries for myglob in the section shown above, since there are 3 references to myglob in the disassembly. Let's decipher the first one.

It says: go to offset 0x470 in this object (shared library), and apply relocation of type R_386_32 to it for symbol myglob. If we consult the ELF spec we see that relocation type R_386_32 means: take the value at the offset specified in the entry, add the address of the symbol to it, and place it back into the offset.

What do we have at offset 0x470 in the object? Recall this instruction from the disassembly of ml_func:

     46f:   a1 00 00 00 00          mov    eax,ds:0x0

a1 encodes the mov instruction, so its operand starts at the next address which is 0x470. This is the 0x0 we see in the disassembly. So back to the relocation entry, we now see it says: add the address of myglob to the operand of that mov instruction. In other words it tells the dynamic loader - once you perform actual address assignment, put the real address of myglob into 0x470, thus replacing the operand of mov by the correct symbol value. Neat, huh?

Note also the "Sym.Value" column in the relocation section, which contains 0x200C for myglob. This is the offset of myglob in the virtual memory image of the shared library (which, recall, the linker assumes is just loaded at 0x0). This value can also be examined by looking at the symbol table of the library, for example with nm:

    $ nm libmlreloc.so
    [...] skipping stuff
    0000200c D myglob

This output also provides the offset of myglob inside the library. D means the symbol is in the initialized data section (.data).
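The relocation entries described above can be mimicked in a few lines of C. This is only a sketch of the idea, not the dynamic loader's actual code: the names apply_reloc, reloc_sym and reloc_type are invented here, the load base passed in is whatever address the loader happened to pick, and the symbol is assumed to resolve to the library's own definition (a real loader performs a symbol lookup across all loaded objects).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* i386 relocation type numbers used in the readelf output above. */
enum { RELOC_386_32 = 1, RELOC_386_RELATIVE = 8 };

/* Info packs the symbol index (upper 24 bits) and the type (low 8 bits). */
static uint32_t reloc_sym(uint32_t info)  { return info >> 8; }
static uint32_t reloc_type(uint32_t info) { return info & 0xff; }

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void write_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/*
 * Apply one relocation entry to an in-memory copy of the library.
 *   image      bytes of the library, as linked for address 0x0
 *   base       address the loader chose for the library (assumed input)
 *   offset     the entry's Offset column
 *   info       the entry's Info column
 *   sym_value  the entry's Sym.Value column (unused for R_386_RELATIVE)
 * Returns 0 on success, -1 if the offset is out of range or the
 * relocation type is not handled by this sketch.
 */
static int apply_reloc(unsigned char *image, size_t image_size, uint32_t base,
                       uint32_t offset, uint32_t info, uint32_t sym_value)
{
    assert(image != NULL);
    if (offset > image_size || image_size - offset < 4)
        return -1;

    /* The value already stored at the offset acts as the addend. */
    uint32_t addend = read_le32(image + offset);

    switch (reloc_type(info)) {
    case RELOC_386_32:        /* symbol address + addend */
        write_le32(image + offset, base + sym_value + addend);
        return 0;
    case RELOC_386_RELATIVE:  /* load base + addend */
        write_le32(image + offset, base + addend);
        return 0;
    default:
        return -1;
    }
}
```

For example, with a made-up load base of 0x12e000, applying the entry (Offset 0x470, Info 0x401, Sym.Value 0x200C) would turn the mov operand from 0x0 into 0x12e000 + 0x200C = 0x13000C.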

Load-time relocation in action

To see the load-time relocation in action, I will use our shared library from a simple driver executable. When running this executable, the OS will load the shared library and relocate it appropriately.

Curiously, due to the address space layout randomization feature which is enabled in Linux, relocation is relatively difficult to follow, because every time I run the executable, the libmlreloc.so shared library gets placed in a different virtual memory address.

This is a rather weak deterrent, however. There is a way to make sense of it all. But first, let's talk about the segments our shared library consists of:

    $ readelf --segments libmlreloc.so

    Elf file type is DYN (Shared object file)
    Entry point 0x3b0
    There are 6 program headers, starting at offset 52

    Program Headers:
      Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
      LOAD           0x000000 0x00000000 0x00000000 0x004e8 0x004e8 R E 0x1000
      LOAD           0x000f04 0x00001f04 0x00001f04 0x0010c 0x00114 RW  0x1000
      DYNAMIC        0x000f18 0x00001f18 0x00001f18 0x000d0 0x000d0 RW  0x4
      NOTE           0x0000f4 0x000000f4 0x000000f4 0x00024 0x00024 R   0x4
      GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
      GNU_RELRO      0x000f04 0x00001f04 0x00001f04 0x000fc 0x000fc R   0x1

     Section to Segment mapping:
      Segment Sections...
       00     .note.gnu.build-id .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .eh_frame
       01     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
       02     .dynamic
       03     .note.gnu.build-id
       04
       05     .ctors .dtors .jcr .dynamic .got

To follow the myglob symbol, we're interested in the second segment listed here. Note a couple of things:

- In the section to segment mapping at the bottom, segment 01 is said to contain the .data section, which is the home of myglob.
- The VirtAddr column specifies that the second segment starts at 0x1f04 and has size 0x10c, meaning that it extends until 0x2010 and thus contains myglob which is at 0x200C.

Now let's use a nice tool Linux gives us to examine the load-time linking process - the dl_iterate_phdr function, which allows an application to inquire at runtime which shared libraries it has loaded, and more importantly - take a peek at their program headers.

So I'm going to write the following code into driver.c:

    #define _GNU_SOURCE
    #include <link.h>
    #include <stdlib.h>
    #include <stdio.h>

    /* Called by dl_iterate_phdr once for every loaded object. */
    static int header_handler(struct dl_phdr_info* info, size_t size, void* data)
    {
        printf("name=%s (%d segments) address=%p\n",
               info->dlpi_name, info->dlpi_phnum, (void*)info->dlpi_addr);
        for (int j = 0; j < info->dlpi_phnum; j++) {
            /* Runtime address of a segment = load address + its p_vaddr. */
            printf("\t\t header %2d: address=%10p\n", j,
                   (void*) (info->dlpi_addr + info->dlpi_phdr[j].p_vaddr));
            printf("\t\t\t type=%u, flags=0x%X\n",
                   info->dlpi_phdr[j].p_type, info->dlpi_phdr[j].p_flags);
        }
        printf("\n");
        return 0;  /* returning 0 lets dl_iterate_phdr continue to the next object */
    }

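The address arithmetic used in the segment discussion and in header_handler (does a segment contain a given link-time address, and where does that address end up once the library is loaded?) can be sketched on its own. The struct and function names below are hypothetical and deliberately simpler than the real program header type from the system headers; the load base is again an arbitrary example value. Unlike the FileSiz-based estimate in the text, this sketch uses MemSiz, which also covers .bss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SEG_PT_LOAD = 1 };  /* program header type of a loadable segment */

/* Only the program header fields this sketch needs (hypothetical struct). */
struct segment {
    uint32_t type;   /* Type column, numeric */
    uint32_t vaddr;  /* VirtAddr column */
    uint32_t memsz;  /* MemSiz column */
};

/*
 * Return the index of the LOAD segment that contains the link-time
 * address vaddr, or -1 if no loadable segment covers it.
 */
static int find_load_segment(const struct segment *segs, size_t count,
                             uint32_t vaddr)
{
    assert(segs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (segs[i].type != SEG_PT_LOAD)
            continue;
        if (vaddr >= segs[i].vaddr && vaddr - segs[i].vaddr < segs[i].memsz)
            return (int)i;
    }
    return -1;
}

/* The library is linked for address 0x0, so the loader just adds its base. */
static uint32_t runtime_address(uint32_t base, uint32_t vaddr)
{
    return base + vaddr;
}
```

With the two LOAD segments from the readelf output, looking up 0x200C should select the second one (index 1), and with an example base of 0x12e000 the runtime address of myglob would be 0x13000C.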