The .a File Is a Relic: Why Static Archives Were a Bad Idea All Along

From the perspective of an SDK provider, we must not limit our customers. As such, we are expected to provide both the dynamic linking option and the static linking one. What does this mean in practice?

Dynamic linking: provide Shared Object (.so) libraries, as well as matching compilation (.pc) definitions.
Static linking: provide Static Archive (.a) files, as well as matching compilation (.pc) definitions.

When we bundle the installation of our SDK for some Linux distros, we might decide to adapt it to the expected delivery per distro. For instance, when one installs libpng on Red Hat Linux, the installation will not include a static archive (.a) file, although it will include one on an Ubuntu installation.

Here’s a reminder of the gist of each of the linking methods.

Dynamic Linking 101

The library is taken pretty much “as-is” and mapped into the virtual address space of the program at load time.

One notable advantage is that security fixes can be delivered through updated library versions, as long as the ABI is kept intact (no breaking changes). Applying such an update simply requires restarting the program that uses the library; it does not require a full-blown recompilation.

The file format used for Shared Objects (.so) is the same ELF file format used for “regular” executables, with some slight differences in the file’s header.

Static Linking 101

The library is “swallowed” pretty much “as-is” into the target ELF executable at build time.

One notable advantage is that we no longer need to bother ourselves with ensuring our dependencies are installed on the target machine. We simply bring them bundled inside our own program.

The file format used for Static Archives (.a) is actually straightforward: it is literally an archive of raw object (.o) files.
Below is an example of extracting DPDK’s librte_eal.a file into the 66 .o files that are stored within it, one of which is the eal_common_eal_common_dev.c.o file that we will revisit later on.

$ ar x librte_eal.a
$ ls -lh eal_*.o | wc -l
66
$ file eal_common_eal_common_dev.c.o
eal_common_eal_common_dev.c.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

Static Linking: Theory vs Reality

While the simplistic overview looks good on paper, as always, the devil is in the details. Given that a static archive is just an archive of plain object files, how does the linker interpret it? Well, not the way you would have expected it to.

A simplified view of the linker’s resolution process is as follows:

- For each unresolved symbol, traverse the link flags, in order, and search for a file that could satisfy it.
- If a shared object is found, check whether the symbol is listed in its export table, and if so, mark this library as “needed” and “take” it.
- If a static archive is found, extract all .o files from the archive. Per .o file, check whether the symbol is listed, and if so “take” only this .o file.
- When a module is “taken”, add all the symbols it imports to the list of symbols we should resolve, and continue until finished.

As can be seen above, the default behavior of the linker is to break down our static archive back into a list of object files, and later to pick and choose between them. While this has the advantage of possibly reducing the overall binary size (we only take what we need), it comes at a cost.

Who decides which function goes into which compiled object file? Well, the default behavior is that every source (.c) file is compiled into a single object file bearing the same name.

As an example, let’s take a look at a program that encrypts logs before sending them to a log server. Most cryptographic libraries (OpenSSL is just one example) will have a .c file per cryptographic building block.
This file will include all the operations (encryption and decryption) of said block (AES256 for instance). This design decision at the source level means that our linked binary might not contain the logic for the 3DES building block, but it would still contain unused decryption functions for AES256.

At this point you might be thinking that there is some downside, but it isn’t that bad. Yes, the reduction of the binary size might not be optimal. Yes, it relies on design decisions at the source level, and binary size is not exactly a top priority when deciding which functions go into which source files. So what?

Well, what will happen with more sophisticated coding scenarios? Let’s say someone in our library designed the following logging module:

- The logger registers a constructor function to initialize itself, and assigns it a given priority.
- The logging module exposes a macro that defines a constructor function stub per C file, which leverages the already initialized logging module. This ctor is assigned a matching priority to ensure the correct invocation order.

In a dynamic linking scenario, all ctor functions will be taken from the matching “init list” in the ELF file, and will be dynamically added to the init list of the executable program at load time.

However, what will happen in the static linking scenario? Our single object file will “pull in” some files from the logging module when the symbols are being resolved, as needed by the logger’s ctor stub in our object file. If all the logic in the logger module is bound to the same object file, we should be good to go. Yet, what if the logger’s ctor function is implemented in a different object file? Well, tough luck. No one requested this file, and the linker will never know that it needs to link it into our static program. The result? A crash at runtime.
While we can argue about the pros and cons of such a logging mechanism, this is just one example of the potential issues with constructors and destructors in both C and C++. In most cases, the relevant software teams will say it is “too late” to change the software design, and force us to try and fix the issue purely at the linking level.

Linker flags, not what you expected

In the previous section we’ve seen that for some software design choices, the default behavior of the linker will result in an incomplete static linking of our library. Luckily for us, we also provide the users of the SDK with the compilation and linking definitions to be used with our libraries. If there is a linker flag that can modify this default linker behavior, we can just add it to our definition files and it will seamlessly work for our users.

I wish it were that simple.

Linkers do expose the following pair of flags for static archives:

-Wl,--whole-archive: From this point onward, whenever you encounter a static archive, take every object file from it, regardless of any symbol resolution decision.
-Wl,--no-whole-archive: From this point onward, revert to the default behavior when handling static archives, and take object files only on a need-to-use basis.

On paper, this looks great. We can wrap our static archive with the above flags, and all object files from it will be taken, case closed. Or is it?

If we re-read the above, we can see there is still a remaining issue. The above states that we should:

… take every object file from it, regardless of any symbol resolution decision

This can also be rephrased as follows:

Take the entire static archive, even if no symbol from it is ever needed

And suddenly, this looks less promising.
First problematic scenario: our SDK has 4 libraries that are bundled together under the same linking definition (.pc) file. This means that all of them will be taken, even if the user only needs one. In DPDK’s case, this means 190 libraries(!) being taken, instead of the 10 to 20 or so that are actually needed.

$ cat /usr/lib/x86_64-linux-gnu/pkgconfig/libdpdk.pc
prefix=/usr
includedir=${prefix}/include/dpdk
libdir=${prefix}/lib/x86_64-linux-gnu

Name: DPDK
Description: The Data Plane Development Kit (DPDK). Note that CFLAGS might contain an -march flag higher than typical baseline. This is required for a number of static inline functions in the public headers.
Version: 23.11.0
Requires: libdpdk-libs, libbsd
Requires.private: libmlx5, libibverbs, libxdp >= 1.2.2, libbpf, zlib, jansson, libmana, libmlx4, libpcap, libcrypto, libisal, libelf
Libs.private: -Wl,--whole-archive -L${libdir} -l:librte_common_cpt.a -l:librte_common_dpaax.a -l:librte_common_iavf.a -l:librte_common_idpf.a -l:librte_common_octeontx.a -l:librte_bus_auxiliary.a -l:librte_bus_cdx.a -l:librte_bus_dpaa.a -l:librte_bus_fslmc.a -l:librte_bus_ifpga.a -l:librte_bus_pci.a -l:librte_bus_platform.a -l:librte_bus_vdev.a -l:librte_bus_vmbus.a -l:librte_common_cnxk.a -l:librte_common_mlx5.a -l:librte_common_nfp.a -l:librte_common_qat.a -l:librte_common_sfc_efx.a -l:librte_mempool_bucket.a -l:librte_mempool_cnxk.a -l:librte_mempool_dpaa.a -l:librte_mempool_dpaa2.a -l:librte_mempool_octeontx.a -l:librte_mempool_ring.a -l:librte_mempool_stack.a -l:librte_dma_cnxk.a -l:librte_dma_dpaa.a -l:librte_dma_dpaa2.a -l:librte_dma_hisilicon.a -l:librte_dma_idxd.a -l:librte_dma_ioat.a -l:librte_dma_skeleton.a -l:librte_net_af_packet.a -l:librte_net_af_xdp.a -l:librte_net_ark.a -l:librte_net_atlantic.a -l:librte_net_avp.a -l:librte_net_axgbe.a -l:librte_net_bnx2x.a -l:librte_net_bnxt.a -l:librte_net_bond.a -l:librte_net_cnxk.a -l:librte_net_cpfl.a -l:librte_net_cxgbe.a -l:librte_net_dpaa.a -l:librte_net_dpaa2.a
-l:librte_net_e1000.a -l:librte_net_ena.a -l:librte_net_enetc.a -l:librte_net_enetfec.a -l:librte_net_enic.a -l:librte_net_failsafe.a -l:librte_net_fm10k.a -l:librte_net_gve.a -l:librte_net_hinic.a -l:librte_net_hns3.a -l:librte_net_i40e.a -l:librte_net_iavf.a -l:librte_net_ice.a -l:librte_net_idpf.a -l:librte_net_igc.a -l:librte_net_ionic.a -l:librte_net_ipn3ke.a -l:librte_net_ixgbe.a -l:librte_net_mana.a -l:librte_net_memif.a -l:librte_net_mlx4.a -l:librte_net_mlx5.a -l:librte_net_netvsc.a -l:librte_net_nfp.a -l:librte_net_ngbe.a -l:librte_net_null.a -l:librte_net_octeontx.a -l:librte_net_octeon_ep.a -l:librte_net_pcap.a -l:librte_net_pfe.a -l:librte_net_qede.a -l:librte_net_ring.a -l:librte_net_sfc.a -l:librte_net_softnic.a -l:librte_net_tap.a -l:librte_net_thunderx.a -l:librte_net_txgbe.a -l:librte_net_vdev_netvsc.a -l:librte_net_vhost.a -l:librte_net_virtio.a -l:librte_net_vmxnet3.a -l:librte_raw_cnxk_bphy.a -l:librte_raw_cnxk_gpio.a -l:librte_raw_dpaa2_cmdif.a -l:librte_raw_ifpga.a -l:librte_raw_ntb.a -l:librte_raw_skeleton.a -l:librte_crypto_bcmfs.a -l:librte_crypto_caam_jr.a -l:librte_crypto_ccp.a -l:librte_crypto_cnxk.a -l:librte_crypto_dpaa_sec.a -l:librte_crypto_dpaa2_sec.a -l:librte_crypto_ipsec_mb.a -l:librte_crypto_mlx5.a -l:librte_crypto_nitrox.a -l:librte_crypto_null.a -l:librte_crypto_octeontx.a -l:librte_crypto_openssl.a -l:librte_crypto_scheduler.a -l:librte_crypto_virtio.a -l:librte_compress_isal.a -l:librte_compress_mlx5.a -l:librte_compress_octeontx.a -l:librte_compress_zlib.a -l:librte_regex_mlx5.a -l:librte_regex_cn9k.a -l:librte_ml_cnxk.a -l:librte_vdpa_ifc.a -l:librte_vdpa_mlx5.a -l:librte_vdpa_nfp.a -l:librte_vdpa_sfc.a -l:librte_event_cnxk.a -l:librte_event_dlb2.a -l:librte_event_dpaa.a -l:librte_event_dpaa2.a -l:librte_event_dsw.a -l:librte_event_opdl.a -l:librte_event_skeleton.a -l:librte_event_sw.a -l:librte_event_octeontx.a -l:librte_baseband_acc.a -l:librte_baseband_fpga_5gnr_fec.a -l:librte_baseband_fpga_lte_fec.a 
-l:librte_baseband_la12xx.a -l:librte_baseband_null.a -l:librte_baseband_turbo_sw.a -l:librte_node.a -l:librte_graph.a -l:librte_pipeline.a -l:librte_table.a -l:librte_pdump.a -l:librte_port.a -l:librte_fib.a -l:librte_pdcp.a -l:librte_ipsec.a -l:librte_vhost.a -l:librte_stack.a -l:librte_security.a -l:librte_sched.a -l:librte_reorder.a -l:librte_rib.a -l:librte_mldev.a -l:librte_regexdev.a -l:librte_rawdev.a -l:librte_power.a -l:librte_pcapng.a -l:librte_member.a -l:librte_lpm.a -l:librte_latencystats.a -l:librte_jobstats.a -l:librte_ip_frag.a -l:librte_gso.a -l:librte_gro.a -l:librte_gpudev.a -l:librte_dispatcher.a -l:librte_eventdev.a -l:librte_efd.a -l:librte_dmadev.a -l:librte_distributor.a -l:librte_cryptodev.a -l:librte_compressdev.a -l:librte_cfgfile.a -l:librte_bpf.a -l:librte_bitratestats.a -l:librte_bbdev.a -l:librte_acl.a -l:librte_timer.a -l:librte_hash.a -l:librte_metrics.a -l:librte_cmdline.a -l:librte_pci.a -l:librte_ethdev.a -l:librte_meter.a -l:librte_net.a -l:librte_mbuf.a -l:librte_mempool.a -l:librte_rcu.a -l:librte_ring.a -l:librte_eal.a -l:librte_telemetry.a -l:librte_kvargs.a -l:librte_log.a -Wl,--no-whole-archive -Wl,--export-dynamic -lIPSec_MB -lrt -latomic
Cflags: -I${includedir}

This is a classic, real-life example of why static linking can result in a bloated binary, instead of the promised binary size reduction.

OK, one possible solution is to follow in SPDK’s footsteps and split the definition files of our SDK per group of libraries, so as to avoid the above. In SPDK’s case, this resulted in more than 80 .pc files(!), which takes a toll on the developers who need to decide which linking files are needed for their program.

$ ls -lh /usr/local/lib/pkgconfig/spdk_*.pc | wc -l
83

Assuming that such a split will be accepted by our users, it is still highly likely that at least some of our SDK libraries would depend on a common library.
When the user’s program accumulates the link flags across all needed libraries, the end result will list this common library multiple times. This is one of the main reasons why, in my personal opinion, the .pc file format is suboptimal, to say the least.

As you might have guessed, having the same library multiple times in our link flags will cause the linker to try to link it multiple times. This will result in the all too familiar “multiple definition” error.

(.text+0x1b0): multiple definition of `rte_power_pause'; /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/librte_eal.a(eal_x86_rte_power_intrinsics.c.o):(.text+0x1b0): first defined here
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/librte_eal.a(eal_x86_rte_power_intrinsics.c.o): in function `rte_power_monitor_wakeup':
(.text+0x1e0): multiple definition of `rte_power_monitor_wakeup'; /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/librte_eal.a(eal_x86_rte_power_intrinsics.c.o):(.text+0x1e0): first defined here
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/librte_eal.a(eal_x86_rte_power_intrinsics.c.o): in function `rte_power_monitor_multi':
(.text+0x250): multiple definition of `rte_power_monitor_multi'; /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/librte_eal.a(eal_x86_rte_power_intrinsics.c.o):(.text+0x250): first defined here

The linker has no flag that says “take this entire static archive, but only if at least one symbol from it is needed”. Hell, the linker doesn’t even have a flag for “don’t take this same file twice”.

The end result is that defining the linking instructions for a project that consumes several SDKs has become an endless game of whack-a-mole. Every operating system might have slightly different linking definitions for the SDK. Every operating system comes with a different version of pkg-config, each with its own quirks and issues.
And every version upgrade of the SDK might introduce a new library dependency, suddenly creating yet another linking conflict that must be resolved.

Sadly, most of the “fixes” for these issues are tailor-made workarounds. I have already seen my fair share of real-life fixes that include sed regex commands added to CMake files with the goal of capturing and removing the -Wl,--whole-archive statements from all but the first occurrence of a given static library. Obviously, this is not a sustainable solution.

Static Archives: a scoping nightmare

Let us assume that someone managed to come up with a silver bullet that addresses all of the above linking issues, maybe using proper link flags and a more robust pkg-config behavior. Sadly, this would still leave us with the broken-by-design structure of static archives.

If we recall, a static archive is literally an archive of raw object files. A glimpse at one such object file (eal_common_eal_common_dev.c.o), as extracted from DPDK’s librte_eal.a archive, shows the following functions, where l marks local (static) symbols and g marks global ones:

0000000000000000 l F .text 00000000000000a0 class_next_dev_cmp
00000000000000a0 l F .text 00000000000000ae build_devargs
0000000000000150 l F .text 000000000000000d cmp_dev_name
0000000000000160 l F .text 0000000000000112 bus_next_dev_cmp
0000000000000280 l F .text 0000000000000077 dev_str_sane_copy
0000000000000300 g F .text 0000000000000009 rte_driver_name
0000000000000310 g F .text 0000000000000009 rte_dev_bus
0000000000000320 g F .text 0000000000000009 rte_dev_bus_info
0000000000000330 g F .text 0000000000000009 rte_dev_devargs
0000000000000340 g F .text 0000000000000009 rte_dev_driver
0000000000000350 g F .text 0000000000000009 rte_dev_name
0000000000000360 g F .text 0000000000000008 rte_dev_numa_node
0000000000000370 g F .text 000000000000000f rte_dev_is_probed
0000000000000380 g F .text 000000000000013b local_dev_probe
00000000000004c0 g F .text 0000000000000032 local_dev_remove
0000000000000500 g F .text 00000000000000c6 rte_dev_probe
00000000000005d0 g F .text 0000000000000062 rte_eal_hotplug_add
0000000000000640 g F .text 000000000000012a rte_dev_remove
0000000000000770 g F .text 0000000000000043 rte_eal_hotplug_remove
00000000000007c0 g F .text 0000000000000166 rte_dev_event_callback_register
0000000000000930 g F .text 0000000000000150 rte_dev_event_callback_unregister
0000000000000a80 g F .text 00000000000000c1 rte_dev_event_callback_process
0000000000000b50 g F .text 00000000000000ef rte_dev_iterator_init
0000000000000c40 g F .text 00000000000001a2 rte_dev_iterator_next
0000000000000df0 g F .text 000000000000009c rte_dev_dma_map
0000000000000e90 g F .text 000000000000009c rte_dev_dma_unmap

From the naming convention, and without even looking at the source file, one can already deduce that the local_dev_probe() function is probably an internal function, and not a public API function. This can easily be verified by inspecting the header file, or by examining the symbols of the matching .so file:

$ nm -D --defined librte_eal.so.24.0 | grep rte_ | wc -l
414
$ nm -D --defined librte_eal.so.24.0 | grep local_dev_probe | wc -l
0

Given the lack of proper scoping in the C programming language, we now have two critical issues:

- Every non-static function in the SDK is suddenly a possible cause of a naming conflict with all other functions in the user’s program.
- Our closed-source SDK suddenly leaks the names of all internal, non-static functions.

Such a name conflict is demonstrated by the following simplified C program:

/* The header names were lost from the original listing; these two are the ones this snippet needs. */
#include <stdio.h>
#include <rte_dev.h>

void local_dev_probe(void)
{
    printf("This is my project's device probe function\n");
}

int main(int argc, char **argv)
{
    printf("Invoking something from EAL Dev module\n");
    rte_dev_probe("");
    return 0;
}

The program uses a public API function from the librte_eal.a library, and specifically a public API function from the eal_common_eal_common_dev.c.o object file.
This will cause our local_dev_probe() function to conflict with the library’s own internal function of the same name:

/usr/bin/ld: eal_common_eal_common_dev.c.o: in function `local_dev_probe':
(.text+0x380): multiple definition of `local_dev_probe'; main.o:main.c:(.text+0x0): first defined here

And if that wasn’t enough, we also have the second issue: leaking the names of all non-static C functions. From my (extensive) background as a vulnerability researcher, I can tell you first-hand that this is a treasure trove of information for anyone trying to reverse engineer these binaries. Let’s just say that our product team wasn’t exactly thrilled to hear about this possible exposure of our internal logic.

While Shared Objects have built-in mechanisms to limit symbol visibility, static archives are just flawed by design. As long as we treat an archive as a bunch of raw object files, these symbols must be exposed; otherwise the linker might miss an additional object file that should be taken into the final static program.

What do we do from here?

It is my personal belief that static archives are simply the wrong tool for handling static linking, at least at the SDK level. They are flawed by design, and the surrounding tooling for them (linker flags and pkg-config) is far from optimal.

If static linking is to be provided by C/C++ SDKs, we need a different solution. Something like a “Static Bundle Object” (.sbo) file, which would be closer to a Shared Object (.so) file than to the existing Static Archive (.a) file.

Suggested update to the ELF file header: addition of a new “Static Bundle Object” type.

This bundle file can enjoy the symbol visibility guarantees of a Shared Object, as it bundles together all the .o files anyway and doesn’t need to expose their internal relations.

Yes, we will lose the existing property of having a possibly reduced binary size.
Yet, from my experience, this property is never given any attention when engineers design the source-level structure of their software. As such, it was doomed to be suboptimal from the get-go.

From my biased perspective as an SDK developer, sacrificing the possible reduction in binary size in favor of a robust and production-ready linking ecosystem is a no-brainer. Judging by the number of workdays lost to chasing down these linking issues, I’m pretty sure that our customers will agree as well.
