The Anatomy of a Mach-O: Structure, Code Signing, and PAC

The Mach Object (Mach-O) format is the binary format used on Apple’s operating systems for executables, libraries, and object code. It was created for the Mach kernel (hence the name) and introduced in NeXTSTEP, the predecessor to macOS, as a replacement for the a.out format. Mach-O’s design supports multiple architectures (via universal binaries) and carries metadata in load commands. In this post, we’ll explore Mach-O’s layout and history. Then, we will examine how Macs use Mach-O for code signing integrity and for Pointer Authentication Codes (PAC) on ARM64e systems.

Mach-O structure and format basics

A Mach-O is organized into three regions: a header, a list of load commands, and the data segments/sections.

Mach-O header

The header identifies the file as Mach-O and specifies the target architecture and binary type. It begins with a magic number (e.g., 0xfeedfacf for 64-bit Mach-O) that indicates endianness and whether the format is 32-bit or 64-bit. It also contains the CPU type and CPU subtype, which identify the required architecture (e.g., CPU_TYPE_X86_64 or CPU_TYPE_ARM64). For example, Apple Silicon binaries often use CPU type ARM64 with subtype ARM64e for PAC support. The header also includes the file type (object file, executable, dynamic library, etc.), the number of load commands, the total size of those commands, and a set of flags for options (such as MH_PIE for position-independent executable, MH_NOUNDEFS for no undefined symbols, etc.). The flags are in the last column of the screenshot below.

Load commands

Following the header is an array of load command structures. Each load command either instructs the OS loader or provides metadata about the binary’s layout. Load commands can specify memory segments, libraries to load, symbol tables, etc.
Examples:

- LC_SEGMENT_64: defines a 64-bit memory segment, with its file offset, size, memory address, and contained sections. Some segments are __TEXT (code and read-only data), __DATA (writable data), and __LINKEDIT (metadata like symbols and relocation info).
- LC_SYMTAB: points to the symbol and string tables used for debugging and linking.
- LC_DYLD_INFO_ONLY: provides info for the dynamic linker (rebasing, binding, and other relocation data used by Apple’s dyld).
- LC_LOAD_DYLIB / LC_LOAD_DYLINKER: name the shared libraries the executable depends on and the dynamic linker that loads it.
- LC_MAIN (or LC_UNIXTHREAD): specifies the entry point address for the program.
- LC_CODE_SIGNATURE: (more on this soon) points to the code signature blob embedded in the file for code signing validation.

There can be dozens of load commands. Their count (ncmds) and total size are given in the header. For example, a simple “Hello World” Mach-O might have 16 load commands, including segments and link-edit info. Each load command is a structured record; some include inline data, while others point to data elsewhere in the file.

Segments and sections

After the load commands, the file’s content is organized into segments, each of which may contain one or more sections. A segment corresponds to a contiguous range in the file and a range of memory with protection (e.g., executable, readable, writable).
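
To make the header and load-command layout described above concrete, here is a minimal sketch in Python. It assumes a thin (non-fat), little-endian 64-bit Mach-O already read into memory. The constant values follow Apple’s mach-o/loader.h; the function names parse_macho64 and segment_names are made up for this illustration and are not part of any Apple tool.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF   # 64-bit Mach-O magic
LC_SEGMENT_64 = 0x19       # load command ID for a 64-bit segment
HEADER_SIZE = 32           # sizeof(struct mach_header_64)


def parse_macho64(data: bytes):
    """Parse the header and load-command table of a thin, little-endian 64-bit Mach-O."""
    (magic, cputype, cpusubtype, filetype,
     ncmds, sizeofcmds, flags, _reserved) = struct.unpack_from("<8I", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError(f"not a little-endian 64-bit Mach-O (magic {magic:#x})")

    header = {"cputype": cputype, "cpusubtype": cpusubtype, "filetype": filetype,
              "ncmds": ncmds, "sizeofcmds": sizeofcmds, "flags": flags}

    # Every load command starts with (cmd, cmdsize); cmdsize tells us where the next one begins.
    commands = []
    offset = HEADER_SIZE
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmdsize < 8:
            raise ValueError("corrupt load command size")
        commands.append((cmd, cmdsize, offset))
        offset += cmdsize
    return header, commands


def segment_names(data: bytes, commands):
    """Return the segname field of every LC_SEGMENT_64 command, in file order."""
    names = []
    for cmd, _cmdsize, offset in commands:
        if cmd == LC_SEGMENT_64:
            raw = data[offset + 8:offset + 24]  # char segname[16], NUL-padded
            names.append(raw.split(b"\0", 1)[0].decode("ascii"))
    return names
```

Feeding this the bytes of a thin binary should list the segments in load order. A universal (fat) file starts with a different magic, so this sketch rejects it rather than picking a slice.
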
For instance, the __TEXT segment often contains sections like __text (machine code), __stubs (jump stubs for dynamic linking), and __cstring (C string literals), all of which are read-only and some of which are executable. The __DATA segment contains sections like __data (initialized data) and __bss (zero-initialized data), and is typically marked writable. There are also special segments: __PAGEZERO is a dummy segment at address 0 (with no file data) that catches null-pointer accesses, and __LINKEDIT (the last segment) holds dynamic loader/linking information: symbol tables, string/import tables, and the code signature. Sections are numbered sequentially across the entire file; section indices start at 1 and continue across segment boundaries.

Universal binaries (Fat Mach-O)

A Mach-O file can also be wrapped in a fat header that contains multiple architecture slices, for example, one for x86_64 and one for arm64. In this case, the file begins with a fat header listing each architecture’s offset, and each slice then has its own Mach-O header and segments. The system picks the appropriate slice at runtime. On Apple Silicon, it prefers a native arm64e or arm64 slice over running an x86_64 slice under Rosetta. We won’t focus on fat binaries here. But it’s good to know Mach-O has supported multi-arch binaries since at least the PowerPC-to-Intel transition (yay, Mac history!) and now through the Intel-to-ARM transition, if we pretend that didn’t happen like three years ago. Apple marked the transition as complete in 2023.

Historical note: Apple has extended Mach-O with new load commands over time. For example, LC_DYLD_INFO was added for dyld, and LC_MAIN replaced the older thread startup command. The overall structure remains consistent, though.

Mach-O and code signing (the integrity chain)

One feature of Mach-O binaries is their integration with Apple’s code signing. On iOS, all executables must be signed to prove their integrity and origin.
On macOS, Gatekeeper by default requires downloaded apps to be signed and notarized, which entails signing with the Hardened Runtime. Mach-O files carry their code signature within the binary itself as a special data blob.

When an app or binary is signed (using Apple’s codesign tool or Xcode’s automatic signing), an additional load command is added to the Mach-O: LC_CODE_SIGNATURE. This command records the file offset and size of the embedded code signature data. In a signed Mach-O, this is typically the last load command, and the signature blob is at the end of the file. For example, using a Mach-O viewer on a signed macOS binary (like Safari), you can see LC_CODE_SIGNATURE pointing to a region at the end of the file, and a hex dump at that offset shows the distinct magic number (bytes 0xFA 0xDE 0x0C 0xC0) that marks the start of the code signature blob.

Umm Olivia, how do I find this?

To find the code signature inside a Mach-O binary, locate the code signature load command: run otool -l and look for the LC_CODE_SIGNATURE entry in the output. This load command provides two pieces of metadata. The file offset (dataoff) shows where the signature begins inside the binary, and the size (datasize) shows how many bytes are dedicated to the signature. For example, if you analyze Safari or a system binary like /bin/ls, you will see an LC_CODE_SIGNATURE entry listing those values.

Once you have the offset and size, you can inspect the signature blob at the end of the file. Using hexdump (or another hex viewer), jump to the dataoff offset and display the first few lines of bytes. This region begins with the magic number FA DE 0C C0, which marks the start of Apple’s code signature blob. This blob contains the Code Directory and other signature structures that macOS uses to validate integrity and authenticity.
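
The same lookup can be done programmatically. The sketch below is a rough Python equivalent of the otool -l plus hexdump steps, assuming a thin, little-endian 64-bit Mach-O held in memory. LC_CODE_SIGNATURE (0x1D) and the embedded-signature magic 0xFADE0CC0 come from Apple’s headers; the function names find_code_signature and signature_magic are hypothetical helpers for this illustration.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
LC_CODE_SIGNATURE = 0x1D
CSMAGIC_EMBEDDED_SIGNATURE = 0xFADE0CC0  # magic of the embedded signature SuperBlob


def find_code_signature(data: bytes):
    """Return (dataoff, datasize) from LC_CODE_SIGNATURE, or None if the file is unsigned."""
    magic, _cputype, _cpusubtype, _filetype, ncmds, _size, _flags, _res = \
        struct.unpack_from("<8I", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a little-endian 64-bit Mach-O")

    offset = 32  # load commands start right after mach_header_64
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmd == LC_CODE_SIGNATURE:
            # linkedit_data_command: cmd, cmdsize, dataoff, datasize
            return struct.unpack_from("<II", data, offset + 8)
        if cmdsize < 8:
            raise ValueError("corrupt load command size")
        offset += cmdsize
    return None


def signature_magic(data: bytes, dataoff: int) -> int:
    """Read the 4-byte magic at dataoff. Signature blobs are big-endian, unlike the header."""
    (magic,) = struct.unpack_from(">I", data, dataoff)
    return magic
```

Note the byte-order switch: the Mach-O header and load commands are little-endian on current Apple hardware, while the code signature structures are stored big-endian, which is why the magic reads FA DE 0C C0 in a hex dump.
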
By combining the information from otool -l with a hex dump, you can see where the digital signature is stored inside the Mach-O file.

FEEDFACF? This is the 64-bit Mach-O magic number, NOT the code signature magic number. Don’t get confused. :)

What is in the code signature blob?

Apple’s code signature is a structured collection of data, often referred to as a Code Signing SuperBlob.

Components

This blob (marked by magic numbers like 0xFADE0CC0 or 0xFADE0C01) contains several sub-blobs, each with an identifier and purpose. The main components are:

- Code Directory: This is the main part of the signature; it contains the hashes for each page of the executable code and other metadata. The Code Directory lists various parameters (version, flags, hash type, etc.), the binary’s identifier, the number of embedded special slots, and a hash for each 4KB page of the code section. It defines what the “approved” contents of the binary are (via hashes). It can also include a Team ID and other info. The OS uses the Code Directory to verify that the file’s contents haven’t been altered since signing.
- Entitlements: If the app has an entitlements plist for special permissions (accessing iCloud, Keychain, etc.), those entitlements are included as a signed blob in the Mach-O’s code signature. This tells the system what privileges the binary should have.
- Resource directory and Info.plist hashes: iOS and macOS bundles include a _CodeSignature/CodeResources manifest of resource hashes (including Info.plist), and the Code Directory’s special slots hash that manifest (and Info.plist), binding the executable’s signature to the bundle’s contents.
- Requirements: This blob encodes any code signing requirements, such as which certificate must be used or platform constraints. For App Store apps, for instance, there might be a requirement that the certificate chain leads to Apple’s official App Store distribution certificate.
- CMS signature: Finally, all the above data is signed (in CMS/PKCS#7 format) using the developer’s signing certificate, which chains to Apple’s root CA. This provides the chain of trust; the system can verify the signature was made by a key trusted for that app, or by Apple for system binaries.

Superblob

All these pieces are wrapped in the SuperBlob with an index; the region at dataoff begins with the CS_SuperBlob structure. Apple’s open-source libsecurity_codesigning shows a structure called CS_SuperBlob, which contains a count of blobs and offsets to each. Each blob has a magic and a length, so the system can parse them.

How the OS uses it

When you launch a Mach-O executable, macOS/iOS verifies the code signature before allowing it to run. This is enforced by the kernel’s code signing mechanism. On macOS, the kernel extension AppleMobileFileIntegrity (AMFI) handles many checks; on iOS, it’s always enforced. The kernel locates LC_CODE_SIGNATURE, parses the Code Directory, and ensures that:

- the cryptographic signature (CMS blob) is valid (signed by a trusted authority)
- the hashes in the Code Directory match the contents of the binary on disk

If anything doesn’t check out (for example, the binary was modified), the hash check fails and the code signature is invalid. In such cases, the app is prevented from executing, or killed if already running. This guarantees the integrity of executables: they cannot be tampered with without detection (to my knowledge; feel free to prove me wrong, though!).

There are a few details to note. First is the position of the signature. The code signature blob is not loaded into memory as executable code; it’s just data, often in the __LINKEDIT segment at the end of the file.
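
Going back to the SuperBlob layout described above (a count of blobs, an index of offsets, and a magic plus length on every blob), here is a minimal Python sketch that walks the index. It assumes the bytes starting at dataoff have already been sliced out of the file. The magic values are the ones I know from Apple’s cs_blobs.h; parse_superblob and BLOB_NAMES are hypothetical names for this illustration, and the sketch only reads blob headers without validating hashes or the CMS signature.

```python
import struct

CSMAGIC_EMBEDDED_SIGNATURE = 0xFADE0CC0

# Magic numbers of the sub-blobs discussed in this post.
BLOB_NAMES = {
    0xFADE0C02: "CodeDirectory",
    0xFADE0C01: "Requirements",
    0xFADE7171: "Entitlements",
    0xFADE0B01: "CMS signature (blob wrapper)",
}


def parse_superblob(blob: bytes):
    """Return a list of (slot_type, offset, magic, length) for each indexed sub-blob."""
    # CS_SuperBlob header: magic, total length, number of index entries (all big-endian).
    magic, _length, count = struct.unpack_from(">III", blob, 0)
    if magic != CSMAGIC_EMBEDDED_SIGNATURE:
        raise ValueError(f"unexpected SuperBlob magic {magic:#x}")

    entries = []
    for i in range(count):
        # Each index entry is (slot type, offset from the start of the SuperBlob).
        slot_type, offset = struct.unpack_from(">II", blob, 12 + 8 * i)
        # Every sub-blob begins with its own magic and length.
        sub_magic, sub_length = struct.unpack_from(">II", blob, offset)
        entries.append((slot_type, offset, sub_magic, sub_length))
    return entries
```

Mapping each returned magic through BLOB_NAMES gives a quick inventory of what a signature contains, for example whether an entitlements blob is present at all.
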
The blob is typically aligned and does not interfere with normal code/data. Because the signature covers the rest of the file, it must be the last thing in it; adding anything after it (or changing the file length) would break the signature. This is why you cannot just append extra data to a signed Mach-O without re-signing. Even Apple’s notarization tickets (stapled to apps after notarization) are kept separate from the Mach-O (they’re stored in the app bundle) because altering the Mach-O file would invalidate its signature.

What about modifying and re-signing?

If you need to modify a signed Mach-O (to patch a binary, say), you have to either remove the signature or re-sign the binary. codesign --remove-signature strips the LC_CODE_SIGNATURE load command and blob, returning the binary to an unsigned state. Alternatively, you can re-sign the binary with an ad-hoc signature (codesign -s -), which generates new hashes and an ad-hoc signature that the OS will accept as valid.

Note: An ad-hoc signature contains only the code’s hashes and no identity/certificate; it’s valid for running locally but not trusted for distribution.

A mistake I’ve made is trying to add a new signature blob without removing the old one. Mach-O doesn’t support multiple signatures, so the old LC_CODE_SIGNATURE must be removed or replaced. Apple’s signing tool usually handles this for you by overwriting the existing signature area. When I remember, I check that the LC_CODE_SIGNATURE command’s data is correct, because a mismatched offset or length will confuse the loader. Most commonly, though, I get stuck debugging this situation until I come across something reminding me to do it again.

Detached signatures?

A detached signature is a digital signature stored separately from the file it signs; its purpose is to allow verification of a file’s integrity and authenticity without altering the original file. In most cases, the signature is inside the Mach-O.
An exception is for Mach-O bundles that aren’t standalone executables; for example, frameworks or plugins can have their code signatures stored in a separate _CodeSignature folder next to the binary, with a CodeResources or archived-expanded-entitlements.xcent file.

Lastly, the verification chain…

The code signing process involves a chain of trust. The Mach-O only carries the final signed blob; the actual certificate (like the Developer ID or iPhone Distribution certificate) and its chain to Apple’s CA are verified using the system’s key store. The Mach-O’s signature includes an identifier and Team ID, which are checked against what the code signing entitlements and provisioning profiles (on iOS) allow. For the purposes of Mach-O anatomy, the main point is that Mach-O provides the container for all this code signing metadata, so the OS can do offline verification without needing external resources.

Mach-O and Pointer Authentication Codes (PAC)

Apple’s newer ARM64-based devices (specifically those implementing ARMv8.3-A and above, like the A12 Bionic and M1 chips) introduce Pointer Authentication! Pointer Authentication Codes (PACs) are a hardware-assisted security feature. They add a cryptographic signature to pointer values to help prevent attacks, like buffer overflows that overwrite return addresses or function pointers. While PAC is a CPU/architecture feature, it has implications for Mach-O because Mach-O headers and code must accommodate it.

ARM64e architecture

ARM64e refers to the ARMv8.3-A ISA with pointer authentication enabled; the “e” stands for extended or enhanced. Mach-O headers indicate this via the CPU subtype.
For example, an iPhone XS or Apple M1 binary might be marked as CPU_TYPE_ARM64, CPU_SUBTYPE_ARM64E in the Mach header, whereas earlier 64-bit ARM binaries with no PAC appear as ARM64_ALL (F5 for “ARM_ALL”). This means the binary was compiled to use PAC instructions and will only fully work on PAC-capable processors. Apple made ARM64e the default for system software on A12+ devices. The macOS kernel and dynamic loader are “aware” of this subtype. For instance, the loader will refuse to load an ARM64e slice on a non-PAC device.

How PAC works

The ARM64e architecture repurposes some unused high-order bits of 64-bit pointers to store a Pointer Authentication Code, which is a small cryptographic tag for that pointer. Dedicated CPU instructions sign a pointer (generate and insert the PAC) and later authenticate it (verify and strip the PAC). The PAC is computed using a secret key stored in special CPU registers and a “context” value, such as another register or the pointer’s address. If a pointer’s PAC doesn’t match (meaning the pointer was likely corrupted or forged), the authentication instruction fails and marks the pointer as invalid, typically by setting a bit that causes a fault on use. Thus, PAC can detect attempts to overwrite protected pointers (like return addresses on the stack or vtable pointers in memory) by anyone who doesn’t know the secret key.

“The arm64e architecture introduces pointer authentication codes (PACs) to detect and guard against unexpected changes to pointers in memory… Pointer authentication works by adding a cryptographic signature, or PAC, to unused high-order bits of a pointer before storing the pointer. Another instruction removes and authenticates the signature after reading the pointer back. Any change to the stored value between the write and the read invalidates the signature.
The CPU interprets authentication failure as memory corruption and sets a high-order bit in the pointer, making it invalid and causing the app to crash.” – Apple’s Documentation

On your Mac, the compiler and LLVM automatically insert PAC instructions when you build for arm64e. For example, function prologues on arm64e might include an instruction that signs the return address (using a PAC key reserved for return addresses) before storing it, and the epilogue authenticates it before returning. There are instruction mnemonics for PAC, such as:

- PACIA (sign pointer in register with Instruction key A)
- AUTIA (authenticate with key A)
- BLRAA (branch with link, authenticate target with key A)
- RETAB (authenticate link register with key B and return)

These replace or augment the normal BLR (branch with link to register) and RET instructions for indirect calls and returns. Data pointers can be signed with different keys as well (APDA, APDB keys, etc., for data pointers vs. instruction pointers).

Mach-O’s role

In case anyone is wondering, the Mach-O format itself doesn’t need drastic changes to support PAC. However, there are a few connections.

CPU subtype

First, the Mach-O CPU subtype distinguishes PAC-enabled binaries (arm64e) from normal arm64. As mentioned, this ensures the loader knows which slice to pick and whether the running CPU can handle it. For instance, on Apple Silicon Macs, Apple’s own binaries are marked arm64e (taking advantage of PAC), whereas third-party apps might still show up as arm64. The system can run an arm64 binary on a PAC-capable (arm64e) CPU because the binary contains no PAC instructions, but an arm64e binary on a CPU without pointer authentication would hit “undefined” opcodes. Thus, the Mach-O subtype lets the loader prevent incompatible execution.
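
As a rough illustration of the subtype check, the Python sketch below reads cputype and cpusubtype from a thin 64-bit Mach-O header and reports which architecture the slice targets. The constants follow Apple’s mach/machine.h; describe_arch is a hypothetical helper name, and this is only a reader-side approximation, not the logic the kernel or dyld actually runs.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
CPU_TYPE_X86_64 = 0x01000007
CPU_TYPE_ARM64 = 0x0100000C
CPU_SUBTYPE_MASK = 0xFF000000   # top byte holds capability bits, not the subtype itself
CPU_SUBTYPE_ARM64E = 2


def describe_arch(data: bytes) -> str:
    """Classify a thin, little-endian 64-bit Mach-O as 'arm64e', 'arm64', 'x86_64', or 'other'."""
    magic, cputype, cpusubtype = struct.unpack_from("<III", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a little-endian 64-bit Mach-O")

    # Mask off the capability bits before comparing subtypes.
    subtype = cpusubtype & ~CPU_SUBTYPE_MASK & 0xFFFFFFFF
    if cputype == CPU_TYPE_ARM64:
        return "arm64e" if subtype == CPU_SUBTYPE_ARM64E else "arm64"
    if cputype == CPU_TYPE_X86_64:
        return "x86_64"
    return "other"
```

Masking matters here: arm64e binaries commonly carry extra bits in the top byte of cpusubtype, so comparing the raw value against 2 would miss them.
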
Additionally, the Mach-O code and data sections will contain these new PAC instructions and PAC-ed pointers. Disassembling an arm64e Mach-O, you’ll see opcodes like pacia, autia, and retab.

Application binary interface (ABI)

Another change is to the ABI. On arm64e, the ABI extends AArch64 with pointer authentication. Pointers may carry a PAC in the high address bits (outside the “canonical” virtual address (VA)), and calls/returns use authenticate-and-branch/return forms (e.g., BLRAA/BLRAB, RETAA/RETAB).

A “canonical” VA is a pointer value whose high bits follow the platform’s configured VA-size rules (for example, proper sign/zero extension), making it a valid, usable address rather than “tag/PAC noise.” In other words, a pointer is “canonical” when its bits outside the configured VA size follow the architecture’s required pattern; i.e., after ignoring any tag/PAC bits, the remaining address lies within the VA range and the unused high bits have the mandated extension, so the MMU accepts it as valid. Also, “canonical” is not often used with ARM; it is an x86 term. I do not know of an equivalent term for ARM, which is why I am using it here. ISA appropriation!

If you’re curious and can execute on an arm64e machine, you can see what the “tag/PAC noise” looks like: in LLDB, you may see LR/PC values with “random-looking” high bits. Newer LLDB masks the PAC when printing to avoid confusion, but it is still a tad confusing. If an authentication check fails, the CPU deliberately corrupts the pointer’s extension bits so it becomes non-canonical and faults. Crash logs try to annotate this as a possible pointer authentication failure. These cases sometimes show a corrupted value alongside the canonical one (e.g., 0x0040…4398 → 0x0000…4398).

Exploitation

PAC “in” Mach-O binaries means return-oriented programming (ROP) attacks and other control-flow hijacks are harder.
Even if you can jump to gadgets, any attempt to return to a manipulated stack address or call a function pointer that wasn’t properly signed will likely crash the program. For kernel Mach-O binaries (the iOS kernel cache or macOS kernel on M1), PAC protects pointers in kernel context as well. Thus, an attacker needs PAC bypass techniques, such as “signing gadgets”: pieces of code in the binary that inadvertently sign an attacker-controlled value. However, this goes beyond the format.

Mach-Ooooos

In short, Mach-O is an extensible file format that is tightly linked with Apple’s code signing. The format has a slot for a signature, embedded in the file itself. This setup is different from ELF, where signatures (if any) are typically external. Because the signature is embedded, any change to the Mach-O’s content invalidates it. As a result, you need to bypass or update the signature to run modified code. Embedding the signature in the Mach-O means the system can always verify a binary’s integrity before execution, linking the binary to Apple’s chain of trust. If you’re modifying Mach-Os, remember that any edit requires re-signing or else the OS will reject the file. Code signing also brings issues with debugging and reversing (e.g., needing to strip signatures or use special entitlements to run unsigned code), but those are the security trade-offs made for the common man, right?

If you enjoyed this post on the Mach-O format, consider reading macOS Internals for Detection Engineers.

Sources, inspiration, learn more, and thanks!
