Visualizing the ARM64 Instruction Set (2024)

Introduction

Lately I’ve been doing a lot of work with the ARM64 instruction set, and I thought it would be fun to try to visualize it. ARM64 encodes every instruction as a 32-bit integer, so one way to visualize the instruction set is to plot the instructions along a space-filling curve, such as a Hilbert curve, and color them according to their instruction class (e.g., general, advsimd, float, sve).

Click here for the interactive version.

Generating the visualization

To generate this visualization, I started with Arm’s Machine Readable Architecture (MRA) Specification. The most recent version can be downloaded from here. It comes with both XML and HTML files describing the encoding and semantics of every instruction in the ISA. If you’d like to browse it, I host the HTML files at https://www.scs.stanford.edu/~zyedidia/arm64/. All the visualizations in this blog post were generated from the version released in June 2023, which covers all extensions up to and including ARMv8.9.

I wrote a small tool that parses the XML files and generates a list of all unique encodings in the architecture (roughly 3,000) along with some bits of information like the instruction’s mnemonics, class, what ARMv8 variant/feature it is a part of, and an encoding diagram.

Then I wrote another tool that iterates through every possible 32-bit instruction, decodes it according to the encoding diagram, and stores its encoding type in a file. The specification describes bits as combinations of 0, 1, and x, but also sometimes includes (0) and (1). I’m not sure what the parenthesized versions mean. It seems like some existing disassemblers treat them as x, so that’s what I’ve done. Maybe they are recommended but not required encodings?
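As a rough illustration of the matching step, here is a minimal Python sketch that turns a bit pattern into a mask/value pair and tests a 32-bit word against it. The function names and the example pattern are made up for illustration and are not taken from the actual tool or from the specification. Parenthesized bits are treated as x, matching the choice described above.

```python
def compile_pattern(bits: str) -> tuple[int, int]:
    """Convert a pattern of '0', '1', 'x', '(0)', '(1)' into (mask, value)."""
    # Assumption from the text: parenthesized bits are treated as don't-care.
    bits = bits.replace("(0)", "x").replace("(1)", "x")
    if len(bits) != 32:
        raise ValueError(f"expected 32 bits, got {len(bits)}")
    mask = 0
    value = 0
    for ch in bits:  # most significant bit first
        mask <<= 1
        value <<= 1
        if ch in "01":
            mask |= 1          # this bit is fixed by the encoding
            value |= int(ch)   # and must equal this value
        elif ch != "x":
            raise ValueError(f"unexpected character {ch!r}")
    return mask, value


def matches(word: int, pattern: tuple[int, int]) -> bool:
    """Return True if the 32-bit word agrees with every fixed bit."""
    mask, value = pattern
    return (word & mask) == value


# Hypothetical pattern: four fixed bits, one parenthesized bit, the rest free.
example = compile_pattern("1101" + "(0)" + "x" * 27)
print(matches(0xD503201F, example))
print(matches(0xC0000000, example))
```

Precomputing the mask and value once per encoding keeps the inner loop over all 2^32 words down to one AND and one comparison per candidate encoding.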

One issue is that while the Arm specification gives encodings as simple bit-strings, it also provides some code in the Arm Specification Language (ASL) that can sometimes overrule the encoding. For example, the EOR instruction encoding becomes undefined if sf == '0' && N != '0'. In the future, I’d like to parse and process the ASL so that the generated decoder can handle these cases. For now, I handle this with a post-processing pass that runs the Capstone disassembler on all the instructions and removes the invalid ones, since Capstone properly understands these rules.
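To make the EOR example concrete, here is a small hand-written Python check for that one rule. This is not how the post handles it (the post relies on Capstone); it is only a sketch, and it assumes the logical-immediate layout in which sf is bit 31 and N is bit 22. The helper names are hypothetical.

```python
def bit(word: int, pos: int) -> int:
    """Extract a single bit from a 32-bit instruction word."""
    return (word >> pos) & 1


def eor_imm_is_undefined(word: int) -> bool:
    """Apply the rule sf == '0' && N != '0' to a word that already
    matched the EOR (immediate) bit pattern.

    Assumed field positions: sf = bit 31, N = bit 22.
    """
    sf = bit(word, 31)
    n = bit(word, 22)
    return sf == 0 and n != 0


# A 32-bit form (sf = 0) with N set is rejected; the 64-bit form is not.
print(eor_imm_is_undefined(0x52400000))
print(eor_imm_is_undefined(0xD2400000))
```

A decoder generated purely from the bit-strings would accept the first word, which is why some extra filtering step is needed.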

Using this mapping of every possible instruction, we can generate a Hilbert curve plot with a nice color scheme, where instructions are categorized based on their “instruction class”: one of general, system, float, fpsimd, advsimd, sve, sve2, mortlach, mortlach2, and other. There are too many instructions to plot each instruction as an individual pixel, so each pixel in the image corresponds to 256 instructions, and the pixel’s alpha value corresponds to how filled the pixel is with instructions.
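The pixel mapping can be sketched as follows. With 256 instructions per pixel, the 2^32 possible words collapse to 2^24 pixels, which fits a 4096×4096 image. This Python sketch uses the standard iterative index-to-coordinate conversion for a Hilbert curve. The grouping of 256 consecutive words into one pixel, the curve orientation, and the function names are assumptions for illustration, not details confirmed by the post.

```python
SIDE = 4096        # 4096 * 4096 = 2**24 pixels
PER_PIXEL = 256    # instructions per pixel, as stated above


def d2xy(n: int, d: int) -> tuple[int, int]:
    """Map distance d along a Hilbert curve to (x, y) on an n-by-n grid.

    n must be a power of two.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so sub-curves join end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def pixel_for(word: int) -> tuple[int, int]:
    """Pixel that holds a given 32-bit instruction word (assumed grouping)."""
    return d2xy(SIDE, word // PER_PIXEL)


def alpha_for(valid_count: int) -> float:
    """Alpha in [0, 1]: the fraction of the pixel's 256 words that decode."""
    return valid_count / PER_PIXEL


print([d2xy(2, d) for d in range(4)])
print(pixel_for(0x52400000), alpha_for(64))
```

Because neighboring positions along the curve stay neighbors in the image, encodings that share their high bits end up in contiguous regions rather than in thin stripes.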

With a nice theme, we get pretty images like these:
