z386: An Open-Source 80386 Built Around Original Microcode

This is the fifth installment of the 80386 series. The FPGA CPU is now far enough along to run real software, and this post is about how it works. z386 is a 386-class CPU built around the original Intel microcode, in the same spirit as z8086.

The core is not an instruction-by-instruction emulator in RTL. The goal is to recreate enough of the original machine that the recovered 386 control ROM can drive it. Today z386 boots DOS 6 and DOS 7, runs protected-mode programs like DOS/4GW and DOS/32A, and plays games like Doom and Cannon Fodder. Here are some rough numbers against ao486:

Metric z386 ao486 Lines of code (cloc) 8K 17.6K ALUTs 18K 21K Registers 5K 6.5K BRAM 116K 131K FPGA clock 85MHz 90MHz 3DBench FPS 34 43 Doom (original) FPS, max details 16.5 21.0

In current builds, z386 performs like a fast (~70MHz) cached 386-class machine, or a low-end 486. It runs at a much higher clock than historical 386 CPUs, but with somewhat worse CPI (cycles per instruction). The current cache is a 16 KB, 4-way set-associative unified L1, chosen partly to keep the clock high. Real high-end 386 systems often used larger external caches, typically in the 32 KB to 128 KB range.

Doom II running on z386.

Much of this 386 microarchitecture archaeology has already been covered in the previous four posts: the multiplication/division datapath, the barrel shifter, protection and paging, and the memory pipeline. z386 tries to be both an educational reconstruction and a usable FPGA CPU. It keeps many 386-like structures: a 32-entry paging TLB, a barrel shifter shaped like the original, ROM/PLA-style decoding, the Protection PLA model, and most importantly the 37-bit-wide, 2,560-entry microcode ROM. At the same time, it uses FPGA-friendly shortcuts where they make sense, such as DSP blocks for multiplication and the small fast L1 cache.

In this post, I will fill in the rest of the design: instruction prefetch, decode, the microcode sequencer, cache design, testing, how z386 differs from ao486, and some lessons from the bring-up.

From z8086 to z386

A little background first. Last year I wrote z8086, an original-microcode-driven 8086, based on reenigne's disassembly work. That project showed that it was possible to build a working CPU around recovered microcode. Towards the end of the year, I learned that 80386 microcode had recently been extracted, and that reenigne and several others — credited at the end of this post — were working on a disassembly. They generously shared their work with me, and z386 started from there.

The 386 is a very different problem from the 8086. The instruction set is larger, the internal state is much richer, and the machine has to enforce protection, paging, privilege checks, and precise faults. More importantly, the 80386 micro-operations are denser and more contextual. If the 8086 microcode reads like a straightforward C program, the 386 microcode reads more like hand-tuned assembly: short, subtle, and full of assumptions about hidden hardware.

... continue reading