
Building an FPGA 3dfx Voodoo with Modern RTL Tools

Why This Matters

This project demonstrates that recreating complex, fixed-function graphics hardware such as the Voodoo 1 on an FPGA is achievable with modern RTL tools, and that a hardware enthusiast or developer can reimplement a legacy GPU accurately. It underscores the importance of detailed hardware modeling and debugging techniques in preserving and understanding classic graphics architectures, which can inform both retro computing and future hardware design. More broadly, it points toward hardware replication and customization becoming more accessible through open-source tools.


This frame of Screamer 2 was rendered not by an original 3dfx card and not by an emulator, but by an FPGA reimplementation of the Voodoo 1 that I wrote in SpinalHDL. The project is available on GitHub.

What surprised me was not just that it worked. It was that a design like this can now be described, simulated, and debugged by one person, provided the tools let you express the architecture directly and inspect execution at the right level of abstraction.

The Voodoo 1 is old, but it is not simple. It has no transform-and-lighting hardware and no programmable shaders, so all of its graphics behavior is fixed in silicon: gradients for Gouraud shading, texture sampling, mipmapping, bilinear and trilinear filtering, alpha clipping, clipping, depth testing, fog, and more. A modern GPU concentrates much of its complexity in flexible programmable units. The Voodoo concentrates it in a large number of hardwired rendering behaviors.

One of the bugs that drove this home looked at first like a framebuffer hazard. Small clusters of partially translucent text and overlay pixels would go mysteriously transparent, even though most of the frame looked fine. The real issue turned out not to be one broken subsystem, but several small hardware-accuracy mismatches stacking up in exactly the wrong way. That bug ended up being a good summary of the whole project: the hard part was not "making triangles appear." It was matching the Voodoo's exact behavior closely enough that the wrong pixels stopped appearing.

This post is about the two abstractions that made that tractable. The first is how I represented the Voodoo's register semantics in SpinalHDL. The second is how I debugged a deep graphics pipeline using netlist-aware waveform queries in conetrace.

A Fixed-Function Chip That Is Harder Than It Looks

At first glance, the Voodoo looks almost modest. It is a memory-mapped accelerator with one job: render triangles as quickly as possible. Unlike later accelerators, it does not do transform and lighting, which means the host CPU handles the heavier 3D math.

That can make the hardware sound simpler than it really is. Even a single triangle may involve interpolated colors, texture sampling, mip level selection, bilinear or trilinear filtering, alpha clipping, depth comparison, clipping, and fog. None of these operations are programmable in the modern sense. They are all baked into the silicon.

That is the central contrast. In modern GPUs, complexity often comes from flexibility. In the original Voodoo, complexity comes from how many rendering behaviors are directly encoded in fixed-function hardware.

Why Register Writes Cannot All Behave the Same Way

That fixed-function style shows up clearly in the register interface. On the Voodoo, writing to `triangleCmd` or `ftriangleCmd` launches a triangle. The other registers in the register bank describe how that triangle should be rasterized: which gradients to use, how textures should be sampled, which tests should run, and so on.

The catch is that the Voodoo is deeply pipelined. Rendering a pixel involves a series of stages: stepping gradients, sampling textures, combining colors, comparing against the depth buffer, and more. Pipelining slices that work into stages so multiple pixels can be in flight at once. That is how the chip achieves throughput that software cannot match.

Figure 1: The hard part is deciding which register writes may apply immediately, which must move with in-flight work, and which must wait until the pipeline is empty.

But pipelining creates a problem for the register model. Imagine triangle A is still moving through the pipeline while the CPU starts configuring triangle B. If a rendering setting changes too early, late pixels from triangle A may see state intended for triangle B. The result is subtle corruption: part of a triangle rendered with the wrong texture mode, wrong blending mode, or wrong depth behavior.

There are only two ways around that. Either a register write waits until the pipeline has drained before taking effect, or the write travels forward in step with the in-flight work so each triangle sees the state that belongs to it.

In other words, register writes on the Voodoo are not just configuration updates. They are part of the timing contract of the machine.
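The hazard described above can be reproduced in a few lines of ordinary software. The following is a minimal sketch in plain Scala, not SpinalHDL and not code from the project: it assumes a single piece of render state (a "mode"), a host that issues one command or pixel per cycle, and a pipeline whose last stage reads the mode `depth` cycles after a pixel enters. The names `PipelineHazardSketch`, `SetMode`, `Draw`, and `run` are all hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

object PipelineHazardSketch {
  // Host commands, in program order (hypothetical, heavily simplified).
  sealed trait Cmd
  final case class SetMode(mode: String) extends Cmd
  final case class Draw(triangle: String, pixels: Int) extends Cmd

  /** Returns one (triangle, mode observed at the last stage) pair per pixel.
    *
    * stateTravelsWithWork = true  : each pixel carries the mode that was current when it entered.
    * stateTravelsWithWork = false : the last stage reads one global register `depth` cycles later.
    */
  def run(cmds: Seq[Cmd], depth: Int, stateTravelsWithWork: Boolean): Seq[(String, String)] = {
    var cycle = 0
    var mode = "none"
    val writes = ArrayBuffer[(Int, String)]()          // (cycle, value) history of the global register
    val pixels = ArrayBuffer[(String, Int, String)]()  // (triangle, entry cycle, mode at entry)

    cmds.foreach {
      case SetMode(m) =>
        mode = m
        writes += ((cycle, m))
        cycle += 1
      case Draw(t, n) =>
        for (_ <- 0 until n) {
          pixels += ((t, cycle, mode))
          cycle += 1
        }
    }

    // Value of the global register at a given cycle (writes are stored in cycle order).
    def globalModeAt(c: Int): String =
      writes.takeWhile(_._1 <= c).lastOption.map(_._2).getOrElse("none")

    pixels.map { case (t, entry, modeAtEntry) =>
      val seen = if (stateTravelsWithWork) modeAtEntry else globalModeAt(entry + depth)
      (t, seen)
    }.toSeq
  }
}
```

With the command sequence `SetMode("modeA")`, `Draw("A", 3)`, `SetMode("modeB")`, `Draw("B", 2)` and a depth of 2, the immediate-write variant lets the last two pixels of triangle A observe `modeB`, while the variant that carries state with the work keeps every pixel of A on `modeA`. Waiting for the pipeline to drain before applying the write would avoid the corruption as well, at the cost of idle cycles.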

The Voodoo's Four Register Behaviors

In my model, Voodoo registers fall into four categories:

| Type | Behavior |
| --- | --- |
| FIFO | Enqueued and applied in order |
| FIFO + Stall | Enqueued, but only applied once the pipeline has drained |
| Direct | Applied immediately |
| Float | Converted, then written to the fixed-point form of a register |

The important point is that these categories are architectural, not just software-facing. A register type tells you whether a piece of state can change immediately, whether it must move with in-flight work, or whether it must wait for the machine to become quiescent.

Figure 2: Why the register categories exist at all: without them, new state can bleed into old work.

That distinction turns out to be a very natural thing to model directly in the HDL.
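As a rough illustration of how the four categories translate into decisions, here is a small plain-Scala sketch. It is not the project's SpinalHDL code: the type and function names (`Category`, `Route`, `route`, `mayCommit`) are invented for this example, and the model assumes that a float write, once converted, simply behaves like an ordinary queued write.

```scala
object RegisterCategorySketch {
  // The four categories from the table. The Scala names are illustrative only.
  sealed trait Category
  case object Fifo extends Category
  case object FifoStall extends Category
  case object Direct extends Category
  case object FloatAlias extends Category

  // Where a bus write goes first.
  sealed trait Route
  case object ApplyImmediately extends Route
  case object EnqueueInOrder extends Route
  case object ConvertThenEnqueue extends Route

  /** First decision: does the write bypass the queue, enter it, or get converted first? */
  def route(c: Category): Route = c match {
    case Direct           => ApplyImmediately
    case FloatAlias       => ConvertThenEnqueue
    case Fifo | FifoStall => EnqueueInOrder
  }

  /** Second decision: may the write at the head of the queue take effect this cycle?
    * Only the stalling category has to wait for the pipeline to drain.
    */
  def mayCommit(c: Category, pipelineBusy: Boolean): Boolean = c match {
    case FifoStall => !pipelineBusy
    case _         => true
  }
}
```

Splitting the behavior into a routing decision and a commit decision mirrors the two questions the categories answer: whether a write is ordered with respect to in-flight work, and whether it additionally needs the machine to be quiescent.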

Encoding Register Semantics in SpinalHDL

The Voodoo has 430 configuration fields spread across many registers, with each register belonging to one of those categories. In a traditional HDL like Verilog, the difference between these register types usually ends up spread across several places: the register declarations, the bus-side control logic, the pipeline-side handling, and whatever external documentation describes the map.

SpinalHDL has a useful abstraction here called `RegIf`. It lets you declare registers naturally and generates much of the surrounding control logic for you. I extended it so that a register declaration could directly encode Voodoo-specific semantics such as FIFO behavior, synchronized writes, and float aliases. For example:

```scala
val startR = busif
  .newRegAtWithCategory(0x020, "startR", RegisterCategory.fifoNoSync)
  .field(AFix.SQ(24 bits, 12 bits), AccessType.WO, 255 << 12, "Starting red value")
  .withFloatAlias()
  .asOutput()
```

This declares the starting R value used by the gradient walker. In one place it specifies the address, name, category, data type, access mode, reset value, and the presence of a floating-point alias. Elsewhere in the design, `startR` simply appears as a normal signal.

The float alias is particularly useful here. The original Voodoo exposes a second register 128 addresses above the fixed-point one which accepts a floating-point write and converts it before storing the result in the fixed-point register. That behavior is part of the register interface itself, so it made sense to represent it there rather than scatter the logic elsewhere.

Because the register metadata is explicit, `RegIf` can also export the map to other formats such as headers or SystemRDL. In my case I additionally use that metadata to drive a `PciFifo` component that emulates the Voodoo's register semantics. FIFO-style writes are queued. Synchronized writes stall until the pipeline is empty. Float aliases are routed through a float-to-fixed converter and then rewritten to the original address before they enter the queue.

The gain here is not just fewer lines of code. It is that the architectural meaning of a register lives in one place. Instead of just being documentation, the register map is an executable description of how the machine behaves.
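The float alias path described above can be sketched in software as a small normalization step that runs before a write is queued. This is a hedged, plain-Scala illustration rather than the actual `PciFifo` logic. It assumes the alias sits at a byte offset of 0x80 (one reading of "128 addresses above"), that the target format has 12 fractional bits as in the `startR` example, and that conversion rounds to nearest with no saturation. The real hardware may truncate or clamp instead. The names `FloatAliasSketch`, `Write`, `floatToFixed`, and `normalize` are hypothetical.

```scala
object FloatAliasSketch {
  val FracBits = 12       // fractional bits of the fixed-point target (assumption, from the startR example)
  val AliasOffset = 0x80  // assumption: the float alias lives 128 bytes above the fixed-point register

  final case class Write(addr: Int, data: Int)

  /** Interprets `bits` as an IEEE-754 single and converts it to fixed point.
    * Rounds to nearest and does not saturate (both are assumptions of this sketch).
    */
  def floatToFixed(bits: Int): Int = {
    val f = java.lang.Float.intBitsToFloat(bits)
    math.round(f * (1 << FracBits))
  }

  /** Rewrites a float-alias write into the equivalent fixed-point write.
    * `isFloatAlias` stands in for whatever register metadata marks alias addresses.
    */
  def normalize(w: Write, isFloatAlias: Int => Boolean): Write =
    if (isFloatAlias(w.addr)) Write(w.addr - AliasOffset, floatToFixed(w.data))
    else w
}
```

Under these assumptions, a write of `255.0f` to address `0x0A0` becomes a write of `255 << 12` to address `0x020`, which matches the reset value in the `startR` declaration. Everything downstream of the queue then only ever sees fixed-point writes.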
