Building a 24-bit Arcade CRT Display Adapter, From Scratch
In November, my friend and fellow Recurser, Frank, picked up an arcade machine for the Recurse Center. We call it the RCade. He wanted to leave the original CRT in - which I think is a great choice! - and drove it off of a Raspberry Pi. Eventually we wanted to move to a more powerful computer, but we needed a way to connect it to the display. Off-hand, I mentioned that I could build a CRT display adapter that interfaces with a normal computer over USB. This is that project.
What the display expects
The CRT in the RCade has a JAMMA connector, and Frank bought a converter that goes between VGA and JAMMA. You might think we could just use an off-the-shelf VGA adapter to drive it at this point, but it's not that simple. The CRT runs at a weird resolution; we started with 320x240 but eventually wanted to target 336x262, which is super non-standard. Even 320x240 is unattainable for most display adapters, which typically can't go below 640x480. A custom solution would let us output any arbitrary resolution we wanted. The other thing is that the Pi, with the VGA board we were using, only supports 18-bit colour, and we wanted to improve this. Even on the RCade's CRT, colour banding was an obvious issue. We also wanted to use a laptop, not a desktop, which meant no PCI-e card; instead, a USB interface would be preferable.
Wait, but what is VGA?
VGA is a signaling protocol that maps almost exactly 1:1 with what a CRT actually does.

(Image taken from wikimedia.org)

Inside of a CRT, there are 3 electron guns, which correspond to red, green, and blue colour values. Two electromagnets in the neck of the tube are responsible for steering the beam - one steers horizontally and one steers vertically. To draw an image, the beam moves across the screen one horizontal line at a time, and the electron guns are rapidly modulated to display the correct colour at each pixel.

VGA contains analog signals for these R, G, and B electron guns. It also contains an HSYNC and a VSYNC signal, which are used so that the driver and the CRT can agree on what pixel is being drawn at a given time. Between the VGA input and the CRT is a very simple circuit which locks onto these HSYNC and VSYNC pulses and synchronizes the sweeping of the beam.

(Image taken from pyroelectro.com)

The HSYNC pulses happen in between horizontal lines, and the VSYNC pulses happen in between frames. There are dead zones around each pulse - referred to as the front and back porch - which give the electron beam time to sweep back across the screen.

So, all we really need are those R, G, B, HSYNC, and VSYNC signals, running at precise timing, and synced properly relative to each other. Conceptually this is actually pretty simple!
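To make the timing concrete, here's a small Rust sketch that adds up the active, front porch, sync, and back porch regions to get the pixel clock a mode requires. The numbers plugged in below roughly match the 320x240 mode used later in this project; they're illustrative, not measurements from the RCade's CRT.

```rust
/// Timing for a VGA-style signal: horizontal values are in pixel clocks,
/// vertical values are in lines. All numbers here are illustrative.
struct Timing {
    h_active: u32, h_front: u32, h_sync: u32, h_back: u32,
    v_active: u32, v_front: u32, v_sync: u32, v_back: u32,
}

impl Timing {
    /// Total pixel clocks per horizontal line, porches and sync included.
    fn clocks_per_line(&self) -> u32 {
        self.h_active + self.h_front + self.h_sync + self.h_back
    }
    /// Total lines per frame, porches and sync included.
    fn lines_per_frame(&self) -> u32 {
        self.v_active + self.v_front + self.v_sync + self.v_back
    }
    /// Pixel clock (Hz) needed to hit a given refresh rate.
    fn pixel_clock(&self, refresh_hz: u32) -> u32 {
        self.clocks_per_line() * self.lines_per_frame() * refresh_hz
    }
}

fn main() {
    // Roughly the mode in this post: 320x240 visible,
    // 400 clocks per line and 267 lines per frame in total.
    let mode = Timing {
        h_active: 320, h_front: 16, h_sync: 30, h_back: 34,
        v_active: 240, v_front: 2, v_sync: 3, v_back: 22,
    };
    println!("pixel clock at 60 Hz: {} Hz", mode.pixel_clock(60));
}
```

Note how low the resulting pixel clock is compared to the 25 MHz of standard 640x480 VGA - which is exactly why most adapters refuse to go this slow.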
Attempt 1: Using the RP2040's PIO
I like the Raspberry Pi RP2040 a lot. It's relatively cheap (around $1 USD) and has tons of on-board RAM - 264 KB in fact! It also has what is called Programmable IO, or PIO. I'd never used the PIO before, but the basic idea is that you can write assembly programs where every instruction takes exactly one cycle, with useful primitives for interacting with GPIO. It's a fairly limited instruction set, but it allows for bit-banging precise, cycle-accurate custom protocols. It's exactly what I need to modulate a VGA signal. The PIO code ended up looking like this:

```rust
// 1. low for 320+16=336 pixels
// 2. high for 30 pixels
// 3. low for 34 pixels
// 4. repeat
// runs on sm0
// 6 instrs -> can save some with sidesetting
let hsync = pio::pio_asm!(
    " .wrap_target     ",
    /* begin pixels + front porch */
    " irq set 0   [2]  ", // tell vsync we're doing 1 line
    " set pins, 1 [31] ", // go low for 32
    " set X, 8    [15] ", // +16 = 48
    " a:               ",
    " jmp X-- a   [31] ", // each loop 32, * 9 = 288, total = 336
    /* end front porch, begin assert hsync */
    " set pins, 0 [29] ", // assert hsync for 30
    /* end assert hsync, begin back porch */
    " set pins, 1 [29] ", // deassert, wait 32 (note there is extra delay after the wrap)
    " .wrap            ",
);

// NOTE - we get irq at *end* of line so we have to time things accordingly
// 1. low for 242 lines -> but irq 2 every line for the first 240
// 2. high for 3 lines
// 3. low for 22 lines
// 4. repeat
// runs on sm1
// 19 instr
let vsync = pio::pio_asm!(
    " .side_set 1 opt  ",
    " .wrap_target     ",
    " set Y, 6         ",
    " a_outer:         ",
    " set X, 31        ",
    " a:               ",
    " wait 1 irq 0     ",
    " irq set 2        ",
    " jmp X-- a        ", // 32 lines per inner loop
    " jmp Y-- a_outer  ", // 7 outer loops = 224
    " set X, 15        ", // 16 more lines = 240
    " z:               ",
    " wait 1 irq 0     ",
    " irq set 2        ",
    " jmp X-- z        ",
    " wait 1 irq 0     ", // wait for end of last rgb line
    " wait 1 irq 0     ", // 2 more lines for front porch
    " wait 1 irq 0     ",
    " set X, 2 side 0  ", // assert vsync
    " b:               ",
    " wait 1 irq 0     ",
    " jmp X-- b        ", // wait for 3 lines
    " set X, 20 side 1 ", // deassert vsync
    " c:               ",
    " wait 1 irq 0     ",
    " jmp X-- c        ", // wait for 21 lines (back porch)
    " .wrap            ",
);

// 2 cycles per pixel so we run at double speed
// 6 instr
let rgb = pio::pio_asm!(
    " out X, 32        ", // holds 319, which we have to read from the FIFO
    " .wrap_target     ",
    " mov Y, X         ",
    " wait 1 irq 2     ", // wait until start of line
    " a:               ",
    " out pins, 16     ", // write to rgb from dma
    " jmp Y-- a        ",
    " mov pins, NULL   ", // output black
    " .wrap            ",
);
```

The full code lives here. There are 3 separate PIO programs. hsync is responsible for keeping time and generating HSYNC pulses. At the start of each line, it generates an IRQ event that the other programs use for synchronization. vsync counts these events and generates the VSYNC pulses. Finally, rgb reads pixel data from DMA and outputs it to the RGB pins in precise time with the other signals. The out pins, 16 signifies that we're only doing 16-bit colour for now. There's a lot of weirdness in here to get around the constraints of the PIO. For example, between all 3 programs, only a maximum of 31 instructions are allowed. All of the VGA parameters (resolution, porch length, etc.) are hard-coded, and changing these would require at least a small rewrite. It's pretty brittle in that regard, but for our use-case it's sufficient as a proof of concept.
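Since the rgb program shifts out 16 bits per pixel, the framebuffer feeding the DMA has to pack each colour into one 16-bit word. Here's a sketch of what that packing could look like, assuming an RGB565 layout - an assumption on my part, since the real bit-to-pin mapping depends on how the output pins are wired:

```rust
/// Pack 8-bit-per-channel colour into one 16-bit value for the PIO's
/// `out pins, 16`. RGB565 (5 red, 6 green, 5 blue bits) is assumed here;
/// the actual layout depends on the board's pin wiring.
fn pack_rgb565(r: u8, g: u8, b: u8) -> u16 {
    (((r >> 3) as u16) << 11) | (((g >> 2) as u16) << 5) | ((b >> 3) as u16)
}

fn main() {
    // One 320-pixel line of solid magenta, as the DMA engine would see it.
    let line: Vec<u16> = (0..320).map(|_| pack_rgb565(255, 0, 255)).collect();
    assert_eq!(line.len(), 320);
    println!("first pixel: {:#06x}", line[0]);
}
```

Throwing away 2-3 bits per channel like this is exactly the colour banding problem mentioned earlier, which is why the end goal is 24-bit colour.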
Here it is driving the actual CRT in the RCade: I wanted to fill the framebuffer with a repeating pattern, but I messed up my code, hence the weird look. That's fine - it was enough to verify my VGA program worked! As an aside, popping off the back of the RCade to work on it was terrifying every time. Not because of the lethal voltages inside, but because Recursers absolutely love the RCade. I often joke that if I were to break it, I would basically be the anti-Frank! Now that I had something that could take a framebuffer and throw it onto the CRT, it was time to get the image from my computer to the RP2040.
Let's write a kernel module!
My plan was to write a Linux kernel module that would expose itself as a framebuffer, and then send that framebuffer over USB to the RP2040. On the framebuffer side, this involved interfacing with the DRM layer. I actually made decent progress here, although I kernel-panicked many, many times. I never bothered to set up a proper development environment (oops), so pretty much any bug required me to reboot my computer. This was super annoying and tedious, although I did learn a lot. I found cursed things in the official documentation, like interrobangs! (Linus pls.) I got as far as getting a framebuffer to show up at the correct resolution and refresh rate. Along the way, though, I discovered the GUD kernel module and quickly realized I should use that instead.