Linux extreme-performance H1 (HTTP/1.1) load generator

How It Works

Glass Cannon uses a fundamentally different approach to I/O than traditional load generators. Instead of one thread per connection or async callbacks, it talks directly to the Linux kernel's io_uring interface.

The Traditional Way

Most load generators (wrk, hey, ab) are built on epoll: the application asks the kernel "which sockets are ready?" and then makes individual read() and write() system calls on each ready socket. Each of those calls is a separate round trip from user space into the kernel and back, so the overhead grows with every request sent and every response read.
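To make that cost concrete, here is a minimal C sketch of one round of a readiness-based loop, using the real Linux epoll calls (epoll_create1, epoll_ctl, epoll_wait). It is not taken from wrk, hey, or ab. The helper name epoll_one_round is hypothetical, and local socketpair() connections stand in for TCP sockets to a server. The function counts only the syscalls the loop itself issues: one epoll_wait() plus one read() per ready socket.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_SOCKS 64

/* Hypothetical helper: creates nsock socket pairs (stand-ins for TCP
 * connections), makes the first nready of them readable, then runs ONE
 * iteration of a classic epoll loop. Returns how many sockets were read
 * (-1 on setup failure) and stores the loop's syscall count. */
int epoll_one_round(int nsock, int nready, int *loop_syscalls)
{
    int pairs[MAX_SOCKS][2];
    struct epoll_event events[MAX_SOCKS];
    int created = 0, served = -1;

    assert(nsock > 0 && nsock <= MAX_SOCKS && nready >= 0 && nready <= nsock);
    *loop_syscalls = 0;

    int ep = epoll_create1(0);
    if (ep < 0)
        return -1;

    /* Setup (not counted): register the read end of each pair. */
    for (int i = 0; i < nsock; i++) {
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, pairs[i]) < 0)
            goto out;
        created++;
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = pairs[i][0] };
        if (epoll_ctl(ep, EPOLL_CTL_ADD, pairs[i][0], &ev) < 0)
            goto out;
    }
    /* Simulate responses arriving on the first nready sockets. */
    for (int i = 0; i < nready; i++)
        if (write(pairs[i][1], "x", 1) != 1)
            goto out;

    /* Step 1: ask the kernel which sockets are ready (one syscall). */
    int n = epoll_wait(ep, events, MAX_SOCKS, 0);
    (*loop_syscalls)++;
    served = 0;
    /* Step 2: one read() syscall per ready socket. */
    for (int i = 0; i < n; i++) {
        char buf[16];
        if (read(events[i].data.fd, buf, sizeof buf) > 0)
            served++;
        (*loop_syscalls)++;
    }

out:
    for (int i = 0; i < created; i++) {
        close(pairs[i][0]);
        close(pairs[i][1]);
    }
    close(ep);
    return served;
}
```

With 3 of 8 sockets readable, epoll_one_round(8, 3, &calls) services 3 sockets and issues 4 syscalls. The count is always 1 + k for k ready sockets (plus the matching write() calls when sending requests), which is the per-operation cost io_uring batches away.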

The Glass Cannon Way

io_uring places two ring buffers in memory shared between your program and the kernel. You write requests into the submission queue (SQ), and the kernel writes results into the completion queue (CQ). There is no system call per operation: a single io_uring_enter() call can submit, and wait on, an entire batch. The kernel processes batches of I/O while your code processes batches of results.

    Your Program                                        Linux Kernel

    submit 2048 operations       ──── Submission Queue ────>   process all I/O
    (send, recv, connect...)                                   in kernel space

    process completions          <── Completion Queue ────     batch completions
    (no syscall needed)                                        into shared ring

                   Minimal context switches in steady state
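The ring protocol itself is small enough to model in a few lines. The following C sketch is a toy, single-threaded userspace model of the two queues, not the kernel's ABI: the struct layouts and the functions submit(), fake_kernel_enter(), reap(), and run_batch() are hypothetical. It shows the core idea: queuing a request and reaping a result are plain memory writes and reads on power-of-two rings indexed by free-running head/tail counters, and only one "enter" is needed per batch. Real programs would typically go through liburing (io_uring_get_sqe(), io_uring_prep_send(), io_uring_submit(), io_uring_wait_cqe(), io_uring_cqe_seen()), and the real rings need acquire/release memory ordering because the kernel updates them concurrently.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of io_uring's two rings (NOT the kernel ABI). The size is a
 * power of two so an index is reduced with `& RING_MASK`; head and tail
 * are free-running counters, and tail - head = number of queued entries. */
#define RING_ENTRIES 4096u
#define RING_MASK    (RING_ENTRIES - 1)

enum op { OP_SEND, OP_RECV };

struct sqe { enum op op; uint64_t user_data; };   /* submission entry */
struct cqe { uint64_t user_data; int32_t res; };  /* completion entry */

struct sq_ring { uint32_t head, tail; struct sqe e[RING_ENTRIES]; };
struct cq_ring { uint32_t head, tail; struct cqe e[RING_ENTRIES]; };

static struct sq_ring sq;
static struct cq_ring cq;
static unsigned enter_calls;  /* stands in for io_uring_enter() syscalls */

/* App side: queue one request. No syscall, just a memory write. */
static int submit(enum op op, uint64_t user_data)
{
    if (sq.tail - sq.head == RING_ENTRIES)
        return -1;                                   /* SQ full */
    sq.e[sq.tail & RING_MASK] = (struct sqe){ op, user_data };
    sq.tail++;
    return 0;
}

/* "Kernel" side: consume every pending SQE and post one CQE for each.
 * In real io_uring this work happens inside the kernel; this
 * single-threaded model skips the required memory barriers. */
static void fake_kernel_enter(void)
{
    enter_calls++;
    while (sq.head != sq.tail) {
        struct sqe s = sq.e[sq.head & RING_MASK];
        sq.head++;
        assert(cq.tail - cq.head < RING_ENTRIES);    /* no CQ overflow */
        /* res: placeholder byte counts, not real I/O results */
        cq.e[cq.tail & RING_MASK] =
            (struct cqe){ s.user_data, s.op == OP_SEND ? 64 : 128 };
        cq.tail++;
    }
}

/* App side: reap every available completion. Again just memory reads. */
static unsigned reap(uint64_t *sum_user_data)
{
    unsigned n = 0;
    while (cq.head != cq.tail) {
        *sum_user_data += cq.e[cq.head & RING_MASK].user_data;
        cq.head++;
        n++;
    }
    return n;
}

/* One batch: queue `batch` operations, enter once, reap everything.
 * Returns the number of completions reaped. */
unsigned run_batch(unsigned batch, uint64_t *sum)
{
    for (unsigned i = 0; i < batch; i++)
        if (submit(i % 2 ? OP_RECV : OP_SEND, i) != 0)
            break;
    fake_kernel_enter();
    return reap(sum);
}
```

A batch of 2048 operations, the figure used in the diagram, yields 2048 completions for a single enter call; user_data is how the application matches each completion back to the request (and, in a load generator, to its connection).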

Architecture

The main thread spawns N worker threads. Each worker owns an independent io_uring ring, a set of connections, and pre-built request buffers. There is zero communication between workers during the benchmark — no mutexes, no atomics, no shared state.

                        main thread
                             |
                             |  spawn workers, wait for duration, aggregate stats
                             |
         +-------------------+-------------------+
         |                   |                   |
      worker 0            worker 1            worker N
         |                   |                   |
   io_uring ring       io_uring ring       io_uring ring
   buffer ring         buffer ring         buffer ring
   connections[0..K]   connections[0..K]   connections[0..K]
         |                   |                   |
   connect → send → recv → count → refill → ...
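The thread layout can be sketched with POSIX threads. In this C sketch each worker owns all the state it writes (its connections and its response counter), and the main thread reads a worker's counter only after pthread_join() has returned, so no locks or atomics are needed. Everything else is an assumption: the struct and function names are hypothetical, workers run a fixed number of loop iterations instead of a wall-clock duration (one way to honor a duration without shared state would be for each worker to check its own deadline), and a simulated state machine replaces real I/O. In the real tool each state transition would be driven by a completion-queue entry rather than by a loop counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-connection lifecycle from the diagram:
 * connect -> send -> recv -> count -> refill -> send -> ... */
enum conn_state { ST_CONNECT, ST_SEND, ST_RECV, ST_COUNT, ST_REFILL };

struct conn { enum conn_state st; };

/* Everything a worker writes lives in its own struct: no mutexes,
 * no atomics, and main reads it only after the thread has exited. */
struct worker {
    pthread_t tid;
    int nconns;           /* K connections owned by this worker */
    long steps;           /* simulated event-loop iterations */
    struct conn *conns;
    uint64_t responses;   /* private counter, aggregated by main */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    for (long s = 0; s < w->steps; s++) {
        for (int i = 0; i < w->nconns; i++) {
            struct conn *c = &w->conns[i];
            switch (c->st) {
            case ST_CONNECT: c->st = ST_SEND;  break;  /* connected */
            case ST_SEND:    c->st = ST_RECV;  break;  /* request sent */
            case ST_RECV:    c->st = ST_COUNT; break;  /* response read */
            case ST_COUNT:   w->responses++; c->st = ST_REFILL; break;
            case ST_REFILL:  c->st = ST_SEND;  break;  /* reuse request */
            }
        }
    }
    return NULL;
}

/* Hypothetical driver: spawn workers, join them, sum private counters. */
uint64_t run_workers(int nworkers, int nconns, long steps)
{
    assert(nworkers > 0 && nconns > 0 && steps >= 0);
    struct worker *ws = calloc((size_t)nworkers, sizeof *ws);
    if (!ws)
        return 0;

    int started = 0;
    for (int i = 0; i < nworkers; i++) {
        ws[i].nconns = nconns;
        ws[i].steps = steps;
        /* calloc zeroes memory, so every connection starts in ST_CONNECT */
        ws[i].conns = calloc((size_t)nconns, sizeof *ws[i].conns);
        if (!ws[i].conns ||
            pthread_create(&ws[i].tid, NULL, worker_main, &ws[i]) != 0)
            break;
        started++;
    }

    uint64_t total = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(ws[i].tid, NULL);
        total += ws[i].responses;   /* safe: the worker has exited */
    }
    for (int i = 0; i < nworkers; i++)
        free(ws[i].conns);          /* free(NULL) is a no-op */
    free(ws);
    return total;
}
```

Because a connection passes through COUNT once every four steps after its initial connect, run_workers(4, 16, 400) returns 4 × 16 × 100 = 6400. The design choice this illustrates is that aggregation happens once, after the threads finish, instead of on every response.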
