Ever since moving back from Windows, I’ve been paranoid about latency in games on Linux. Slight changes to the environment or settings can, all of a sudden, make the mouse feel very floaty. There have been many community discussions on this topic and I’m certainly not alone in this.
To investigate, I used a small Teensy microcontroller to measure click-to-photon latency. It acts as a USB HID mouse and is paired with a light sensor pressed against the screen. I flashed it with an existing Open Source LDAT sketch, with slight modifications. The resulting setup can log hundreds of samples to a CSV file, unattended.
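As a rough illustration of what can be done with such a log, here is a minimal Python sketch that summarizes one CSV file of samples. The column name `latency_ms` and the sample values are assumptions made up for this example; the actual layout written by the LDAT sketch may differ.

```python
import csv
import io
import statistics


def summarize_latency(csv_file, column="latency_ms"):
    """Return basic statistics for one column of latency samples (in ms).

    `column` is a hypothetical header name; adjust it to the real log layout.
    """
    reader = csv.DictReader(csv_file)
    # Skip rows where the column is missing or empty.
    samples = [float(row[column]) for row in reader if row.get(column)]
    if not samples:
        raise ValueError(f"no samples found in column {column!r}")
    return {
        "count": len(samples),
        "min": min(samples),
        "median": statistics.median(samples),
        "mean": statistics.fmean(samples),
        "max": max(samples),
        # Sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }


# Made-up data, only to show the assumed file shape.
example_log = io.StringIO("sample,latency_ms\n1,21.4\n2,19.8\n3,25.1\n4,20.3\n")
summary = summarize_latency(example_log)
print(summary)
```

For a real run, the `io.StringIO` object would be replaced with a file opened via `open(path, newline="")`. Median and max are probably more telling than the mean here, since a few late frames are exactly what makes a mouse feel floaty.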
Measurements were done on two computers: a desktop and a laptop. They each have an Ada-generation RTX card and a Zen 4 CPU. I used virtually the same NixOS config on both, along with an up-to-date Windows 11 install on each. They were connected to the same display for most of the tests: an LG C1 at 120 Hz over HDMI. I have Radeon GPUs lying around, and plan on testing gamescope on them in particular, but that’s going to have to wait until the next batch of tests. App settings were selected to avoid hardware bottlenecks. My goal was to easily hit 120 FPS on a 120 Hz output and test for any queueing effects in the software stack.
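To put the 120 Hz target in perspective, a small sketch of the arithmetic: one refresh at 120 Hz lasts about 8.33 ms, so a measured latency can be expressed as a number of refresh intervals. The 25 ms value below is a made-up input, not one of my measurements, and the function names are just for illustration.

```python
def refresh_interval_ms(refresh_hz):
    """Duration of one display refresh in milliseconds."""
    return 1000.0 / refresh_hz


def latency_in_frames(latency_ms, refresh_hz=120.0):
    """Express a click-to-photon latency as a count of refresh intervals."""
    return latency_ms / refresh_interval_ms(refresh_hz)


interval = refresh_interval_ms(120.0)  # about 8.33 ms per refresh
frames = latency_in_frames(25.0)       # hypothetical 25 ms sample, about 3 refreshes
print(f"{interval:.2f} ms per refresh, {frames:.1f} refresh intervals")
```

Click-to-photon latency also includes USB polling, game logic, and the display's own processing, so the frame count is only a rough hint at how much might be sitting in a queue, not a direct measure of it.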
I used KDE Wayland 6.6.4, Proton-GE 10-33, MangoHud 0.8.2 for FPS limiting (using the late method), and Nvidia driver 595.58.03. Originally, I had meant to compare with X11 sessions as well, but with KDE removing them soon, I dropped that comparison. On Windows, I used either the Nvidia control panel or RTSS for frame-rate limiting, interchangeably.
Despite the automated nature of the tool, launching & cataloging the runs still ends up being a lot of work. Controlling all the variables is a major pain, and I often discovered new things partway through the testing, which invalidated prior measurements. A few examples of odd behaviors are:
LG webOS toggling Black Frame Insertion when you connect a different computer on the same port.
Using KDE Konsole to mark the start of a test run creates big wl_shm presentation surfaces, which take a long time to copy over PCIe to GPU VRAM. The resulting timing spike trains the compositor to be extra pessimistic about its timings.
Switching V-Sync modes in specific games not applying the change immediately.
Synthetic tests
As a quick validation run and an easy test of display settings, I built my own latency testing tool. It’s just a black square that goes white immediately when you click on it; perfect for the tool to react to. I added a configurable delay to simulate input processing. The test was performed on a clean Chromium profile with nothing except for out-of-the-box defaults.