I Made Zig Compute 33 Million Satellite Positions in 3 Seconds. No GPU Required.

20 Jan, 2026

I've spent the past month optimizing SGP4 propagation and ended up with something interesting: astroz is now the fastest general-purpose SGP4 implementation I'm aware of, hitting 11-13M propagations per second in native Zig and ~7M/s through Python with just pip install astroz. This post breaks down how I got there.

A note on "general purpose": heyoka.py can be faster when batch-processing many satellites simultaneously (16M/s vs 7.5M/s). But it's a general ODE integrator with SGP4 as a module, requiring LLVM for JIT compilation and a C++ dependency stack for which conda-forge is recommended over pip. For time-batched propagation (many time points for one satellite), astroz is 2x faster (8.5M/s vs 3.8M/s). Full comparison below. I'm also skipping GPU-accelerated SGP4 implementations: they can be faster for massive batch workloads, but they require CUDA/OpenCL setup and aren't what I'd consider "general purpose."
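
To make the two batch shapes concrete, here is a minimal sketch using the standard python-sgp4 package rather than astroz or heyoka.py, whose APIs aren't shown in this excerpt. The ISS TLE and the one-day, 1-second grid are just example inputs.

    import numpy as np
    from sgp4.api import Satrec, SatrecArray, jday

    # Example TLE (ISS, from the python-sgp4 docs); any valid TLE works here.
    line1 = "1 25544U 98067A   19343.69339541  .00001764  00000-0  40967-4 0  9997"
    line2 = "2 25544  51.6439 211.2001 0007417  17.6667  85.6398 15.50103472202482"
    sat = Satrec.twoline2rv(line1, line2)

    # One day of epochs at 1-second steps, expressed as split Julian dates.
    jd0, fr0 = jday(2019, 12, 9, 0, 0, 0)
    fr = fr0 + np.arange(86400) / 86400.0
    jd = np.full_like(fr, jd0)

    # Time-batched: one satellite, many epochs. r has shape (86400, 3), km in TEME.
    e, r, v = sat.sgp4_array(jd, fr)

    # Satellite-batched: many satellites over the same epochs. r2 has shape (3, 86400, 3).
    catalog = SatrecArray([sat, sat, sat])  # stand-in for a real catalog of objects
    e2, r2, v2 = catalog.sgp4(jd, fr)

The second shape, one satellite over a dense time grid, is the one the ephemeris workloads below stress.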

Why Bother Optimizing SGP4?

SGP4 is the standard algorithm for predicting satellite positions from TLE data. It's been around since the 80s, and most implementations are straightforward ports of the original reference code. They work fine. The implementation I followed is described in SpaceTrack Report No. 3.
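
If you haven't used SGP4 directly, a single propagation is small: parse a TLE, pick an epoch, get a position and velocity back. Here's a minimal sketch using the standard python-sgp4 package (a wrapper around the reference C++ code), with an ISS TLE as example input.

    from sgp4.api import Satrec, jday

    # Example two-line element set (ISS, epoch taken from the python-sgp4 docs).
    line1 = "1 25544U 98067A   19343.69339541  .00001764  00000-0  40967-4 0  9997"
    line2 = "2 25544  51.6439 211.2001 0007417  17.6667  85.6398 15.50103472202482"

    sat = Satrec.twoline2rv(line1, line2)  # parse the TLE into SGP4 mean elements
    jd, fr = jday(2019, 12, 9, 12, 0, 0)   # target epoch as a split Julian date
    err, r, v = sat.sgp4(jd, fr)           # one propagation

    assert err == 0                        # nonzero error codes mean the model broke down
    print(r)                               # position in km (TEME frame)
    print(v)                               # velocity in km/s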

But "fine" starts to feel slow when you need dense time resolution. Generating a month of ephemeris data at one-second intervals is 2.6 million propagations per satellite. Pass prediction over a ground station network might need sub-second precision across weeks. Trajectory analysis for conjunction screening wants fine-grained time steps to catch close approaches. At 2-3M propagations per second (typical for a good implementation), these workloads take seconds per satellite—that adds up fast when you're doing iterative analysis or building interactive tools.

I wanted to see how fast I could make it.

Starting Point: Already Faster Than Expected

Before I even started thinking about SIMD, the scalar implementation was already matching or beating the Rust sgp4 crate, the fastest general-purpose open-source implementation I could find. I hadn't done anything clever yet; the speed came from design choices that happened to play well with how Zig compiles.
