Booting 5000 Erlangs on Ampere One 192-core

Underjord is an artisanal consultancy doing consulting in Elixir, Nerves with an accidental speciality in marketing and outreach. If you like the writing you should really try the pro version.

In the previous post on 500 virtual linux devices on ARM64 I hinted that I expected serious improvements if we got KVM working. Well. We’re there. Let’s see what we got going on.

Disclosure: I am running a conference called Goatmire Elixir which Ampere is a sponsor of. This post is not part of the sponsorship exchange as such. It is prep for my talk for the conference which uses the hardware they lent me. So this is your transparency notice, but fundamentally I am not making comparisons on whether they are better or not. I’m learning and sharing about qemu and virtual Linux machines. Now I’d love if they paid me to shill them a bit later and I’d be transparent about that too. But this is not that :)

To recap. We have an Ampere One 192-core machine with 1 TB of RAM. The goal is to run as many virtual Linux IoT devices using the Nerves framework. We got 500 of them last time before I tried pushing any further. I also got a bit further on the same setup when I tried. Maybe 1000, I don’t recall exactly. But there have been developments, so read on!

Briefly on Nerves: the framework treats the BEAM virtual machine like the OS and essentially only uses Linux for a kernel, drivers and the like. This means we can write much if not all of the embedded device in a productive high-level language running on a provenly robust and reliable environment with memory safety and solid recovery strategies. And it means your cloud integration developer doesn’t risk seg-faulting the entire device while mangling JSON back and forth. Nerves also brings some best-practice tooling and conventions. Your init process is erlinit, your updates use fwup to provide A/B partitions and factory reset, auto-failback, validation of firmware viability, disk encryption, delta updates, streaming updates and a bunch more.

The most interesting development is the thing you can probably learn the most from. Frank Hunleth who has been my co-conspirator and a massive help saved me from fighting u-boot by .. writing another bootloader. Introducing little_loader. This adorable tractor will load up your ARM64 qemu device, consult the uboot environment that Nerves uses, find a Linux kernel from information in that and then boot. Consequently it enables the A/B upgrade features and everything else that makes Nerves great.

Writing a boot loader is a little bit ridiculous. Frank knows his way around C and apparently ChatGPT knows a fair bit about ARM and qemu. Enough to be dangerous. And where it was wrong he could rummage around until he found the way. How he does what he does is beyond me but the result is a very small boot loader that you can probably read through and understand. So if you are curious about booting ARM64 or about how qemu starts things the code should be a worthwhile read.

We got a bit tangled up in EL1 vs EL2 when we only ever needed EL1 to work. EL2 on ARM is what you’d run under if you want to be able to run VMs in your VMs so you can VM while you VM. And the version of qemu + KVM I got from Ubuntu doesn’t seem to support that. We weren’t interested in it either. At some point we might explore EL3 for secure boot and whatnot. Only time will tell.

One of the weirder challenges and something we haven’t disentangled yet is that we have some compilation issue where using the toolchains I was using the non-debug build would hang while the debug one runs fine. For now I run the debug build of the bootloader. I think it was fine from GCC 15? Anyway, hopefully we pin that down at some point. But it tripped us up a few times when the bootloader would hang due that issue rather than any actual problems with the implementation.

... continue reading