Virtual Linux Devices on ARM64

500 virtual Linux devices on ARM64

Underjord is an artisanal consultancy doing consulting in Elixir and Nerves, with an accidental speciality in marketing and outreach. If you like the writing, you should really try the pro version.

This is the first part of an experimental journey in which I explore how many instances of my favorite IoT framework I can run on the 192-core Ampere One.

Background

I work on the Nerves project, an IoT framework that provides best-practice underpinnings and support so that you can build IoT hubs, smart thermostats and the like with a safe and productive high-level language on a runtime known for reliability, resilience and consistent performance. The language is Elixir and the runtime is Erlang’s BEAM virtual machine.

If you want to hear more about Erlang, we’ve had Björn on BEAM Radio talking at length about his work on the compiler and runtime.

My frequent collaborator GleSYS is sponsoring Goatmire Elixir, Sweden’s first Elixir conference (I would be shilling, but we have essentially no tickets left *shrug*). As part of that, they suggested we connect with Ampere, who make particularly interesting hardware in the Ampere One server CPUs (you may have seen the 192-core, 3.2 GHz model discussed), and we turned it into a joint sponsorship. Since I didn’t have a talk topic lined up, we discussed me doing something with their hardware, which seemed fun. I love experimenting with impressive hardware.

Disclosure: This post is not part of the sponsorship exchange. They get some posts on socials, space in my newsletter and a branded presence at the event. This post is me reporting on what I’m up to and providing the background for it. But to be transparent: they have supported the not-for-profit that runs the conference I am organizing.

The runtime

If you know the BEAM, you know it is highly concurrent and parallel. By default it starts one scheduler thread per available core and uses work stealing between those schedulers to keep all the cores busy. Based on anecdata from a friend who tests these sorts of things (he has data, he shared it anecdotally), the BEAM does not scale arbitrarily with this many cores. I’ve wondered whether that’s due to NUMA, but I don’t know the chip architecture well enough to really say. There is, of course, some overhead in running many schedulers and in whatever coordination they need. I know Meta has contributed recent updates to Erlang/OTP that should improve many-core performance. I need to see if I can find a really good benchmark for testing the limits of a single BEAM on this thing. Anyway, that was not what I primarily wanted to do.
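To make the work-stealing idea a bit more concrete, here is a toy model in Python. It is only an illustration of the general technique under simplifying assumptions, not how the BEAM actually schedules processes: each scheduler owns a run queue, runs one task per tick from the front of it, and when its own queue is empty it steals one task from the back of the longest queue. The function name `simulate` and the "steal from the busiest queue" rule are my own hypothetical choices for the sketch.

```python
from collections import deque


def simulate(initial_queues, steal=True):
    """Toy, single-threaded model of schedulers with per-scheduler run queues.

    Hypothetical sketch, not the BEAM's algorithm. On every tick each
    scheduler runs one task from the front of its own queue. If its queue
    is empty and stealing is enabled, it first takes one task from the
    back of the currently longest queue. Returns the number of ticks until
    all work is done and how many tasks each scheduler ran.
    """
    queues = [deque(q) for q in initial_queues]  # copy so input is untouched
    completed = [0] * len(queues)
    ticks = 0
    while any(queues):
        for i, own in enumerate(queues):
            if not own and steal:
                # Victim is the busiest run queue (empty if all are empty).
                victim = max(queues, key=len)
                if victim:
                    own.append(victim.pop())
            if own:
                own.popleft()  # "run" the task
                completed[i] += 1
        ticks += 1
    return ticks, completed


# All 12 tasks start on scheduler 0; schedulers 1-3 start idle.
work = [list(range(12)), [], [], []]
print(simulate(work, steal=False))  # (12, [12, 0, 0, 0])
print(simulate(work, steal=True))   # (3, [3, 3, 3, 3])
```

With all the work piled on one scheduler, the other three sit idle unless they can steal; with stealing, the same 12 tasks finish in a quarter of the ticks. What the toy leaves out is the cost: in a real runtime, inspecting and taking from another scheduler's queue needs synchronization, which is the kind of coordination overhead that can grow as the number of schedulers rises.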
