Bytecode VMs in surprising places (2024)

Why This Matters

Bytecode virtual machines (VMs) are not limited to general-purpose programming languages: the Linux kernel's eBPF subsystem and SQLite's SQL execution engine are both built around one. These examples show that a bytecode VM can give a larger system a flexible, tightly controlled way to run code supplied from outside. In eBPF's case, the original motivation was performance: discarding unwanted network packets inside the kernel instead of copying them to user space.

Key Takeaways

In response to a question on Twitter, Richard Hipp wrote about why SQLite uses a bytecode VM for executing SQL statements.

Most people probably associate bytecode VMs with general-purpose programming languages, like JavaScript or Python. But sometimes they appear in surprising places! Here are a few that I know about.

eBPF

Did you know that inside the Linux kernel, there’s an extension mechanism that includes a bytecode interpreter and a JIT compiler?

I had no idea. Well, it’s called eBPF, and it’s pretty interesting: a register-based VM with ten general-purpose registers and over a hundred different opcodes.

The “BPF” in eBPF stands for Berkeley packet filter, and the basic idea is described in a 1993 USENIX paper:

Many versions of Unix provide facilities for user-level packet capture, making possible the use of general purpose workstations for network monitoring. Because network monitors run as user-level processes, packets must be copied across the kernel/user-space protection boundary. This copying can be minimized by deploying a kernel agent called a packet filter, which discards unwanted packets as early as possible. The original Unix packet filter was designed around a stack-based filter evaluator that performs sub-optimally on current RISC CPUs. The BSD Packet Filter (BPF) uses a new, register-based filter evaluator that is up to 20 times faster than the original design.

So it was originally designed for a pretty restricted use case: a directed, acyclic control flow graph representing a filter function for network packets. And for a long time, the Linux implementation was equally simple: two general-purpose registers, a switch-style interpreter, and no backwards branches.
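To make the shape of that early design concrete, here is a minimal sketch of a switch-style interpreter for a toy filter VM with two registers and forward-only jumps. Everything in it is an assumption made for illustration: the opcode names, the `(opcode, jt, jf, k)` tuple layout, and the `validate` and `run_filter` functions are invented here and are not the real classic BPF encoding or the Linux code. The `if`/`elif` chain stands in for a C `switch`.

```python
from typing import List, Tuple

# Hypothetical toy opcodes; NOT the real classic BPF instruction set.
LD_BYTE = 0  # A = packet[k]
LD_IMM = 1   # A = k
TAX = 2      # X = A
ADD_X = 3    # A = A + X (wraps at 32 bits)
JEQ = 4      # if A == k skip jt instructions, else skip jf
RET = 5      # return k (0 = drop the packet, nonzero = accept)

Insn = Tuple[int, int, int, int]  # (opcode, jt, jf, k)


def validate(program: List[Insn]) -> None:
    """Reject programs that could loop or run off the end."""
    for pc, (op, jt, jf, _k) in enumerate(program):
        if op == JEQ:
            # Offsets are relative to the next instruction and must be
            # non-negative, so control flow can only move forward.
            if jt < 0 or jf < 0:
                raise ValueError(f"backward jump at instruction {pc}")
            if pc + 1 + max(jt, jf) >= len(program):
                raise ValueError(f"jump out of range at instruction {pc}")
    if not program or program[-1][0] != RET:
        raise ValueError("program must end with RET")


def run_filter(program: List[Insn], packet: bytes) -> int:
    """Interpret a validated toy filter program against one packet."""
    validate(program)
    a = 0   # accumulator register
    x = 0   # index register
    pc = 0  # program counter; it only ever increases
    while True:
        op, jt, jf, k = program[pc]
        pc += 1
        if op == LD_BYTE:
            if k < 0 or k >= len(packet):
                return 0  # out-of-bounds load: drop the packet
            a = packet[k]
        elif op == LD_IMM:
            a = k
        elif op == TAX:
            x = a
        elif op == ADD_X:
            a = (a + x) & 0xFFFFFFFF
        elif op == JEQ:
            pc += jt if a == k else jf
        elif op == RET:
            return k
        else:
            raise ValueError(f"unknown opcode {op}")


# Example: accept a packet only if its first byte is 0x45.
FIRST_BYTE_IS_0X45: List[Insn] = [
    (LD_BYTE, 0, 0, 0),
    (JEQ, 0, 1, 0x45),
    (RET, 0, 0, 1),
    (RET, 0, 0, 0),
]
```

Because every jump moves forward and the last instruction is a return, each instruction runs at most once, so the filter always terminates. That property is what makes it reasonable to run such a program inside a kernel.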

A patch in 2011 added a JIT compiler for x86-64. In 2012, the first non-networking use case appeared. Then, in 2014, the BPF implementation was substantially extended on its way to becoming the universal in-kernel virtual machine:

It expands the set of available registers from two to ten, adds a number of instructions that closely match real hardware instructions, implements 64-bit registers, makes it possible for BPF programs to call a (rigidly controlled) set of kernel functions, and more. Internal BPF is more readily compiled into fast machine code and makes it easier to hook BPF into other subsystems.
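As a rough illustration of what those extensions mean for an interpreter, the sketch below models ten 64-bit registers and a call instruction that can only reach an explicit table of helper functions. It is a toy under stated assumptions: the opcodes, the `(opcode, dst, src, imm)` layout, the `HELPERS` table, and the `run` function are all hypothetical, and the program is straight-line code with no jumps. The one detail borrowed from real eBPF is the convention that a helper's return value lands in register 0 and its arguments start at register 1; the toy passes only two arguments.

```python
from typing import Callable, Dict, List, Tuple

MASK64 = (1 << 64) - 1  # registers wrap like 64-bit hardware registers
NUM_REGS = 10

# Hypothetical opcodes for illustration only; not the real eBPF encoding.
MOV_IMM = 0  # regs[dst] = imm
ADD_REG = 1  # regs[dst] += regs[src]
CALL = 2     # regs[0] = helpers[imm](regs[1], regs[2])
EXIT = 3     # return regs[0]

Insn = Tuple[int, int, int, int]  # (opcode, dst, src, imm)

# The only functions a program may call, keyed by a numeric ID.
# This table is an invented stand-in for a controlled set of helpers.
HELPERS: Dict[int, Callable[[int, int], int]] = {
    1: lambda a, b: max(a, b),
}


def run(program: List[Insn],
        helpers: Dict[int, Callable[[int, int], int]] = HELPERS) -> int:
    """Execute a straight-line toy program and return register 0."""
    regs = [0] * NUM_REGS
    for op, dst, src, imm in program:
        if not (0 <= dst < NUM_REGS and 0 <= src < NUM_REGS):
            raise ValueError("register index out of range")
        if op == MOV_IMM:
            regs[dst] = imm & MASK64
        elif op == ADD_REG:
            regs[dst] = (regs[dst] + regs[src]) & MASK64
        elif op == CALL:
            if imm not in helpers:
                # Anything outside the table is simply unreachable.
                raise PermissionError(f"helper {imm} is not allowed")
            regs[0] = helpers[imm](regs[1], regs[2]) & MASK64
        elif op == EXIT:
            return regs[0]
        else:
            raise ValueError(f"unknown opcode {op}")
    raise ValueError("program fell off the end without EXIT")


# Example: r0 = max(7, 35) + 7
EXAMPLE: List[Insn] = [
    (MOV_IMM, 1, 0, 7),
    (MOV_IMM, 2, 0, 35),
    (CALL, 0, 0, 1),
    (ADD_REG, 0, 1, 0),
    (EXIT, 0, 0, 0),
]
```

Routing every call through a lookup table is one simple way to keep the set of reachable functions rigidly controlled: the program names a helper by ID, and the host decides which IDs exist.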
