Posted on: September 13, 2025 | at 09:30 AM
I recently implemented a minimal proof of concept time-sharing operating system kernel on RISC-V. In this post, I’ll share the details of how this prototype works. The target audience is anyone looking to understand low-level system software, drivers, system calls, etc., and I hope this will be especially useful to students of system software and computer architecture.
This is a redo of an exercise I did for my undergraduate course in operating systems, and functionally it should resemble a typical operating systems project. However, this experiment focuses on modern tooling, as well as the modern architecture of RISC-V. RISC-V is an amazing technology that can be understood more quickly than other CPU architectures, while remaining a popular choice for many new systems rather than just an educational architecture.
Finally, to do things differently here, I implemented this exercise in Zig, rather than traditional C. In addition to being an interesting experiment, I believe Zig makes this experiment much more easily reproducible on your machine, as it’s very easy to set up and does not require any installation (which could otherwise be slightly messy when cross-compiling to RISC-V).
GitHub repo
The final code for this experiment is on GitHub here. We’ll be referencing the code from it as we go.
GitHub should be treated as the source of truth; the code excerpts below may be slightly out of sync with it.
Recommended reading
The basic fundamentals of computer engineering and specifically computer architecture are assumed. Specifically, knowledge of registers, how the CPU addresses memory, and interrupts is all necessary.
Before diving deep into this experiment, it’s recommended to also review the following background texts:
Unikernel
We’ll be developing a type of unikernel. Simply put, this setup links the application code directly with the OS kernel it depends on. Essentially, everything is bundled into a single binary executable, and the user code is loaded into memory alongside the kernel.
This bypasses the need to separately load the user code at runtime, which is a complex field in itself (involving linkers, loaders, etc.).
SBI layer
RISC-V supports a layered permissions model. The system boots into machine mode (M), which is completely bare-metal, and then supports a couple of other less privileged modes. Please check the background texts for more details; below is a quick summary:
M-mode can do pretty much anything; it is fully bare-metal. In the middle is S-mode, supervisor, which typically hosts the operating system kernel. At the bottom is U-mode, user, where application code runs.
Lower privilege levels can send requests to higher privilege levels.
We’ll assume that at the bottom of our software stack is an SBI layer, specifically OpenSBI. Please study this text for the necessary background, as we’ll use the SBI layer to manage console printing and control the timer hardware. While manual implementation is possible, I wanted to add more value to this text by demonstrating a more portable approach with OpenSBI.
Goal for the kernel
We want to support a few key features for simplicity:
- Threads are statically defined ahead of execution; i.e., dynamic thread creation is not supported.
- For simplicity, threads are implemented as never-ending functions.
- Threads operate in user mode and are able to make system calls to the kernel operating in S-mode.
- Time is sliced and allocated among the different threads: the system timer will be set to tick every couple of milliseconds, at which point a thread may be switched out.
Finally, development is targeted for a single-core machine.
Virtualization and what exactly is a thread
Before implementing threads, we should decide what they really are. The concept of threads in a time-sharing environment enables multiple workloads to run on a single core (as noted above, we’re focusing on single-core machines), while the programming model for each thread remains largely the same as if it were the sole software on the machine. This is a loose definition, which we will refine.
To understand time-sharing, let’s briefly consider its counterpart: cooperative scheduling (also called cooperative threading). In this model, a thread voluntarily yields CPU time to another workload, with the expectation that another thread will eventually yield control back to it.
function thread():
    operation_1();
    operation_2();
    YIELD();
    operation_3();
    YIELD();
    ...
To be clear, this isn’t an “outdated” technique, despite being older. In fact, it’s alive and well in many modern programming languages and their runtimes (often abstracted from programmers). One good example is Go, which uses Goroutines to run multiple workloads on top of one operating system thread. While programmers don’t necessarily add explicit yield operations, the compiler and runtime can inject them into the workload.
Now, it should be clearer what it means for the programming model to remain largely the same in a time-sharing context. The thread would naturally look like this:
function thread():
    operation_1();
    operation_2();
    operation_3();
    ...
There are simply no explicit yield operations; instead, the kernel utilizes timers and interrupts to seamlessly switch between threads on the same core. This is precisely what we’ll implement in this experiment.
When multiple workloads run on the same resource, and each retains the same programming model as if it were the only workload, we can say the resource is virtualized. In other words, if we’re running 5 threads on the same core, each thread “feels” like it has its own core, effectively running on 5 little cores instead of 1 big core. More formally, each thread retains its own view of the core’s architectural registers (in RISC-V, x0-x31 and some CSRs, more on this below) and… some memory! Let’s look deeper into that.
The stack and memory virtualization
To begin, a thread has its own stack for reasons we’ll analyze shortly. The rest of the memory is “shared” with other threads, but this requires further investigation.
It’s important to understand that hardware virtualization exists on a spectrum, rather than as a few rigid options. Here are some notable points on that spectrum:
- Threads: virtualize architectural registers and stacks, but not much else; i.e., different threads can share data elsewhere in memory.
- Process: more heavyweight than a thread; memory is virtualized such that each process “feels” like it has a dedicated CPU core and its own memory untouchable by other processes; additionally, a process houses multiple threads.
- Container: virtualizes even more; each container has its own filesystem and potentially its own set of network interfaces, while containers share the same kernel and underlying hardware.
- VM: virtualizes everything.
There are many more shades in between, and each of these options likely has different subtypes. The point here is that all these approaches enable running different workloads with varying degrees of isolation, or more intuitively, with different views of the machine and their environment.
Interestingly, if you examine the Linux kernel source code, you won’t find a construct explicitly called a container. What we popularly call containers isn’t a mechanism baked into the kernel, but rather a set of kernel mechanisms used together to form a specific view of the environment for our workload. For example, the chroot mechanism restricts filesystem visibility, while cgroups impose limits on workloads; together, these form what we call a container.
Furthermore, I believe (though don’t quote me on this) that the boundaries between threads and processes in Linux are somewhat blurred. To the best of my knowledge, both are implemented on top of tasks in the kernel, but when creating a task, the API allows different restrictions to be specified.
Ultimately, this is all to say that we’re always defining a workload with varying restrictions on what it can see and access. When and why to apply different restrictions is a topic for another day. Many questions arise when writing an application, ranging from the difficulty of an approach to its security.
Virtualizing a thread
In this experiment, we’ll implement minimal virtualization with very basic, time-sharing threads. Therefore, the goals are the following:
- The programming model for a thread should remain mostly untouched. As long as a thread doesn’t interact with memory contents used by other threads, its programming model should remain consistent, powered by time-sharing.
- A thread should have its own protected view of architectural registers, including some RISC-V CSRs.
- A thread should be assigned its own stack.
It should be obvious why a thread needs its own view of the registers. If other threads could freely touch a thread’s registers, the thread wouldn’t be able to do any meaningful work. All (I believe) RISC-V instructions work with at least one register, so protecting a thread’s register view is essential.
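To make that register view concrete, here is a rough sketch of the state each thread effectively owns. This struct is illustrative only; the experiment keeps this state on each thread’s stack rather than in a dedicated struct, but the contents are conceptually the same.

// Illustrative sketch only: one thread's private view of the core.
pub const ThreadView = struct {
    gprs: [31]u64, // x1..x31 (x0 is hardwired to zero)
    sepc: u64, // where the thread should resume executing
    sstatus: u64, // privilege and interrupt-enable state at the time it was interrupted
};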
Furthermore, assigning a private stack to a thread is necessary, though slightly less obvious. The answer is that different stacks are needed to manage different execution contexts. Namely, when a function is invoked, by convention, the stack is used to allocate function-private variables. Additionally, registers like ra can be pushed to the stack to retain the correct return address from a function (in case another function is invoked within it). In short, there are various reasons, per RISC-V convention, why the stack is needed to maintain the execution context. The details of RISC-V calling conventions will not be described here.
Interrupt context
It’s crucial to understand how interrupt code runs and what it should consist of, as this mechanism will be heavily exploited to achieve seamless time-sharing between threads. For a detailed, practical example, please check out this past text.
I’ll briefly include the assembly for the timer interrupt routine from that text:
s_mode_interrupt_handler:
    addi    sp, sp, -144
    sd      ra, 136(sp)
    sd      t0, 128(sp)
    sd      t1, 120(sp)
    sd      t2, 112(sp)
    sd      s0, 104(sp)
    sd      a0, 96(sp)
    sd      a1, 88(sp)
    sd      a2, 80(sp)
    sd      a3, 72(sp)
    sd      a4, 64(sp)
    sd      a5, 56(sp)
    sd      a6, 48(sp)
    sd      a7, 40(sp)
    sd      t3, 32(sp)
    sd      t4, 24(sp)
    sd      t5, 16(sp)
    sd      t6, 8(sp)
    addi    s0, sp, 144
    call    clear_timer_pending_bit
    call    set_timer_in_near_future
    li      a1, 33
    lla     a0, .LC0
    call    debug_print
    nop
    ld      ra, 136(sp)
    ld      t0, 128(sp)
    ld      t1, 120(sp)
    ld      t2, 112(sp)
    ld      s0, 104(sp)
    ld      a0, 96(sp)
    ld      a1, 88(sp)
    ld      a2, 80(sp)
    ld      a3, 72(sp)
    ld      a4, 64(sp)
    ld      a5, 56(sp)
    ld      a6, 48(sp)
    ld      a7, 40(sp)
    ld      t3, 32(sp)
    ld      t4, 24(sp)
    ld      t5, 16(sp)
    ld      t6, 8(sp)
    addi    sp, sp, 144
    sret
This assembly was obtained by writing a C function tagged as an S-level interrupt handler in RISC-V. With this tag, the GCC compiler knew how to generate the prologue and epilogue of the interrupt routine. The prologue preserves architectural registers on the stack, and the epilogue recovers them (in addition to specifically returning from S-mode). All of this was generated simply by tagging the C function with the correct calling convention.
This somewhat resembles function calling, and that’s essentially what it is. Interrupts can be thought of (in a very simplified sense) as functions invoked by some system effect. Consequently, utilized registers must be carefully preserved on the stack and then restored at the routine’s exit; otherwise, asynchronous interrupts like timer interrupts would randomly corrupt architectural register values, completely blocking any practical software from running!
Implementation (high-level)
We’ll explore the implementation by first describing the high-level idea and then digging into the code.
Leveraging the interrupt stack convention
Adding an interrupt is, in a way, already introducing a form of threading to your application code. In a system with a timer interrupt, the main application code runs, which can occasionally be interleaved with instances of timer interrupt invocations. The core jumps to this interrupt routine when the timer signals, and it carefully restores the architectural state before control flow returns to the “main thread”. There are two control flows running concurrently here:
- Main application code.
- Repetitions of the interrupt routine.
This interleaving of the timer interrupt can be leveraged to implement additional control flows, and the main idea is outlined below.
The core of the interrupt routine is sandwiched between the prologue and the epilogue. That’s where the interrupt is serviced before control returns to the main application thread by restoring registers from the stack.
However, why must we restore the registers from the same stack location? If our interrupt logic swaps the stack pointer to some other piece of memory, we’ll end up with a different set of architectural register values recovered, thus entering a whole different flow. In other words, we achieve a context switch, and this is precisely how it’s implemented in this experiment. We’ll see the code for it shortly.
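Conceptually, the body of the handler then boils down to something like the sketch below. This is not the repo’s code: pick_next_stack is a hypothetical placeholder for the scheduling decision we’ll get to later; the point is only that the returned stack pointer decides which context the epilogue restores.

// Conceptual sketch only. The prologue has pushed the interrupted thread's
// registers onto its stack; whatever stack pointer we hand back is the one
// the epilogue restores registers from.
fn interrupt_body(interrupted_sp: usize) usize {
    const next_sp = pick_next_stack(interrupted_sp); // hypothetical scheduling decision
    return next_sp; // if this differs from interrupted_sp, we just context-switched
}

// Placeholder so the sketch stands alone; the real decision lives in schedule() below.
fn pick_next_stack(current_sp: usize) usize {
    return current_sp;
}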
Kernel/user space separation
We can now delineate the kernel space and user space. With RISC-V, this naturally translates to kernel code running in supervisor (S) mode and user space code running in U-mode.
The machine boots into machine (M) mode, and since we want to leverage the SBI layer, we’ll allow OpenSBI to run there. Then, the kernel will perform some initial setup in S-mode before starting the U-mode execution of user space threads. Periodic timer interrupts will enable context switches, and the interrupt code will execute in S-mode. Finally, user threads will be able to make system calls to the kernel.
Implementation (code)
Please refer to the GitHub repository for the full code; we will only cover core excerpts below.
Assembly startup
As usual, a short assembly snippet is needed to start our S-mode code and enter the “main program” in Zig. This is in startup.S.
...
done_bss:
    # Jump to Zig main
    call main
...
The rest of the assembly startup primarily involves cleaning up the BSS section and setting up the stack pointer for the initial kernel code.
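For illustration, here is roughly what that BSS cleanup amounts to, expressed in Zig instead of assembly. The repo does this in startup.S, and the __bss_start/__bss_end symbol names are assumptions about the linker script.

// Illustrative only: zero all uninitialized globals before any Zig code relies on them.
// The symbol names are assumptions about the linker script.
extern var __bss_start: u8;
extern var __bss_end: u8;

fn clearBss() void {
    const start: [*]u8 = @ptrCast(&__bss_start);
    const len = @intFromPtr(&__bss_end) - @intFromPtr(&__bss_start);
    @memset(start[0..len], 0);
}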
Main kernel file and I/O drivers
We’ll now examine kernel.zig, which contains the main function.
First, we probe the OpenSBI layer for console capabilities. We’ll only consider running on a relatively recent version of OpenSBI (from the last few years) that includes console capability. Otherwise, the kernel will halt and report an error.
export fn main() void {
    const initial_print_status = sbi.debug_print(BOOT_MSG);
    if (initial_print_status.sbi_error != 0) {
        // SBI debug console not available, fall back to direct UART
        const error_msg = "ERROR: OpenSBI debug console not available! You need the latest OpenSBI.\n";
        const fallback_msg = "Falling back to direct UART at 0x10000000...\n";
        uart.uart_write_string(error_msg);
        uart.uart_write_string(fallback_msg);
        uart.uart_write_string("Stopping... We rely on OpenSBI, cannot continue.\n");
        while (true) {
            asm volatile ("wfi");
        }
        unreachable;
    }
main is marked as export to conform to the C ABI.
Here, we have a lightweight implementation of a couple of I/O drivers. As you can see, writing can occur in one of two ways: either we go through the SBI layer (sbi.zig) or, if that fails, we use direct MMIO (uart_mmio.zig). The SBI method should theoretically be more portable, as it delegates the details of managing output to the M-level layer (which essentially does what we do in the MMIO path), freeing us from concerns about exact memory addresses.
Let’s quickly look at sbi.zig:
// Struct containing the return status of OpenSBI
pub const SbiRet = struct {
    sbi_error: isize,
    value: isize,
};

pub fn debug_print(message: []const u8) SbiRet {
    var err: isize = undefined;
    var val: isize = undefined;

    const msg_ptr = @intFromPtr(message.ptr);
    const msg_len = message.len;

    asm volatile (
        \\mv a0, %[len]
        \\mv a1, %[msg]
        \\li a2, 0
        \\li a6, 0x00
        \\li a7, 0x4442434E
        \\ecall
        \\mv %[err], a0
        \\mv %[val], a1
        : [err] "=r" (err),
          [val] "=r" (val),
        : [msg] "r" (msg_ptr),
          [len] "r" (msg_len),
        : .{ .x10 = true, .x11 = true, .x12 = true, .x16 = true, .x17 = true, .memory = true });

    return SbiRet{
        .sbi_error = err,
        .value = val,
    };
}
This is very straightforward; we’re simply performing the SBI call exactly as described in the OpenSBI documentation. Note that when I first wrote this code, I wasn’t fully familiar with Zig’s error handling capabilities, hence the somewhat non-idiomatic error handling.
However, this can be considered a first driver in this kernel, as it directly manages output to the device.
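As a side note, if one wanted the more idiomatic route, a thin wrapper along these lines could sit next to debug_print in sbi.zig. This is a sketch, not something the repo does; it just turns the raw SbiRet into a Zig error union so callers can use try.

// Sketch of a more idiomatic wrapper (not in the repo).
pub const SbiError = error{SbiCallFailed};

pub fn debug_print_checked(message: []const u8) SbiError!isize {
    const ret = debug_print(message);
    if (ret.sbi_error != 0) return SbiError.SbiCallFailed;
    return ret.value; // number of bytes written, per the SBI spec
}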
Next is uart_mmio.zig:
// UART MMIO address (standard for QEMU virt machine)
pub const UART_BASE: usize = 0x10000000;
pub const UART_TX: *volatile u8 = @ptrFromInt(UART_BASE);

// Direct UART write function (fallback when SBI is not available)
pub fn uart_write_string(message: []const u8) void {
    for (message) |byte| {
        UART_TX.* = byte;
    }
}
This is straightforward and self-explanatory.
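As an aside, a slightly more careful variant would poll the 16550’s Line Status Register (offset 5) for the “transmit holding register empty” bit before writing each byte. QEMU is forgiving enough that the simple version above works, so treat the following as an optional sketch building on the constants above, not as the repo’s code.

// Optional sketch: wait for the UART to be ready before each byte.
const UART_LSR: *volatile u8 = @ptrFromInt(UART_BASE + 5); // Line Status Register
const LSR_THRE: u8 = 1 << 5; // "transmit holding register empty" bit

pub fn uart_write_string_polled(message: []const u8) void {
    for (message) |byte| {
        while ((UART_LSR.* & LSR_THRE) == 0) {} // wait until the UART can accept a byte
        UART_TX.* = byte;
    }
}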
Returning to kernel.zig and the main function, we create 3 user threads, each printing a slightly different message (the thread ID is the varying bit). At this point, the kernel setup is almost complete.
The final steps involve setting up and running the timer interrupt. Once that is done, kernel code will only run when the timer interrupts the system or when user space code requests a system call.
interrupts.setup_s_mode_interrupt(&s_mode_interrupt_handler);
_ = timer.set_timer_in_near_future();
timer.enable_s_mode_timer_interrupt();
We could request a context switch immediately, but for simplicity, we’ll wait until the timer activates and begins the actual work in the system.
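For completeness, here is a hedged sketch of what those two timer helpers can look like. The repo’s timer.zig may differ: the tick length, the use of the rdtime CSR, and the clobber lists are my assumptions, while the SBI TIME extension ID (0x54494D45, FID 0 for sbi_set_timer) and the STIE bit come from the SBI and privileged specs.

// Hedged sketch of timer.zig, not the repo's code.
const TICK_DELTA: u64 = 1_000_000; // ~100 ms at QEMU virt's 10 MHz timebase (assumed)

pub fn set_timer_in_near_future() u64 {
    // Read the current time from the `time` CSR.
    const now = asm volatile ("rdtime %[out]"
        : [out] "=r" (-> u64),
    );
    const next = now + TICK_DELTA;

    // sbi_set_timer(next): a0 = stime_value, a6 = FID 0, a7 = TIME extension ID.
    asm volatile (
        \\mv a0, %[next]
        \\li a6, 0
        \\li a7, 0x54494D45
        \\ecall
        :
        : [next] "r" (next),
        : .{ .x10 = true, .x11 = true, .x16 = true, .x17 = true, .memory = true });

    return next;
}

pub fn enable_s_mode_timer_interrupt() void {
    // STIE is bit 5 of the sie CSR; SIE (bit 1) of sstatus additionally enables
    // interrupt delivery while the hart is executing in S-mode itself.
    asm volatile (
        \\li t0, 32
        \\csrs sie, t0
        \\csrsi sstatus, 2
        ::: .{ .x5 = true });
}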
S-mode handler and the context switch
While the Zig compiler could generate an adequate prologue and epilogue for our S-mode handler, we will do it manually. The reason is that we also want to capture some CSRs in the context that the generated routine wouldn’t otherwise capture.
That’s why we use the naked calling convention in Zig. This forces us to write the entire function in assembly, though a quick escape hatch from this limitation is to call a regular Zig function whenever Zig logic is needed.
I won’t copy-paste the whole prologue and epilogue here because they are very similar to what was done in the previous C experiment with RISC-V interrupts. Instead, I’ll just focus on the bit that is different:
...
// Save S-level CSRs (using x5 as a temporary register)
\\csrr x5, sstatus
\\sd x5, 240(sp)
\\csrr x5, sepc
\\sd x5, 248(sp)
\\csrr x5, scause
\\sd x5, 256(sp)
\\csrr x5, stval
\\sd x5, 264(sp)

// Call handle_kernel
\\mv a0, sp
\\call handle_kernel
\\mv sp, a0

// Epilogue: Restore context
// Restore S-level CSRs (using x5 as a temporary register)
\\ld x5, 264(sp)
\\csrw stval, x5
\\ld x5, 256(sp)
\\csrw scause, x5
\\ld x5, 248(sp)
\\csrw sepc, x5
\\ld x5, 240(sp)
\\csrw sstatus, x5
...
As you can see, a couple more registers were added to the prologue and epilogue in addition to the core architectural registers.
Next, within this prologue/epilogue sandwich, we invoke the handle_kernel Zig function. This routes to the correct logic based on whether the interrupt source is a synchronous system call from user space or an asynchronous timer interrupt. The reason is that we land in the same S-level interrupt routine regardless of the interrupt source, and then we inspect the scause CSR for details.
To successfully work with the handle_kernel function, we need to be aware of the assembly-level calling conventions. This function takes a single integer parameter and returns a single integer value. With such a small signature, the convention is simple:
The sole function parameter is passed through the a0 architectural register. The same register also holds the function’s result upon return.
This is pretty easy. Let’s quickly look at the signature of this function:
export fn handle_kernel(current_stack: usize) usize { ...
It is slightly awkward, but it gets the job done. The input to this Zig logic is the stack top as it was just before the Zig logic is invoked (the invocation itself will push some more data onto the stack). The function’s output is where the stack top should be once the Zig logic is done: if it differs from the input, we’re performing a context switch; if it’s the same, the same workload thread continues running after the interrupt.
The rest of the logic is very simple. It inspects the interrupt source (system call from user space or timer interrupt) and performs accordingly.
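To make that dispatch concrete, here is a rough sketch of the shape handle_kernel can take. This is not the repo’s code: the module names (timer, scheduling), the serve_syscall helper, and the exact policy are assumptions; only the scause values (bit 63 set for interrupts, code 5 for the S-mode timer, code 8 for an ecall from U-mode) are fixed by the RISC-V spec.

const timer = @import("timer"); // assumed module names
const scheduling = @import("scheduling");

const SCAUSE_INTERRUPT_BIT: usize = 1 << 63; // set for asynchronous traps (RV64)
const SUPERVISOR_TIMER: usize = 5; // interrupt code for the S-mode timer
const ECALL_FROM_U_MODE: usize = 8; // exception code for an ecall from U-mode

export fn handle_kernel(current_stack: usize) usize {
    // scause was also saved on the stack by the prologue; reading the CSR
    // directly works just as well here, since nothing has overwritten it yet.
    const scause = asm volatile ("csrr %[out], scause"
        : [out] "=r" (-> usize),
    );

    if ((scause & SCAUSE_INTERRUPT_BIT) != 0) {
        if ((scause & ~SCAUSE_INTERRUPT_BIT) == SUPERVISOR_TIMER) {
            _ = timer.set_timer_in_near_future(); // re-arm the next tick
            return scheduling.schedule(current_stack); // possibly a different stack
        }
    } else if (scause == ECALL_FROM_U_MODE) {
        // Hypothetical helper: reads a0/a1/a7 out of the saved frame, services
        // syscall 64, and bumps the saved sepc past the ecall instruction.
        serve_syscall(current_stack);
    }

    return current_stack; // same stack: the interrupted thread keeps running
}

// Placeholder so the sketch stands alone; see the repo for the real logic.
fn serve_syscall(trap_frame: usize) void {
    _ = trap_frame;
}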
In the case of a timer interrupt, a context switch is performed. The schedule function from scheduling.zig is invoked, and it potentially returns the other stack we should switch to:
const build_options = @import("build_options");
const sbi = @import("sbi");
const std = @import("std");
const thread = @import("thread");

pub fn schedule(current_stack: usize) usize {
    const maybe_current_thread = thread.getCurrentThread();
    if (maybe_current_thread) |current_thread| {
        current_thread.sp_save = current_stack;
        if (comptime build_options.enable_debug_logs) {
            _ = sbi.debug_print("[I] Enqueueing the current thread\n");
        }
        thread.enqueueReady(current_thread);
    } else {
        if (comptime build_options.enable_debug_logs) {
            _ = sbi.debug_print("[W] NO CURRENT THREAD AVAILABLE!\n");
        }
    }

    const maybe_new_thread = thread.dequeueReady();
    if (maybe_new_thread) |new_thread| {
        // TODO: software interrupt to yield to the user thread
        if (comptime build_options.enable_debug_logs) {
            _ = sbi.debug_print("Yielding to the new thread\n");
        }
        thread.setCurrentThread(new_thread);
        if (comptime build_options.enable_debug_logs) {
            var buffer: [256]u8 = undefined;
            const content = std.fmt.bufPrint(&buffer, "New thread ID: {d}, stack top: {x}\n", .{ new_thread.id, new_thread.sp_save }) catch {
                return 0; // Return bogus stack, should be more robust in reality
            };
            _ = sbi.debug_print(content);
        }
        return new_thread.sp_save;
    }

    _ = sbi.debug_print("NO NEW THREAD AVAILABLE!\n");
    while (true) {
        asm volatile ("wfi");
    }
    unreachable;
}
The code from the thread module is very simple, serving as boilerplate for a basic queue that manages structs representing threads. I won’t copy it here, as it’s mostly AI-generated. It is important to note, however, that the stacks are statically allocated in memory, and the maximum number of running threads is hardcoded.
The thread module also includes logic for setting up a new thread. This is where data is pushed onto the stack before the thread even runs. If you wonder why, it’s because when returning from the S-level trap handler, we need something on the stack to indicate where to go. The initial data does precisely that. We can seed the initial register values here as desired. In fact, in this experiment, we demonstrate passing a single integer parameter to the thread function by seeding the a0 register value (per calling convention) on the stack, which the thread function can then use immediately.
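To make the seeding idea concrete, here is a rough sketch; it is not the repo’s initThread. The frame size and the a0 slot are guesses that would have to match the handler’s actual prologue layout, while the sepc slot mirrors the CSR save shown in the handler excerpt above.

// Illustrative sketch of seeding a brand-new thread's stack.
const FRAME_SIZE: usize = 272; // assumed: 8 bytes per saved register/CSR
const A0_SLOT: usize = 96 / 8; // assumed slot index for the saved a0 (thread argument)
const SEPC_SLOT: usize = 248 / 8; // sepc slot, matching the prologue shown earlier

fn seedInitialFrame(stack_top: usize, entry: usize, arg: usize) usize {
    const frame_base = stack_top - FRAME_SIZE;
    const frame: [*]u64 = @ptrFromInt(frame_base);
    @memset(frame[0 .. FRAME_SIZE / 8], 0); // every restored register starts at a known value
    frame[A0_SLOT] = arg; // picked up as the thread function's argument
    frame[SEPC_SLOT] = entry; // sret will jump here the first time we "return" into this thread
    // sstatus would also be seeded here so that SPP selects U-mode on sret.
    return frame_base; // stored as the new thread's sp_save
}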
The user space threads
As mentioned in the introduction, we’ll bundle the user space and kernel space code into a single binary blob to avoid dynamic loading, linking, and other complexities. Hence, our user space code consists of regular functions:
/// Example: Create a simple idle thread
pub fn createPrintingThread(thread_number: usize) !*Thread {
    const thread = allocThread() orelse return error.NoFreeThreads;

    // Idle thread just spins
    const print_fn = struct {
        fn print(thread_arg: usize) noreturn {
            while (true) {
                var buffer: [256]u8 = undefined;
                const content = std.fmt.bufPrint(&buffer, "Printing from thread ID: {d}\n", .{thread_arg}) catch {
                    continue;
                };
                syscall.debug_print(content);

                // Simulate a delay
                var i: u32 = 0;
                while (i < 300000000) : (i += 1) {
                    asm volatile ("" ::: .{ .memory = true }); // Memory barrier to prevent optimization
                }
            }
            unreachable;
        }
    }.print;

    initThread(thread, @intFromPtr(&print_fn), thread_number);
    return thread;
}
Additionally, as mentioned above, we pre-seeded the stack such that when a0 is recovered from the stack upon the first interrupt return for a given thread, the function argument will be picked up. That’s how the print function accesses the thread_arg value and uses it in its logic.
To demonstrate the user/kernel boundary, we call syscall.debug_print(content). This conceptually behaves more or less like printf from stdio.h in C: it prepares the arguments for the kernel and makes a system call with them, which should lead to the content being printed on the output device. Here’s what the printing library looks like (from syscall.zig):
// User-level debug_print function
pub fn debug_print(message: []const u8) void {
    const msg_ptr = @intFromPtr(message.ptr);
    const msg_len = message.len;

    // Let's say syscall number 64
    // a7 = syscall number
    // a0 = message pointer
    // a1 = message length
    asm volatile (
        \\mv a0, %[msg]
        \\mv a1, %[len]
        \\li a7, 64
        \\ecall
        :
        : [msg] "r" (msg_ptr),
          [len] "r" (msg_len),
        : .{ .x10 = true, .x11 = true, .x17 = true, .memory = true });

    // Ignore return value for simplicity
}
System call 64 is served from the S-mode handler in kernel.zig. This is self-explanatory, and we won’t go into further details here.
Running the kernel
We will deploy the kernel as a bare-metal image, specifically on a QEMU virtual machine. In theory, this should also work on a real machine, provided an SBI layer is present when the kernel starts and the linker script, I/O “drivers,” and other machine-specific constants are adapted.
To build, we simply run
zig build
Then, to run the kernel:
qemu-system-riscv64 -machine virt -nographic -bios /tmp/opensbi/build/platform/generic/firmware/fw_dynamic.bin -kernel zig-out/bin/kernel
Refer to the previous text on OpenSBI for details on building OpenSBI. It is strongly recommended to use a freshly built OpenSBI, as QEMU may use an outdated version if no -bios flag is passed.
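For convenience, building OpenSBI typically boils down to something like the following (the cross-compiler prefix is an assumption; any RISC-V 64-bit toolchain should do, and the resulting fw_dynamic.bin lands under build/platform/generic/firmware/, matching the path in the QEMU command above):

git clone https://github.com/riscv-software-src/opensbi.git /tmp/opensbi
cd /tmp/opensbi
make PLATFORM=generic CROSS_COMPILE=riscv64-linux-gnu-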
The output should begin with a big OpenSBI splash along with some OpenSBI data:
OpenSBI v1.7
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name               : riscv-virtio,qemu
Platform Features           : medeleg
Platform HART Count         : 1
Platform IPI Device         : aclint-mswi
Platform Timer Device       : aclint-mtimer @ 10000000Hz
Platform Console Device     : uart8250
Platform HSM Device         : ---
Platform PMU Device         : ---
Platform Reboot Device      : syscon-reboot
Platform Shutdown Device    : syscon-poweroff
Platform Suspend Device     : ---
Platform CPPC Device        : ---
Firmware Base               : 0x80000000
Firmware Size               : 317 KB
Firmware RW Offset          : 0x40000
Firmware RW Size            : 61 KB
Firmware Heap Offset        : 0x46000
Firmware Heap Size          : 37 KB (total), 2 KB (reserved), 11 KB (used), 23 KB (free)
Firmware Scratch Size       : 4096 B (total), 400 B (used), 3696 B (free)
Runtime SBI Version         : 3.0
Standard SBI Extensions     : time,rfnc,ipi,base,hsm,srst,pmu,dbcn,fwft,legacy,dbtr,sse
Experimental SBI Extensions : none
Domain0 Name                : root
....
Following the OpenSBI splash, we’ll see the kernel output:
Booting the kernel...
Printing from thread ID: 0
Printing from thread ID: 0
Printing from thread ID: 0
Printing from thread ID: 1
Printing from thread ID: 1
Printing from thread ID: 1
Printing from thread ID: 2
Printing from thread ID: 2
Printing from thread ID: 2
Printing from thread ID: 0
Printing from thread ID: 0
Printing from thread ID: 1
Printing from thread ID: 1
Printing from thread ID: 2
Printing from thread ID: 2
Printing from thread ID: 0
Printing from thread ID: 0
Printing from thread ID: 0
Printing from thread ID: 1
Printing from thread ID: 1
Printing from thread ID: 1
Printing from thread ID: 2
Printing from thread ID: 2
Printing from thread ID: 2
The prints will continue running until QEMU is terminated.
If you want to build the kernel in an extremely verbose mode for debugging and experimentation, use the following command:
zig build -Ddebug-logs=true
After running the kernel with the same QEMU command, the output will appear as follows:
Booting the kernel...
DEBUG mode on
Interrupt source: Timer, Current stack: 87cffe70
[W] NO CURRENT THREAD AVAILABLE!
Yielding to the new thread
New thread ID: 0, stack top: 80203030
Interrupt source: Ecall from User mode, Current stack: 80202ec0
Printing from thread ID: 0
Interrupt source: Ecall from User mode, Current stack: 80202ec0
Printing from thread ID: 0
Interrupt source: Ecall from User mode, Current stack: 80202ec0
Printing from thread ID: 0
Interrupt source: Timer, Current stack: 80202ec0
[I] Enqueueing the current thread
Yielding to the new thread
New thread ID: 1, stack top: 80205030
Interrupt source: Ecall from User mode, Current stack: 80204ec0
Printing from thread ID: 1
Interrupt source: Ecall from User mode, Current stack: 80204ec0
Printing from thread ID: 1
Interrupt source: Ecall from User mode, Current stack: 80204ec0
Printing from thread ID: 1
Interrupt source: Timer, Current stack: 80204ec0
[I] Enqueueing the current thread
Yielding to the new thread
New thread ID: 2, stack top: 80207030
Interrupt source: Ecall from User mode, Current stack: 80206ec0
Printing from thread ID: 2
Interrupt source: Ecall from User mode, Current stack: 80206ec0
Printing from thread ID: 2
Interrupt source: Ecall from User mode, Current stack: 80206ec0
Printing from thread ID: 2
Interrupt source: Timer, Current stack: 80206ec0
...
Conclusion
Many educational OS kernels exist, but this experiment combines RISC-V, OpenSBI, and Zig, offering a fresh perspective compared to traditional C implementations.
The resulting code runs on a QEMU virtual machine, which can be easily set up, even by building QEMU from source.
To keep the explanation concise, error reporting was kept minimal. Should you modify the code and require debugging, sufficient clues are provided, despite some areas where the code is simplified (e.g., anonymous results after SBI print invocations like _ = ... ). Much of the code in this example was AI-generated by Claude to save time, and it should function as intended. While some parts of the code are simplified, such as stack space over-allocation, these do not detract from the experiment’s educational value.
Overall, this experiment serves as a starting point for studying operating systems, assuming a foundational understanding of computer engineering and computer architecture. It likely has plenty of flaws for a practical application, but for now, we’re just hacking here!
I hope this was a useful exploration.
Please consider following on Twitter/X and LinkedIn to stay updated.