Pre-Emptive Multi-Tasking on Arm Cortex-M

Scroll through the blog: ‹ Newer | List All | Older › Pre-emptive Multi-Tasking on Arm Cortex-M Posted on 2025-09-28 Contents Background I write a lot of embedded software in Rust these days - sometimes for fun, and sometimes as training material, and sometimes for customers building safety-critical systems. This last group are usually already writing safety-critical software in C, and wish to switch to writing Rust. When you are writing safety-critical software in C, you are usually using some kind of Real-Time Operating System (RTOS) - something that lets you execute multiple tasks concurrently, and that provides queues and timer services so those tasks can message each other and wait for time to elapse. Not always, obviously, but I see it quite often. C-language RTOSes One option for people using an RTOS and wanting to switch to Rust, is to run Rust code on top of their an existing RTOS. A incomplete alphabetical list might include: And we're barely scratching the surface here. Some of these are full 'operating systems', that provide all the libraries you could want (filesystems, hardware drivers, networking, etc). Some are more basic 'kernels' that let you add whatever libraries you want on top. Some are rated for safety-critical usage and some are not. Some are open source, some are commercial, some offer both options. There's a lot of variation, basically. Because Rust can import and export C compatible functions, it's usually straightforward to write tasks in Rust and run them on top of any existing C-language RTOS. Ferrous Systems have examples for ThreadX and examples for FreeRTOS which you can take a look at. So this is a fine option, and one you should consider if you already have an RTOS you like. Rust-language RTOSes Of course, there are now a bunch of RTOSes that don't just let you run Rust programs, but are themselves written in Rust. Again, an incomplete, alphabetical list might include: Again, they vary in scope and intended usage. Special mention to embassy, which isn't an RTOS in the classical sense because it only provides for the concurrent execution of co-operative async tasks, and not pre-emptive context switching, but largely replaces the need for an RTOS. I also want to highlight RTIC, which is sort of an RTOS, except it does most of the work at compile time and generates bare-metal Rust code that kind of works like an RTOS, but uses the Arm NVIC interrupt controller and the Rust compiler to do all the heavy-lifting for switching tasks. But wait, what? But what is an RTOS? Any Operating System which provides Real-Time guarantees is an RTOS. Any by real-time guarantees, I mean you can assign a priority to each task, and have some measurable upper bound for how long an input to the system will take to be processed. Windows 11, for example, is not a real-time operating system, because when you press a key on the keyboard there is literally no way of predicting when the computer might respond to that key press. If the input was "pressing the brake pedal" and the output was "activating the vehicle's brake system", you can see how that might be an issue. Generally to do this, an RTOS will provide mechanisms for: spawning multiple tasks assigning priorities to those tasks allowing those tasks to park themselves until some future event happens (a message arrives on a queue, an interrupt fires, a timeout expires, etc) selecting and running the highest priority task that is not currently parked By 'task', I mean a mechanism to execute a function indefinitely, but in such a way that execution of that 'task' can be paused so that a different 'task' can use the CPU for a short while. That way it appears like you are running N infinitely long tasks in parallel, but you are actually only running one at a time and swapping between them. This is not something novel - UNIX has been doing it since the 1970s, and it was not the first - and that means it is something that is pretty well understood and considered 'tried and tested'. An aside on naming: If you're more familiar with Windows, macOS or Linux, what I call a Task maps closely to what you might know as a Thread. It should also be noted that what I call a Task isn't what Rust calls a Task, as those are specifically asynchronous tasks, which is something different. You might say, "But why didn't you just call them Threads?" and the answer is FreeRTOS also calls them Tasks and because to me, Threads are short-lived things you can spawn and then wait on, whilst Tasks are things that are created at start-up. Anyway, naming things is one of the hardest problems in computer science. The tasks might be allocated statically (all tasks known at compile time) or it might be possible to dynamically add and remove tasks. I've seen both work well. But, recently I've been wondering ... how exactly do they work? And I thought the best way to figure that out is to try and write one and see what problems I have to solve. But which platform should we target with our RTOS (or, which one should we target first)? Arm Cortex-M The Arm Cortex-M line of processors, implementing the M-Profile variants of the Arm Architecture, are a good choice here. Cheap, very widely available, and well supported by Rust. And it turns out they have a magic super-power - they are designed to make an RTOS very easy to write. Armv6-M, Armv7-M, Armv7E-M, Armv8-M Baseline and Armv8-M Mainline all operate in pretty much the same way, so let's just pick Armv7-M as our example. A task, in Armv7-M terms, consists of the processor state available to an executing program - the CPU registers R0 through R12, the Link Register (LR), the Stack Pointer (SP), the Program Counter (PC) , and the Processor Status Register (CPSR). If we stop a task from running, by handling an Exception, we simply need to save all of that 'stuff' somewhere - usually the task's stack - and then put all the 'stuff' back later. The program will then have no idea it had been interrupted, moved into a storage area, and later resurrected. All it will be able to observe is that the wall clock suddenly jumped forwards. By Exception we mean a Supervisor Call, one of a number of Faults, or an Interrupt - they are all handled the same way. When an Exception occurs, the processor hardware, following the specification laid down in the Armv7-M architecture, will start executing the exception handler code. But if it just did that and nothing else, any registers the exception handler used would need to be saved somewhere and then restored back to their original values when the exception handler ended. If it did not do this, then a side-effect of an exception firing would be that registers would randomly change their contents whilst your program executed. This would be a nightmare, and nothing would work right. However, writing a routine that can push all the registers to the stack would need to be written in Assembly Language and it would need to be very careful to not use a register until after it had been saved. If we wrote our exception handler in C, or Rust, (typically one compatible with the Arm Embedded ABI), the compiler will emit machine code that freely uses R0, R1, R2, R3 and R12. The assumption being that if you call a function and you care about the values in those registers, then the caller should save them, leaving the callee (the function being called) free to just use them as it likes. Indeed, I believe they are also used to pass arguments to the callee, and for passing back the return value to the caller. So that is a problem - the compiled code wants to use some of the registers, and we cannot write compiled code to save those registers because the code will damage the registers it is trying to save. However, in a major departure from both Legacy Arm Architectures and the R- and A-Profiles of the current Arm Architectures, the M-Profile specifications state that the hardware will automatically push those registers to the stack. This means that you can write Armv7-M (et al) exception handlers in C. Or in Rust. It also gives us a leg-up when we want to write an RTOS, as part of our work is done for us. // An exception handler written in C void SysTick ( void ) { my_rtos_tick_handler (); GLOBAL_COUNTER += 1 ; } Having had to write Interrupt Handlers in assembly language for Arm R-Profile architectures, I appreciate this a lot. It should be noted that exceptions can pre-empt each other. That is, if you are running Interrupt L, and some Interrupt H becomes ready, where H is at some higher priority than L (indicated by H having a lower priority number than L), then the state of the handler for Interrupt L will be pushed to the stack and the handler for Interrupt H will be started. We don't generally want to block interrupts from running, because that impacts our response time to external events. Faults obviously have the highest priority - they typically mean it's game over and time to reboot. Now, to change tasks when some task has been running for too long, we need some periodic timer tick - an interrupt that fires every N clock cycles, where we can select the value for N. If we tick more often, we waste more time in the timer interrupt handler, but if we tick less often, we get a lower resolution 'clock'. Setting N such that the interrupt fires 100 times per second is fairly common. You could also be smart and set the timer to only fire when we actually know we have something to do (a so-called 'tickless' RTOS), but let's not worry about that fow now. Helpfully Arm mandate that every Armv7-M processor comes with a standard timer called SysTick, which is literally designed to do exactly what we want. We program a few registers, and the SysTick exception will fire every N clock cycles, just as we want. And we could use this to run our task-switching code, but ... what if we want to switch tasks in-between ticks? Perhaps because a task ran out of work to do, and now needs to wait for some event? It's easy to manually force the SysTick exception to run but, if we did, how would we know if it ran because the timer went off, or because we manually provoked it? We also have an issue with priorities - we might want the SysTick handler to be a fairly high priority so that our clock doesn't suffer from jitter, but need all our other interrupts to take priority over task-switching. If the task-switcher pre-empted an Exception handler rather than user code, it would stack the Exception handler state onto the 'current' (but dormant) task's stack, as opposed to the task state - and that would be a disaster. Note: If you'd like a deeper dive into Arm Cortex-M Exception handling, memfault has you covered. So if we don't task switch in the SysTick handler where should we task switch? Arm provide a special exception called PendSV , which is ideal for our needs. It can be set as the lowest priority exception handler, and it can be very easily triggered by writing a bit to a special register. We can even set this bit from another exception handler (like the SysTick handler), and the PendSV handler won't run until all other exceptions are complete. So, if the SysTick handler fires and we decide to force a task switch, or if a task wants to pause itself and let something else run, either way we can work out which task to run next, set the PendSV bit, and sit back whilst the PendSV handler does the switch. The final piece of the Armv7-M feature set which helps us write an RTOS is the fact there are two stack pointers - a Main Stack Pointer (MSP) and a Process Stack Pointer (PSP). By default, the system runs on the MSP, for both the main function and any exception handlers that run. However, you can flip a bit in a register, and switch the main function to use the PSP instead (leaving exception handlers using the MSP). Obviously, we should set the PSP to some suitable value first, but this does mean that our PendSV handler, and all the other exception handlers, can have one stack to share, and then every task can have its own unique stack. We just need to arrange for the PSP to be selected when we leave PendSV , as opposed to the MSP. And it turns out, Arm thought of that too. When you enter an exception handler, the values of PC, LR, R0, R1, R2, R3 and R12 from the interrupted code are automatically saved to the stack using the MSP. When you leave an exception handler, those values will be restored. The hardware takes advantage of this by leaving the exception handler a little message in the LR register - basically telling us what kind of code was interrupted. Was it in privileged or unprivileged mode? Was it using the FPU? Was it using MSP or PSP? And by leaving the same kind of message in LR when we exit the handler, the processor will reconfigure itself to the appropriate state before resuming execution of the code. However, there's nothing to say the value of LR we get on entry to PendSV needs to be the one we return! If we're switching from Task A to Task B, we should save the value of LR from Task A somewhere (to the Task A's Stack using the PSP, along with R4 to R11), and then we should return the value of LR that Task B had when it was interrupted. Writing an RTOS in Rust OK, so now we need to: Set up a stack for each task Push some initial state onto each stack Use PendSV to switch from 'No Task' to running our first task Regularly interrupt the processor to: Pick the next task to run Use PendSV to switch tasks As a bonus, we should also keep track of which tasks are ready to run Let's take these in turn. Representing a Stack On Arm, stacks are typically what is known as Full Descending. That is, the stack pointer points at the most recent item added to the stack (it points to a 'full' location), and when a new item is pushed onto the stack, the pointer moves downwards. You could also have a Empty Descending stack, or even an Empty Ascending stack, but C and Rust agree to use the Arm Embedded ABI, and that says that stacks are Full Descending (and you can imagine the mess if not all the code in the system agrees on this convention). So, we need a memory region, and we need to know the address just above the top-most location in that region. First, please note that examples on this page are Copyright (c) 2025 Ferrous Systems, and licensed under GPL-3.0-or-later. You can find a full working version at https://github.com/jonathanpallant/pets/tree/v0.1.0, which includes all the lovely comments that I stripped out of this blog page in the name of brevity (and because there's all this explanatory text anyway). use crate ::UnsafeCell; #[ repr ( align (8))] pub struct Stack< const LEN: usize > { contents : UnsafeCell<[ u8 ; LEN ]>, } impl < const LEN: usize > Stack { pub const fn new () -> Self { assert!( LEN . is_multiple_of ( 4 )); Self { contents: UnsafeCell::new([ 0 u8 ; LEN ]), } } pub const fn top (& self ) -> *mut u32 { unsafe { self .contents. get (). add ( 1 ) as *mut u32 } } } unsafe impl < const LEN: usize > Sync for Stack {} impl < const LEN: usize > Default for Stack { fn default () -> Self { Stack::new() } } So we have a struct Stack managing an UnsafeCell containing an array of bytes, and a method to get a pointer to the 'top' of the stack (which is just beyond the array). It's not perfect - there's nothing stopping someone using the same stack twice, but it'll work. And we do at least ensure the stack size of a multiple of 4, and that it starts on an address that is a multiple of 8 (the AAPCS specification says compilers can rely on this being true, and we have observed Rust code having undefined behaviour when this is not true). Pushing onto a Stack When we set up our tasks, we need to push some information into the stack for each task. We could do this by applying negative offsets to the stack pointer, but this seems error prone. So let's have a little helper that can push a value into the stack, and move the stack pointer downwards automatically. pub ( crate ) struct StackPusher( *mut u32 ); impl StackPusher { pub ( crate ) unsafe fn new ( stack_top : *mut u32 ) -> StackPusher { StackPusher(stack_top) } pub ( crate ) fn push (& mut self , value : u32 ) { unsafe { self . 0 = self . 0. offset (- 1 ); self . 0. write_volatile (value); } } pub ( crate ) fn current (& self ) -> *mut u32 { self . 0 } } Remember, some comments have been removed to save space - I'm not actually a monster who writes undocumented code. Tasks Now, we need something to represent our tasks: pub type TaskEntryFn = fn () -> !; #[ repr (C)] pub struct Task { stack : AtomicPtr< u32 >, entry_fn : TaskEntryFn, } All we need to know is, what code should the task execute when it is started, and, what is the tasks current stack pointer (i.e. the value of PSP last time we suspended the task). Let's add some helper methods to it: impl Task { /// The size of a task object is `pow(2, SIZE_BITS)`. pub const SIZE_BITS : usize = 3 ; /// A compile-time check that the size of a [`Task`] is what we said it was. const _CHECK : () = const { assert!(core::mem::size_of::< Self >() == ( 1 << Self :: SIZE_BITS )); }; pub const fn new < const N: usize >( entry_fn : TaskEntryFn, stack : &Stack) -> Task { assert!(N > crate ::Scheduler:: MIN_STACK_SIZE ); Task { entry_fn, stack: AtomicPtr::new(stack. top ()), } } pub ( crate ) const fn entry_fn (& self ) -> TaskEntryFn { self .entry_fn } pub ( crate ) fn stack (& self ) -> *mut u32 { self .stack. load (Ordering::Relaxed) } pub ( crate ) unsafe fn set_stack (& self , new_stack : *mut u32 ) { self .stack. store (new_stack, Ordering::Relaxed) } } Later on we're going to be poking values into this struct using assembly language so it's important that the size of struct Task is a power of 2 (which means we can convert a task index into a byte offset by doing a left-shift, instead of having to do a multiply). So we have a compile-time assert to check our Task::SIZE_BITS value is correct. The Scheduler Now, our scheduler. We're going to need to hold on to a static list of Task values. We don't want to own them, because then we'd need to be generic over the length of the list - having a reference to them is fine. We need to track which task ID we are currently running (if any), and which one we should switch to next. We should probably also keep track of time. #[ repr (C)] pub struct Scheduler { /// Which task is currently running current_task : AtomicUsize, /// Which task should PendSV switch to next next_task : AtomicUsize, /// A fixed, static list of all our tasks task_list : & 'static [Task], /// Current tick count ticks : AtomicU32, } Our assembly code is going to need to poke at this data, so let's store a pointer to our one global Scheduler, and give ourselves some constants to make it easier to access the fields. pub ( crate ) static SCHEDULER_PTR : AtomicPtr = AtomicPtr::new(core::ptr::null_mut()); impl Scheduler { pub ( crate ) const CURRENT_TASK_OFFSET : usize = core::mem::offset_of!(Scheduler, current_task); pub ( crate ) const NEXT_TASK_OFFSET : usize = core::mem::offset_of!(Scheduler, next_task); pub ( crate ) const TASK_LIST_OFFSET : usize = core::mem::offset_of!(Scheduler, task_list); } That's better than hard-coding constants that end up wrong when we change the struct definition. Now we need to be able to create a Scheduler. Let's do that as a const-fn, so we have a store the Scheduler in a static variable. impl Scheduler { /// Build the scheduler pub const fn new ( task_list : & 'static [Task]) -> Scheduler { // Cannot schedule without at least one task assert!(!task_list. is_empty ()); Scheduler { task_list, current_task: AtomicUsize::new( usize :: MAX ), next_task: AtomicUsize::new( 0 ), ticks: AtomicU32::new( 0 ), } } } Having nothing to do would be bad, so we fail if we don't get at least one task to run. Now, to start, we need to push some data into each task's stack. Then we can hit the PendSV button to switch to the first task, and we're off! Note the extreme comment/code ratio - this stuff is almost all load-bearing. impl Scheduler { /// Run the scheduler /// /// You may only call this once, and you should call it from `fn main()` /// once all your hardware is configured. We should be in Privileged /// Thread mode on the Main stack. pub fn start (& self , mut syst : cortex_m::peripheral::SYST, systicks_per_sched_tick : u32 ) -> ! { if self .current_task. load (Ordering::SeqCst) != usize :: MAX { panic!(" Tried to re-start scheduler! "); } // remember where this object is - it cannot move because we do not exit this function let self_addr = self as *const Scheduler as *mut Scheduler; SCHEDULER_PTR . store (self_addr, Ordering::Release); // Must do this /after/ setting SCHEDULER_PTR because the SysTick // exception handler will use SCHEDULER_PTR syst. set_reload (systicks_per_sched_tick); syst. clear_current (); syst. enable_counter (); syst. enable_interrupt (); // We need to push some empty state into each task stack for (task_idx, task) in self .task_list. iter (). enumerate () { let old_stack_top = task. stack (); // SAFETY: The task constructor does not let us make tasks with // stacks that are too small. let mut stack_pusher = unsafe { StackPusher::new(old_stack_top) }; // Standard Arm exception frame // CPSR stack_pusher. push ( Self :: DEFAULT_CPSR ); // PC stack_pusher. push (task. entry_fn () as usize as u32 ); // LR stack_pusher. push ( 0 ); // R12 stack_pusher. push ( 0 ); // R3 stack_pusher. push ( 0 ); // R2 stack_pusher. push ( 0 ); // R1 stack_pusher. push ( 0 ); // R0 stack_pusher. push ( 0 ); // Additional task state we persist // R11 stack_pusher. push ( 0 ); // R10 stack_pusher. push ( 0 ); // R9 stack_pusher. push ( 0 ); // R8 stack_pusher. push ( 0 ); // R7 stack_pusher. push ( 0 ); // R6 stack_pusher. push ( 0 ); // R5 stack_pusher. push ( 0 ); // R4 stack_pusher. push ( 0 ); // Set task stack pointer to the last thing we pushed // SAFETY: the pointer we are passing is a validly aligned stack pointer unsafe { task. set_stack (stack_pusher. current ()); } } // Fire the PendSV exception - the PendSV handler will select a task // to run and run it cortex_m::peripheral:: SCB ::set_pendsv(); // flush the pipeline to ensure the PendSV fires before we reach the end of this function cortex_m::asm::isb(); // impossible to get here unreachable!(); } } This function never returns - it will be the last thing that fn main() calls. The DEFAULT_CPSR is 0x0100_0000 - the bit we have set is the 'Thumb' bit, which indicates the processor is executing the T32 ISA instead of the A32 ISA. As this is the only ISA supported in M-Profile Architectures, if we do not set this bit the processor will crash when we resume our first task. It took me a while to work that one out. Not also that the order we push the saved state into each stack is important - it must be the reverse of the order in the PendSV handler (first) and processor itself (second) takes them out. There's also a curious issue with set_pendsv() . Setting the bit doesn't immediately cause the PendSV handler to fire. Because Arm processors are pipelined, they are loading the next instruction whilst simultaneously executing the current instruction (and perhaps retiring the previous instruction). So there may be a delay of a clock cycle or two whilst the processor deals any instructions it started but has not finished, before it jumps to the PendSV handler. An Instruction Synchronization Barrier is what we want here, to block the CPU until the pipeline is empty, using the isb() function from the cortex-m crate. Picking a new task To pick new tasks to run, we need handle two cases: Timer fired and task is suspended whether it likes it or not Task is bored and wishes to be suspended impl Scheduler { /// Call periodically, to get the scheduler to adjust which task should run next /// /// This is currently a round-robin with no priorities, and no sense of tasks being blocked /// /// Ideally call this from a SysTick handler pub fn sched_tick (& self ) { defmt::debug!(" Tick! "); self .ticks. fetch_add ( 1 , Ordering::Relaxed); self . pick_next_task (); cortex_m::peripheral:: SCB ::set_pendsv(); } /// Switch tasks, because this one has nothing to do right now pub fn yield_current_task (& self ) { self . pick_next_task (); cortex_m::peripheral:: SCB ::set_pendsv(); } /// Select the next task in the round-robin /// /// Updates `self.next_task` but doesn't trigger a task switch. Set PendSV /// to do that. fn pick_next_task (& self ) { cortex_m::interrupt::free(| _cs | { let next_task = self .next_task. load (Ordering::Relaxed); let maybe_next_task = next_task + 1 ; let new_next_task = if maybe_next_task >= self .task_list. len () { 0 } else { maybe_next_task }; self .next_task. store (new_next_task, Ordering::Relaxed); }); } } I don't know that I've got those atomic orderings correct here. If you're Mara, feel free to tell me I'm wrong and how to make it better! Actually switching tasks OK, enough already. We need to write the PendSV handler and the bad news is, we cannot do it in Rust. It needs to access system registers, and so we cannot let the compiler generate any code that might affect those registers (like, copy them to the Main Stack) before we've had a chance to read them. But at least we have naked functions in Rust now, so we don't have to resort to using Assembly files, or declaring our own function symbols using `global_asm!. #[ unsafe (no_mangle)] #[ unsafe (naked)] unsafe extern " C " fn PendSV () { // NOTE: This code must NOT touch r4-r11. It can ONLY touch r0-r3 and r12, // because those registers were stacked by the hardare on exception entry. naked_asm!( r #" // r1 = the address of the Scheduler object ldr r1, ={scheduler_ptr} ldr r1, [r1] // r2 = the current task ID ldr r2, [r1, {current_task_offset}] // r3 = the task list pointer ldr r3, [r1, {task_list_offset}] // if current task ID is -1, skip the stacking of the current task cmp r2, #-1 beq 1f // // Stack the current task // // r1 holds the scheduler object's address // r2 holds the current task ID // r3 holds the task list's address // // r2 = the current task byte offset lsl r2, {task_size_bits} // r0 = the current task stack pointer mrs r0, psp // Push the additional state into stack at r0 stmfd r0!, {{ r4 - r11 }} // save the stack pointer (in r0) to the task object str r0, [r3, r2] // // Pop the next task // // r1 holds the scheduler object's address // r3 holds the task list's address // 1: // r2 = the next task byte offset ldr r2, [r1, {next_task_offset}] lsl r2, {task_size_bits} // r0 = the stack pointer from the task object ldr r0, [r3, r2] // Pop the additional state from it ldmfd r0!, {{ r4 - r11 }} // Set the current task stack pointer msr psp, r0 // // Update the Current Task ID // // r1 holds the scheduler object's address // // copy the next task id to the current task id ldr r2, [r1, {next_task_offset}] str r2, [r1, {current_task_offset}] // // return to thread mode on the process stack // // This is the magic LR value for 'return to thread mode process stack' mov lr, #0xFFFFFFFD bx lr "#, scheduler_ptr = sym scheduler:: SCHEDULER_PTR , current_task_offset = const Scheduler:: CURRENT_TASK_OFFSET , next_task_offset = const Scheduler:: NEXT_TASK_OFFSET , task_list_offset = const Scheduler:: TASK_LIST_OFFSET , task_size_bits = const Task:: SIZE_BITS , ); } Getting this right was ... challenging - hence all the comments for my future self! Luckily we have QEMU, so running a test was very fast, and I was able to single-step through the code in GDB to check it was doing what I wanted (when GDB wasn't busy segfaulting, that is). But the gist of it is, we can rely on the hardware having pushed a basic frame to the Process Stack, and we only need to push the remaining registers (R4 to R11). We can then pop those same registers from the next task's stack, change PSP, and then leave the exception return mechanism to drop us back into our freshly resumed task. We also rely on some of the constants we exported earlier so we can reach into the global Scheduler object to read the next task ID, the current task ID (which we update when we've switched tasks), and the list of Task objects themselves. I do also have an updated version of this code which also handles lazy FPU stacking and extended frames, but that's a bit much for this post. Maybe next time. Userspace API Our tasks are going to need to interact with our scheduler, so let's give them some simple functions to call: /// Delay a task for at least the given period, measured in timer ticks. /// /// Calling `delay(0)` is basically just a yield. pub fn delay ( ticks : u32 ) { let scheduler = Scheduler::get_scheduler(). unwrap (); let start = scheduler. now (); loop { // yield first, so delay(0) does at least one task switch scheduler. yield_current_task (); // is it time to leave? let delta = scheduler. now (). wrapping_sub (start); if delta >= ticks { break ; } } } /// Get the current time, in ticks pub fn now () -> u32 { if let Some(scheduler) = Scheduler::get_scheduler() { scheduler. now () } else { 0 } } Oh, we also need that handy get_scheduler() function to grab our global static scheduler. Plus the now() method. impl Scheduler { /// Get the handler to the global scheduler pub ( crate ) fn get_scheduler () -> Option<& 'static Scheduler> { // Get our stashed pointer let scheduler_ptr = SCHEDULER_PTR . load (Ordering::Relaxed); // Are we intialised? if scheduler_ptr. is_null () { None } else { // SAFETY: Only [`Scheduler::start`] writes to [`SCHEDULER_PTR`] and it // always sets it to be a valid pointer to a [`Scheduler`] that does not // move. Some( unsafe { &*scheduler_ptr }) } } /// Get current tick count pub fn now (& self ) -> u32 { self .ticks. load (Ordering::Relaxed) } } An example But, let's leave you with an example program running in PETS - our new pre-emptive, time-slicing scheduler. #![ no_std ] #![ no_main ] use pets::{Scheduler, Stack, Task}; use defmt_semihosting as _; const SYSTICKS_PER_SCHED_TICK : u32 = 100_000 ; static SCHEDULER : Scheduler = Scheduler::new({ static TASK_LIST : [Task; 3 ] = [ Task::new(rabbits, { static STACK : Stack<1024> = Stack::new(); & STACK }), Task::new(hamsters, { static STACK : Stack<1024> = Stack::new(); & STACK }), Task::new(cats, { static STACK : Stack<1024> = Stack::new(); & STACK }), ]; & TASK_LIST }); #[ cortex_m_rt :: entry ] fn main () -> ! { let cp = cortex_m::Peripherals::take(). unwrap (); defmt::info!(" Hello! "); SCHEDULER . start (cp. SYST , SYSTICKS_PER_SCHED_TICK ); } fn rabbits () -> ! { loop { defmt::info!(" Rabbit! (back in 5) "); pets::delay( 5 ); } } fn hamsters () -> ! { loop { defmt::info!(" Hamster! (back in 10) "); pets::delay( 10 ); } } fn cats () -> ! { loop { defmt::info!(" Cat! (back in 3) "); pets::delay( 3 ); } } #[ panic_handler ] fn panic ( info : &core::panic::PanicInfo) -> ! { defmt::println!(" PANIC: {} ", defmt::Debug2Format(info)); cortex_m::asm::udf(); } #[ cortex_m_rt :: exception ] unsafe fn HardFault ( info : &cortex_m_rt::ExceptionFrame) -> ! { defmt::println!(" FAULT: {} ", defmt::Debug2Format(info)); cortex_m::asm::udf(); } defmt::timestamp!(" {=u32:010} ", pets::now()); $ cargo build --bin example1 Compiling pets v0.1.0 (/home/jonathan/Documents/pets) Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.67s $ qemu-system-arm \ -cpu cortex-m4 -machine mps2-an386 \ -semihosting-config enable=on,target=native \ -nographic \ -kernel target/thumbv7em-none-eabi/debug/example1 \ | defmt-print \ -e target/thumbv7em-none-eabi/debug/example1 \ --log-format="{t} {[{L}]%bold} {s} {({ff}:{l:1})%dimmed}" 0000000000 [INFO ] SCHEDULER_PTR @ 20000c40 (src/scheduler.rs:78) 0000000000 [INFO ] Scheduler @ 20000018 (src/scheduler.rs:83) 0000000000 [INFO ] Init task frame 0, with stack @ 0x20000430 (src/scheduler.rs:96) 0000000000 [INFO ] Init task frame 1, with stack @ 0x20000830 (src/scheduler.rs:96) 0000000000 [INFO ] Init task frame 2, with stack @ 0x20000c30 (src/scheduler.rs:96) 0000000000 [INFO ] Rabbit! (back in 5) (bin/example1.rs:38) 0000000000 [INFO ] Hamster! (back in 10) (bin/example1.rs:48) 0000000000 [INFO ] Cat! (back in 3) (bin/example1.rs:58) 0000000003 [INFO ] Cat! (back in 3) (bin/example1.rs:58) 0000000005 [INFO ] Rabbit! (back in 5) (bin/example1.rs:38) 0000000006 [INFO ] Cat! (back in 3) (bin/example1.rs:58) 0000000009 [INFO ] Cat! (back in 3) (bin/example1.rs:58) 0000000010 [INFO ] Rabbit! (back in 5) (bin/example1.rs:38) 0000000010 [INFO ] Hamster! (back in 10) (bin/example1.rs:48) 0000000012 [INFO ] Cat! (back in 3) (bin/example1.rs:58) ... Look at that! Our tasks are merrily switching themselves. We have an RTOS. Not a very good RTOS of course, and possibly not very real-time. To make it better, we should give each task a Priority, and select the task with the highest priority to run. We also need a mechanism to block a task on something other than the current time - say, to wait for something to arrive in a mailbox. I'll leave these as exercises for the reader for now, along with FPU support (because you'll note we haven't saved any of the 33 FPU registers when switching tasks). Conclusions Well, all told, we needed 300 lines of Rust and Assembly code, excluding the examples, to write a pre-emptive task switching scheduler. I think that's not bad at all, especially as the less code you write, the less you have to test and verify. We've seen how Armv7-M's SysTick, PendSV and PSP functions are literally tailor-made for writing an RTOS. And I don't think we missed C at any point? Very few lifetime or ownership and borrowing issues to worry about here. As a wise man once said, "Why not try it yourself?"

Pre-Emptive Multi-Tasking on Arm Cortex-M

Share this article

Related Articles