Async and Finaliser Deadlocks

Two days ago I was listening to the Oxide podcast on futurelocks, a very complicated bug involving async code in Rust. I must admit that I struggled to understand what was going on, partly because of the subject matter, and partly because podcasts are my backdrop to household chores. At some point towards the end, though, someone phrased things in a way that my pea brain could immediately understand.

In essence, I think futurelocks are a complex instance of a long-standing problem with (what I will call for now) asynchronous code. You may notice that I did not say a “well known” problem. Personally, I only realised this problem exists a couple of years ago. For better or worse, subsequent discussions with many other folk have convinced me that my lack of awareness is common.

Deadlocking finalisers

Helpfully, Oxide have a thoughtful writeup of the futurelock problem. However, even though I like to flatter myself that I’m a competent Rust programmer, I had to work hard to understand precisely what’s going on.

Fortunately we can create a simplified version of the underlying problem in Python:

    import threading

    mutex = threading.Lock()

    class T:
        def __del__(self):
            print("acquiring")
            mutex.acquire()
            print("acquired")
            mutex.release()

    t = T()
    mutex.acquire()
    t = None
    mutex.release()

In essence, this code is modelling a classic programming need. A mutex (colloquially a “lock”) guards a shared resource (e.g. a network socket, counter, etc.). When the garbage collector determines that an object is no longer used, it runs its finaliser, i.e. its __del__ method. In this case the finaliser acquires the mutex (i.e. locks it), allowing it to do something with the shared resource the mutex is guarding, and then releases the mutex (i.e. unlocks it).

Unfortunately, when I run this code in CPython, it hangs at the terminal having written just:

    $ python3 t.py
    acquiring

What’s going on? Why hasn’t it printed acquired?
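One way to poke at this without hanging the terminal is to give the finaliser a timeout. The following is only a sketch, not part of the original program: the events list and the 0.1 second timeout are my own additions for illustration, and it assumes CPython, where reference counting runs __del__ as soon as the last reference to the object disappears.

```python
import threading

mutex = threading.Lock()
events = []  # hypothetical log, added so the outcome can be inspected


class T:
    def __del__(self):
        events.append("acquiring")
        # Assumption: a short timeout stands in for the original
        # blocking acquire(), so the program can finish.
        if mutex.acquire(timeout=0.1):
            events.append("acquired")
            mutex.release()
        else:
            events.append("timed out")


t = T()
mutex.acquire()
# On CPython the reference count drops to zero here, so __del__ runs
# immediately, on this same thread, while the mutex is still held.
t = None
mutex.release()
print(events)
```

On CPython I would expect the finaliser to time out rather than acquire the mutex: threading.Lock is not reentrant, and the thread that holds it is the very thread now running __del__. Other Python implementations may run finalisers at a different point, so the result there can differ.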
