Multiplayer has been the single most requested feature for Teardown since before its initial release. Synchronizing physics over the network is already known to be hard, and on top of that we have a completely dynamic, destructible world with full modding support. For a long time, we considered the whole idea unrealistic.
Despite the scepticism, we did an internal experiment back in 2021, using a naive approach to synchronize moving objects and send altered voxel data as objects were destroyed. It used enormous amounts of bandwidth and completely choked the connection when large objects were destroyed. It was purely a learning project and never reached a usable state, but it taught us where the bottlenecks were.
Around the same time, a community project called TDMP added rudimentary multiplayer support through reverse engineering and DLL injection. Despite being a bit janky, it completely blew my mind. It was an incredible technical achievement by the people involved. The mod mostly synchronized player position and player input, and since the engine isn’t deterministic, it could easily get out of sync, especially with destruction.
A semi-deterministic approach
As we started bringing more people on board, we did a more serious investigation into a proper multiplayer implementation in late 2022. We knew we wanted perfect world sync. Anything else would quickly make simulations diverge in the chaotic world of Teardown. Sending large amounts of voxel data wasn’t an option because of bandwidth, so we had to rely on determinism. Early on, I dismissed the idea of full determinism for the entire engine (a view I have since reevaluated), so it needed to be a hybrid approach: destruction done deterministically, while most other things use state synchronization.
For the longest time (and for good reasons), floating point operations were considered unsafe for deterministic purposes. That is still true to some extent, but the picture is more nuanced than that. I have since learned a lot about floating point determinism, and these days I know it is mostly safe if you know how to navigate around the pitfalls. I won’t cover them all here, but I hope to do that in another post, because there’s a lot of confusion around this topic.
At the time, I decided to rewrite the destruction logic in fixed-point integer math, which is fairly straightforward given that we’re dealing with discrete voxel volumes. But there’s much more to destruction logic than cutting out voxels on a regular grid. Object hierarchies may separate, new objects can be created and joints can be affected or reattached. A lot of this still involves floating point math, so each breakage event is split into a stream of deterministic commands that are replicated on all clients: “cut hole in this shape at voxel coord x,y,z”, “change ownership of that shape”, “reconnect joint to this shape”, etc.
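To make the idea of a command stream concrete, here is a minimal sketch of how one breakage event might be encoded. It is not Teardown's actual format: the command IDs, field names, the `radius` parameter and the byte layouts are all assumptions made up for illustration. The point is only that every field is an integer, so each machine decodes exactly the same values.

```python
import struct
from dataclasses import dataclass

# Hypothetical command IDs and wire layouts, invented for this sketch.
CUT_HOLE, CHANGE_OWNER, RECONNECT_JOINT = 1, 2, 3


@dataclass(frozen=True)
class CutHole:
    shape_id: int
    x: int  # voxel coordinates in the shape's own grid
    y: int
    z: int
    radius: int  # hole radius in voxels (assumed parameter)


@dataclass(frozen=True)
class ChangeOwner:
    shape_id: int
    body_id: int


@dataclass(frozen=True)
class ReconnectJoint:
    joint_id: int
    shape_id: int


def encode(cmd) -> bytes:
    """Pack a command into a small little-endian record, integers only."""
    if isinstance(cmd, CutHole):
        return struct.pack("<BIhhhH", CUT_HOLE, cmd.shape_id,
                           cmd.x, cmd.y, cmd.z, cmd.radius)
    if isinstance(cmd, ChangeOwner):
        return struct.pack("<BII", CHANGE_OWNER, cmd.shape_id, cmd.body_id)
    if isinstance(cmd, ReconnectJoint):
        return struct.pack("<BII", RECONNECT_JOINT, cmd.joint_id, cmd.shape_id)
    raise TypeError(f"unknown command: {cmd!r}")


def decode(buf: bytes):
    """Inverse of encode(): the first byte selects the command type."""
    kind = buf[0]
    if kind == CUT_HOLE:
        return CutHole(*struct.unpack("<BIhhhH", buf)[1:])
    if kind == CHANGE_OWNER:
        return ChangeOwner(*struct.unpack("<BII", buf)[1:])
    if kind == RECONNECT_JOINT:
        return ReconnectJoint(*struct.unpack("<BII", buf)[1:])
    raise ValueError(f"unknown command id: {kind}")


# One breakage event, split into a stream of deterministic commands.
event = [
    CutHole(shape_id=7, x=12, y=3, z=-4, radius=5),
    ChangeOwner(shape_id=7, body_id=42),
    ReconnectJoint(joint_id=3, shape_id=7),
]
wire = [encode(cmd) for cmd in event]
```

In this sketch the whole event takes a few dozen bytes on the wire, whether the shape being cut has a hundred voxels or a million.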
Our implementation does not use dedicated servers. The player hosting a game also acts as the server for that session, so every mention of the server below really just means the player who hosts the session.
Reliable and unreliable
As long as the deterministic commands are applied to the world in exactly the same way, in exactly the same order, the resulting changes will be identical across all machines. The bandwidth requirements are small because a command is the same size regardless of how large the affected object is. Anything that modifies the scene content, such as spawning new objects or recoloring objects, is implemented using the same approach. We put all these commands on a reliable network stream, where everything is guaranteed to arrive in order and nothing is missed, just like a traditional data stream.
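Why ordering matters can be shown with a toy model. The sketch below is an illustration under simplifying assumptions, not the engine's code: the `World` class, the tuple-based commands and the `OrderedReceiver` are all hypothetical, and the receiver is only a stand-in for what a reliable, ordered transport already provides. It applies the same two commands on a host and on two clients, one that respects sequence numbers and one that applies packets in arrival order.

```python
import hashlib


class World:
    """Toy voxel world: a solid cube stored as a set of integer coordinates."""

    def __init__(self, size: int):
        self.voxels = {(x, y, z) for x in range(size)
                       for y in range(size) for z in range(size)}

    def apply(self, cmd):
        if cmd[0] == "cut":
            # Remove every voxel within r of the center, integer math only.
            _, cx, cy, cz, r = cmd
            self.voxels = {
                (x, y, z) for (x, y, z) in self.voxels
                if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 > r * r
            }
        elif cmd[0] == "spawn":
            _, x, y, z = cmd
            self.voxels.add((x, y, z))

    def checksum(self) -> str:
        # Hash a canonical (sorted) view of the state to compare machines.
        return hashlib.sha256(repr(sorted(self.voxels)).encode()).hexdigest()


class OrderedReceiver:
    """Buffers out-of-order commands and applies them strictly by sequence."""

    def __init__(self, world: World):
        self.world = world
        self.next_seq = 0
        self.pending = {}

    def receive(self, seq: int, cmd):
        if seq < self.next_seq:
            return  # duplicate of a command that was already applied
        self.pending[seq] = cmd
        while self.next_seq in self.pending:
            self.world.apply(self.pending.pop(self.next_seq))
            self.next_seq += 1


# The host applies commands in the order it generated them.
commands = [("cut", 2, 2, 2, 1), ("spawn", 2, 2, 2)]
host = World(4)
for cmd in commands:
    host.apply(cmd)

# The packets happen to arrive swapped on the client side.
arrival = [(1, commands[1]), (0, commands[0])]
client = World(4)
receiver = OrderedReceiver(client)
naive = World(4)  # applies in arrival order, ignoring sequence numbers
for seq, cmd in arrival:
    receiver.receive(seq, cmd)
    naive.apply(cmd)

in_sync = host.checksum() == client.checksum()
naive_in_sync = host.checksum() == naive.checksum()
```

Here `in_sync` ends up true while `naive_in_sync` is false: the naive client spawns the voxel first and then cuts it away again, so its world has diverged after only two commands.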