Skip to content
Tech News
← Back to articles

A most elegant TCP hole punching algorithm

read original more articles
Why This Matters

This article introduces an elegant TCP hole punching algorithm that simplifies NAT traversal by using deterministic calculations based on timestamps, eliminating the need for complex infrastructure. This approach enhances the reliability and testability of peer-to-peer connections behind NATs, which is crucial for real-time communication applications. It offers a streamlined method for developers to verify hole punching techniques without extensive setup.

Key Takeaways

A most elegant TCP hole punching algorithm

TCP hole punching is a way to connect two computers behind NAT routers. The technique has a lot of requirements to work.

Both sides must know each other’s WAN IPs

They must know the right external ports to use

They must connect at exactly the same time

In practice this collapses to using STUN to lookup the WAN IP, doing NAT type enumeration and port prediction, synchronizing time with NTP, and having both sides exchange all the needed meta data (WAN IP, port predictions, future NTP punch time) via some “channel.”

That involves a whole list of infrastructure and code to work – which is complex and error-prone. What if you just wanted to test whether your punching algorithm works? You don’t care about every other part of the software. This is a way to do that.

Part 1: selecting the bucket A simple way to bypass the need for fixed infrastructure in hole punching algorithms is by using a deterministic algorithm to derive all meta data from a single parameter. Here’s how it works. First, a starting parameter is chosen based on the unix timestamp. What we need is a number both sides can converge on without requiring any communication. However, as we know in distributed systems theory – there is no “now” in networks. So a little math is used to make the “now.” now = timestamp () max_clock_error = 20 s # How far off the timestamp can be min_run_window = 10 s # Total time window to run the program for both sides window = ( max_clock_error * 2 ) + 2 bucket = int (( now - max_clock_error ) // window ) We define what an accepted solution looks like. There ought to be a certain amount of time that the protocol can run in. A certain range that a timestamp must fall in to be considered valid. This information is combined and then quantized in a way that both sides converge on the same number even given variations to clock skew. This we term the “bucket.”

Part 2: selecting ports Now that both sides share the same bucket we can use this to derive a list of shared ports. The idea behind the port list is that the local port will equal the external port. Many home routers try to preserve the source port in external mappings. This is a property called “equal delta mapping” – it won’t work on all routers but for our algorithm we’re sacrificing coverage for simplicity. In order to generate a shared list of ports without communication between the two sides – the bucket is used as a seed to a pseudo-random number generator. This allows for a much smoother conversion of the bucket number which might otherwise have little direct variation. large_prime = 2654435761 stable_boundary = ( bucket * large_prime ) % 0xFFFFFFFF The FF… number fixes the range for the boundary number so it doesn’t overflow what the PRNG accepts as a seed. A prime number is used for the multiplier because if the multiplier shares a common factor with the modulus (0xff..) then the space of possible numbers shrink due to shared factors. A prime number ensures that the number space contains unique items. The port range is now computed based on using a randrange func. In my code I generate 16 ports and throw out any I can’t bind to. It is expected that there will be some port collisions with existing programs in the OS when manually selecting random ports like this.

Part 3: sockets and networking It is helpful to review the requirements for TCP hole punching before we proceed. There are very specific socket options that must be set for it to work. s . setsockopt ( socket . SOL_SOCKET , socket . SO_REUSEADDR , 1 ) s . setsockopt ( socket . SOL_SOCKET , socket . SO_REUSEPORT , 1 ) TCP hole punching involves aggressively reusing socket addresses. Normally with TCP, if a connection fails you close the socket. But with TCP hole punching if you do any kind of cleanup you break the protocol. Cleanup means calling close. And close means possibly sending out an RST packet that tells the remote router “bad connection, disregard.” If you close the socket the OS will also start its own cleanup process. The socket will enter states like TIME_WAIT and will be harder to reliably reuse the same address even if you do set the right socket options. I also want to add that the only really correct model for TCP hole punching is to use non-blocking sockets. You can’t use blocking sockets because you need to be able to rapidly send SYN packets without having to wait for a response. Async networking won’t work here either. Async networking prevents your software from being able to precisely control timing and TCP hole punching is highly sensitive to timing. If packet exchange is off by even a few miniseconds the whole protocol can fail. In my opinion: the best way to implement this is to use non-blocking sockets with select for polling. This allows you to properly handle every connection state without impacting timing. for ... sel . register ( s , selectors . EVENT_WRITE | selectors . EVENT_READ ) For punching: we simply call connect_ex on a (dest_ip, port) tuple with a 0.01 second sleep until an expiry (src_port == dest_port.) Actual punching here is not terribly elegant as you’re just spamming out SYNs. But this part needs to be aggressive enough to create a remote mapping but not so aggressive that it kills your CPU. I use a 0.01 second sleep for that.

... continue reading