In this article I want to document my journey implementing fast TCP fingerprinting in a Go web server, using eBPF.
Just to provide some background, TCP fingerprinting is one of the many techniques that can be used to detect unusual or identifying information about a web request when implementing an anti-bot solution.
This has been a hot topic lately, driven by the rising need to scrape the internet for human content to feed to LLMs.
Implementing such a system offers interesting technical challenges that span different domains, but most importantly it’s a very good first project to learn eBPF.
I split this article into two parts.
In this first part I provide a background on TCP fingerprinting, and discuss some implementation strategies.
In the second part I describe the actual development of a proof-of-concept Go web server that echoes back the TCP fingerprint of the client. The project is open source on GitHub.
HTTP requests, from first principles
It can be useful to approach this problem from first principles, looking at the way web servers work under the hood.
This is not as scary as it might seem: although browsers and the underlying protocols evolved and got more complex over time, for compatibility reasons they still support the HTTP/1.0 protocol, which was designed to be simple. An HTTP/1.0 web server is just a light layer on top of a TCP connection, and can be implemented in just a few lines of C code.