
Facebook's Fascination with My Robots.txt


For the past 4 days, and probably longer since I don't have logs beyond that, Facebook has been hitting the /robots.txt of my self-hosted Forgejo instance several times per second. The user-agent is facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php). I expected the UA header to be untrustworthy, but all the requests are also coming from Meta's IP address ranges.
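For anyone wanting to run the same sanity check on their own logs, here is a minimal sketch of testing a client address against a list of CIDR blocks with Python's standard ipaddress module. The function name and the ranges are assumptions for illustration: the blocks below are reserved documentation ranges, not Meta's, and need to be swapped for whatever ranges Meta actually publishes.

```python
import ipaddress

# Placeholder CIDR blocks taken from the reserved documentation ranges.
# These are NOT Meta's ranges; replace them with the published ones.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def in_ranges(ip, cidrs=PLACEHOLDER_RANGES):
    """Return True if the address falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed lists are safe to check in one pass.
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)
```

Running every logged client IP through a check like this is one way to tell a spoofed user-agent apart from requests that really originate from the claimed network.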

The interesting thing is that no other file is being accessed. Just robots.txt over and over and over again.

Facebook's documentation states:

The primary purpose of FacebookExternalHit is to crawl the content of an app or website that was shared on one of Meta’s family of apps, such as Facebook, Instagram, or Messenger. The link might have been shared by copying and pasting or by using the Facebook social plugin. This crawler gathers, caches, and displays information about the app or website such as its title, description, and thumbnail image.

Now, as tempting as it is to think that I've suddenly reached unfathomable levels of popularity on Meta's platforms, I find it difficult to believe, as the only other traffic on my instance is the AI bots consistently crawling the qmk_firmware repository and the very occasional user of one of my Hex packages. And myself. Not even Facebook themselves are requesting any other path at the moment, just robots.txt.

Here are the accesses I'm getting, visualised in two ways for your convenience:

This chart is provided by my extreme LibreOffice Calc skillz. Data is grouped by hour.
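The hourly grouping could also be done outside a spreadsheet. The sketch below is not how the chart above was made; it assumes the access log uses the common "combined" format (timestamp in square brackets, quoted request line, quoted referrer and user-agent), which may not match a given reverse proxy's configuration. The regular expression and the function name are illustrative.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed "combined" log format, e.g.:
# 203.0.113.9 - - [01/Jan/2024:14:03:07 +0000] "GET /robots.txt HTTP/1.1" 200 123 "-" "UA string"
LINE_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def hourly_robots_hits(lines, ua_fragment="facebookexternalhit"):
    """Count /robots.txt requests per hour for user-agents containing ua_fragment."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m or m["path"] != "/robots.txt" or ua_fragment not in m["ua"]:
            continue  # skip unparseable lines, other paths, and other clients
        # %b expects English month abbreviations under the default locale.
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        counts[ts.strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

Passing an open log file to hourly_robots_hits returns a Counter keyed by hour, which can be printed sorted or exported to CSV for charting.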

[Insert Matrix quote here.]

So what's going on at Meta? Why are they so obsessed with my very bog-standard robots.txt file? I'm a nobody and surely not interesting enough that they'd be targeting me specifically, so how much bandwidth and energy are they using globally to mass-request robots.txt files in a never-ending loop? Perhaps someone at their end screwed up a loop conditional, but you'd think some monitoring dashboard somewhere would have a warning pop up because of this.

Anyway, compared to the earlier AI bot onslaught, this traffic is mostly benign for me, just interesting, as long as it doesn't continue picking up speed.