Why blocking LLMs from your website is dumb

John Wang

Perplexity was recently accused of scraping sites that had explicitly disallowed LLM crawlers in their robots.txt files. In the wake of that revelation, a wave of how-to guides for blocking large-language-model scrapers has surfaced [0] (a sketch of the kind of configuration they recommend appears at the end of this post). The tone around them is generally vitriolic: people object both on moral grounds ("AI is stealing your content") and out of a general distaste for AI.

But how many of you would block Google from indexing your website? One of the primary reasons I publish anything online is so that someone else can see it. If I didn't want anyone to see it, I'd write it in my notebook, not on the public web.

LLMs are the next generation's search layer. They are already generating a massive pipeline of traffic for the companies and websites that have gotten good at getting their content surfaced in LLM answers. Combine that with the fact that most LLMs now have an agentic web-search component that actively cites and links to sources, and you have a huge funnel of potential readers for your content. Blocking that pipeline may feel righteous, but it also cuts you off from the fastest-growing distribution channel on the web.

As with any technology, using LLMs well harnesses a great deal of power, and trying to block them outright is generally a bad idea. I think the upside goes to creators who adapt, not those who hide. Producing high-quality content that LLMs will actually cite is the new game in town.
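For reference, here is the kind of robots.txt those how-to guides recommend, i.e. the very configuration I'm arguing against. This is a minimal sketch: the user-agent tokens shown are crawler names the major vendors have publicly documented, but vendors add and rename crawlers over time, so treat the list as illustrative rather than exhaustive.

```
# Blocks the major LLM crawlers site-wide. These user-agent tokens are
# the publicly documented names; verify them against each vendor's docs
# before relying on this.

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```

Note that compliance with robots.txt is voluntary; it is a request, not an enforcement mechanism, which is exactly why the accusations against Perplexity were possible in the first place.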