It could be a consequential act of quiet regulation. Cloudflare, a web infrastructure company, has updated millions of websites' robots.txt files in an effort to force Google to change how it crawls them to fuel its AI products and initiatives.
We spoke with Cloudflare CEO Matthew Prince about what exactly is going on here, why it matters, and what the web might soon look like. But to get into that, we need to cover a little background first.
The new change, which Cloudflare calls its Content Signals Policy, comes after publishers and other companies that depend on web traffic cried foul over Google's AI Overviews and similar AI answer engines, saying those products sharply cut their path to revenue because they don't send traffic back to the source of the information.
There have been lawsuits, efforts to kick-start new marketplaces to ensure compensation, and more—but few companies have the kind of leverage Cloudflare does. Its products and services back something close to 20 percent of the web, and thus a significant slice of the websites that show up on search results pages or that fuel large language models.
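In practice, the Content Signals Policy works by prepending machine-readable preferences to a site's robots.txt file. The sketch below illustrates the general shape of such an entry; the exact directive names and comment text are an approximation of Cloudflare's published format, not a verbatim copy of what Cloudflare deploys:

```
# Content signals express the site operator's preferences for how
# fetched content may be used. (Illustrative example of the format.)
#
#   search=yes    - allowed: building a search index and linking results
#   ai-input=no   - not allowed: feeding content into AI answers (e.g., RAG)
#   ai-train=no   - not allowed: training or fine-tuning AI models
Content-Signal: search=yes, ai-input=no, ai-train=no

# Conventional robots.txt crawl rules still follow as usual.
User-Agent: *
Allow: /
```

Note that, like robots.txt itself, these signals are a statement of preference rather than a technical enforcement mechanism; their weight comes from the legal and normative claim they stake out at scale.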
"Almost every reasonable AI company that's out there is saying, listen, if it's a fair playing field, then we're happy to pay for content," Prince said. "The problem is that all of them are terrified of Google because if Google gets content for free but they all have to pay for it, they are always going to be at an inherent disadvantage."
This is happening because Google is using its dominant position in search to ensure that web publishers allow their content to be used in ways they might not otherwise permit.
The changing norms of the web
Since 2023, Google has offered a way for website administrators to opt their content out of use for training Google's large language models, such as Gemini.
However, allowing pages to be indexed by Google's search crawlers and shown in results requires accepting that they'll also be used to generate AI Overviews at the top of results pages through a process called retrieval-augmented generation (RAG).
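The opt-out Google offers is the Google-Extended robots.txt token. A site can block it while remaining in search, roughly like this:

```
# Opt out of content being used to train Google's models (e.g., Gemini).
User-agent: Google-Extended
Disallow: /

# Regular search crawling remains allowed.
User-agent: Googlebot
Allow: /
```

The catch is that Google-Extended governs only model training. It is not a separate crawler, and blocking it does not keep pages that Googlebot indexes from being pulled into AI Overviews via retrieval-augmented generation; the only way out of that is to leave Google Search entirely.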