With web publishers in crisis, a new open standard lets them set the ground rules for AI scrapers. (Or at least, it will try to.) The new Really Simple Licensing (RSL) standard creates terms that participants expect AI companies to abide by. Although enforcement is an open question, it can't hurt that some heavy hitters back it. Among others, the list includes Reddit, Yahoo (Engadget's parent company), Medium and People Inc.
RSL adds licensing terms to the robots.txt protocol, the simple file that provides instructions for web crawlers. Supported licensing options include free, attribution, subscription, pay-per-crawl and pay-per-inference. (The latter means AI companies only pay publishers when the content is used to generate a response.)
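For readers curious how licensing terms might sit alongside ordinary crawler rules, here is a minimal Python sketch. It assumes a robots.txt line of the form `License: <url>` that points crawlers to a separate license file; that directive name, the example.com URL and the `find_license_urls` helper are illustrative assumptions, not quotes from the RSL spec. Python's built-in `urllib.robotparser` only handles standard crawl rules, so the sketch scans the file's lines directly.

```python
# Hypothetical robots.txt carrying an RSL-style licensing pointer.
# ASSUMPTION: the "License:" directive name and the URL are illustrative only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /

License: https://example.com/license.xml
"""


def find_license_urls(robots_txt: str) -> list[str]:
    """Return the values of any (assumed) License: lines in a robots.txt body."""
    urls = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        # Split only on the first colon so "https://" stays intact in the value.
        field, value = line.split(":", 1)
        if field.strip().lower() == "license":
            urls.append(value.strip())
    return urls


print(find_license_urls(SAMPLE_ROBOTS_TXT))
```

Run as-is, it prints `['https://example.com/license.xml']`. A crawler that honored such terms would then fetch that file to learn whether the content is free, requires attribution, or is billed per crawl or per inference.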
Launching alongside the standard is a new managing nonprofit, the RSL Collective. It sees itself as the equivalent of nonprofits like ASCAP and BMI, which manage music industry royalties. The new group says its standard can "establish fair market prices and strengthen negotiation leverage for all publishers."
Participating brands include plenty of internet old-schoolers. Reddit, People Inc., Yahoo, Internet Brands, Ziff Davis, wikiHow, O'Reilly Media, Medium, The Daily Beast, Miso.AI, Raptive, Ranker and Evolve Media are all on board. Former Ask.com CEO Doug Leeds and RSS co-creator Eckart Walther lead the group.
"The RSL Standard gives publishers and platforms a clear, scalable way to set licensing terms in the AI era," Reddit CEO Steve Huffman wrote in a press release. "The RSL Collective offers a path to do it together. Reddit supports both as important steps toward protecting the open web and the communities that make it thrive." (It's worth noting that Reddit has licensing deals with OpenAI and Google.)
It's unclear whether AI companies will honor the standard. After all, they've been known to simply ignore robots.txt instructions. But the group believes its terms will be legally enforceable.
In an interview with Ars Technica, Leeds pointed to Anthropic's recent $1.5 billion settlement, suggesting "there's real money at stake" for AI companies that don't train "legitimately." (However, that settlement is up in the air after a judge rejected it.) Leeds told The Verge that the standard's collective nature could also help spread legal costs, making challenges to violations more feasible.
As for technical enforcement, the RSL standard can't block bots on its own. For that, the group is partnering with the cloud company Fastly, which can act as a sort of gatekeeper. (Perhaps Cloudflare, which recently launched a pay-per-crawl system, could eventually play a part, too.) Leeds said Fastly could serve as "the bouncer at the door to the club."
Leeds suggested to Ars that there are incentives for AI companies, too. Financially, it could be simpler for them than inking individual licensing deals. It could also solve a problem in AI-generated answers: pulling from multiple sources to avoid using too much from any single one. If content is legally licensed, the AI app can simply use the best source, which gives the user a higher-quality answer and minimizes the risk of hallucinations.
He also referenced complaints from AI companies that there's no effective means of licensing web-wide content. "We have listened to them, and what we've heard them say is… we need a new protocol," Leeds told Ars Technica. With the RSL standard, AI firms get a "scalable way to get all the content" they want, while having an incentive structure in which they only pay for the best content that their models actually reference. "If they're using it, they pay for it, and if they're not using it, they don't pay for it," he said.