Chatbots and image generators that can produce content in response to simple text or voice prompts in seconds have swelled in popularity since OpenAI launched ChatGPT in late 2022. But their rapid growth and improving capabilities have prompted questions about their use of existing material without permission.

Much of the material used to develop generative AI models has been pulled from a massive range of web sources using bots and crawlers, which automatically extract site data. The rise in this activity, known as web scraping, recently prompted British media publishers to join calls by creatives for the UK government to uphold protections around copyrighted content.

In response to the BBC's letter, the Professional Publishers Association (PPA), which represents over 300 media brands, said it was "deeply concerned that AI platforms are currently failing to uphold UK copyright law." It said bots were being used to "illegally scrape publishers' content to train their models without permission or payment." It added: "This practice directly threatens the UK's £4.4 billion publishing industry and the 55,000 people it employs."

Many organisations, including the BBC, use a file called "robots.txt" in their website code to try to block bots and automated tools from extracting data en masse for AI. Where present, it instructs bots and web crawlers not to access certain pages and material. But compliance with the directive is voluntary and, according to some reports, bots do not always respect it.

The BBC said in its letter that while it disallowed two of Perplexity's crawlers, the company "is clearly not respecting robots.txt". Mr Srinivas denied accusations that its crawlers ignored robots.txt instructions in an interview with Fast Company last June. Perplexity also says that because it does not build foundation models, it does not use website content for AI model pre-training.
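
To illustrate the mechanism, here is a minimal sketch of how a well-behaved crawler can honour robots.txt, using Python's standard-library robot-exclusion parser. The bot name "ExampleBot" and the rules shown are hypothetical examples, not the actual configuration of the BBC or any AI company.

```python
# Sketch: how a compliant crawler checks robots.txt before fetching a page.
# The user-agent "ExampleBot" and the rules below are illustrative only.
from urllib.robotparser import RobotFileParser

# Example robots.txt content a publisher might serve.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /news/
Disallow: /sport/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler consults the parser before every request; nothing
# technically prevents a non-compliant bot from ignoring the answer,
# which is why compliance is described as voluntary.
print(parser.can_fetch("ExampleBot", "/news/article"))   # False: disallowed
print(parser.can_fetch("ExampleBot", "/weather/today"))  # True: allowed
```

The key point the example makes concrete is that robots.txt is advisory: the file only declares the publisher's wishes, and enforcement depends entirely on the crawler choosing to run a check like `can_fetch` before downloading content.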