Skip to content
Tech News
← Back to articles

EFF Tells Publishers: Blocking the Internet Archive Won't Stop AI, But It Will Erase The Historical Record

read original get Internet Archive Subscription → more articles
Why This Matters

The decision by major publishers to block the Internet Archive from crawling their websites threatens to erase decades of digital history, impacting researchers, historians, and the public. While publishers seek control over AI training data, restricting access to archival content risks undermining the preservation of our collective digital memory. This highlights the tension between intellectual property rights and the importance of open access to historical records in the digital age.

Key Takeaways

"Imagine a newspaper publisher announcing it will no longer allow libraries to keep copies of its paper," writes EFF senior policy analyst Joe Mullin.

"That's effectively what's begun happening online in the last few months."

The Internet Archive — the world's largest digital library — has preserved newspapers since it went online in the mid-1990s... But in recent months The New York Times began blocking the Archive from crawling its website, using technical measures that go beyond the web's traditional robots.txt rules. That risks cutting off a record that historians and journalists have relied on for decades. Other newspapers, including The Guardian, seem to be following suit...

The Times says the move is driven by concerns about AI companies scraping news content. Publishers seek control over how their work is used, and several — including the Times — are now suing AI companies over whether training models on copyrighted material violates the law. There's a strong case that such training is fair use. Whatever the outcome of those lawsuits, blocking nonprofit archivists is the wrong response.

Organizations like the Internet Archive are not building commercial AI systems. They are preserving a record of our history. Turning off that preservation in an effort to control AI access could essentially torch decades of historical documentation over a fight that libraries like the Archive didn't start, and didn't ask for. If publishers shut the Archive out, they aren't just limiting bots. They're erasing the historical record...

Even if courts place limits on AI training, the law protecting search and web archiving is already well established... There are real disputes over AI training that must be resolved in courts. But sacrificing the public record to fight those battles would be a profound, and possibly irreversible, mistake.

Read more of this story at Slashdot.