Nvidia has been accused of offering to pay for ‘high-speed access’ to Anna’s Archive, a notorious ‘shadow library’ portal bursting with copyright-infringing materials. Documents published by TorrentFreak appear to show the Nvidia Data Strategy Team reaching out to the site about paying for that access. Moreover, if the documents are genuine, they indicate that green team management approved the payment plan “within a week.”
Nvidia, like other AI industry giants, is very interested in gaining access to the largest sources of human knowledge to improve LLM training quality. The likes of Meta and Anthropic have previously been found with their fingers all over pirated content. These super-wealthy firms jealously guard their own technologies, so evidence that they have little or no regard for the intellectual property of others is a source of irony.
TorrentFreak notes that the email snippets it has shared surfaced during the discovery phase of an ongoing class action lawsuit in which Nvidia is accused of copyright infringement by training its models on content from the Books3 dataset, including copyrighted works taken from pirate site Bibliotik.
In that case, Nvidia is defending its actions under ‘fair use,’ but the newly surfaced Anna’s Archive correspondence looks compelling. In fact, the authors behind the Books3 class action have filed an amended complaint significantly expanding the scope of the lawsuit, says TorrentFreak.
(Image credit: Future)
One of the most damning pieces of correspondence between Nvidia reps and Anna’s Archive is shown above. The snippet appears to show an unnamed Nvidia exec inquiring about the use of Anna’s Archive for LLM training.
Probably worse, though, is the section of the new court filing which alleges that “Within a week of contacting Anna’s Archive, and days after being warned by Anna’s Archive of the illegal nature of their collections, Nvidia management gave ‘the green light’ to proceed with the piracy.”
The proposed deal would mean providing Nvidia with high-speed access to ~500TB of data for LLM training. We don’t see evidence that the deal actually went through, or that any payments went to Anna’s Archive.
Anna's Archive service for LLM developers (Image credit: Future)