Last month, AI company Anthropic agreed to a blockbuster $1.5 billion settlement after being caught red-handed training its models on an enormous cache of pirated versions of copyrighted books and other material.
Now, a similar lawsuit aimed at ChatGPT maker OpenAI has taken a dramatic turn, raising the possibility of yet another major legal escalation regarding AI-facilitated copyright infringement — and a potentially much bigger payout to rightsholders.
Specifically, authors and publishers who filed a lawsuit against the Sam Altman-led firm have secured access to internal Slack messages and emails discussing the mass deletion of a pirated books dataset, Bloomberg reports. A New York district court ordered OpenAI to hand over the communications regarding data deletion last week.
According to the publication, the communications could demonstrate willful infringement, potentially leading to enhanced damages of up to $150,000 per work, far above the statutory minimum of $750.
For context: Anthropic’s settlement only covered around half a million works out of an estimated seven million, resulting in a payout of around $3,000 per work. In that case, authors and publishers had sought similar communications shortly before the settlement was reached.
“Finding out what attorneys said or what clients said to attorneys and back and forth probably gives us a lot of evidence regarding state of mind,” Hamline University professor David Schultz told Bloomberg.
The lawsuit highlights the AI industry’s largely careless treatment of copyrighted materials. Tech leaders have continued to argue that training AI models on protected content falls under “fair use,” a legal doctrine that allows for transformative use of copyrighted materials.
Most recently, OpenAI’s TikTok-like text-to-video app Sora 2 has been found to spit out a litany of videos heavily based on protected intellectual property, showing it’s not just ChatGPT potentially infringing copyright. (Sloppily implemented guardrails have since stemmed the tide somewhat, but users are easily finding ways around them.)
Given the major payouts at stake, the tides could be turning in favor of artists whose life's work was sucked up by AI models without permission, and who could now be looking at another consolation prize in the form of a settlement fee.
Both Anthropic and OpenAI have been accused of training their AI models on copyrighted materials uploaded to a piracy website called LibGen.
Communications show that OpenAI deleted the dataset, a move that plaintiffs argue could be construed as intentional destruction of evidence. Judge Ona Wang has already found that OpenAI improperly withheld materials, per Bloomberg.
While it remains to be seen how the internal communications will influence the outcome, experts say they could help plaintiffs build a stronger case.
“There may be a smoking gun, we don’t know,” Saint Louis University law professor Yvette Joy Liebesman told Bloomberg. But the authors’ lawyers are “going to get as much information as possible to get as much money for plaintiffs as possible.”
OpenAI’s lawyers may have already made a major misstep by initially claiming that the company had deleted the LibGen data “due to… non-use,” only to walk that back, saying they had misspoken.
If OpenAI were to be found to have intentionally destroyed evidence by deleting copyrighted materials, it could be in deep trouble.
“Juries hear that and it’s a very powerful stick,” Snell & Wilmer LLP Partner Nathan Mammen told Bloomberg. “It doesn’t guarantee the outcome, but it’s a heavy thumb on the scales.”
More on OpenAI: OpenAI’s Sora Is in Serious Trouble