Apple named in AI lawsuit over data set it says doesn’t power Apple Intelligence

Apple is named in a new AI lawsuit by publisher Chicken Soup for the Soul, reports Reuters. However, the lawsuit points to a data set that Apple has already said doesn’t power Apple Intelligence.

Per the Reuters report:

Book publisher Chicken Soup for the Soul sued several Big Tech companies in California federal court late Tuesday for allegedly misusing its content to train their artificial intelligence systems. The publisher said that Apple (AAPL.O), Google (GOOGL.O), Nvidia (NVDA.O), Meta Platforms (META.O), OpenAI, Anthropic, Perplexity AI and Elon Musk’s xAI used pirated copies of its books to teach their chatbots to respond to human prompts.

The lawsuit, which you can read in full here, accuses Apple of using books to train its AI technology:

This case concerns a straightforward and deliberate act of theft that constitutes copyright infringement. Anthropic, Google, OpenAI, Meta, xAI, Apple, Perplexity, and NVIDIA, illegally copied vast quantities of copyrighted books without permission and then used those stolen copies to build and train their commercial large language models (“LLMs”) and/or optimize their product. Defendants helped themselves to the copyrighted works of thousands of authors—including bestselling writers, Pulitzer Prize-winning journalists, and creators of widely read nonfiction and fiction.

Later in the filing, the lawsuit points to The Pile being used to train Apple Foundation Models.

Rather than obtain licenses or pay for the use of these works, each Defendant downloaded pirated copies of Plaintiff’s books from shadow-library websites such as The Pile, LibGen, Z-Library, and Anna’s Archive and then reproduced, parsed, analyzed, re-copied, used, and embedded those works into their LLMs (and/or used those works to optimize their product) to accelerate commercial development and win the generative-AI race. The Copyright Act prohibits exactly this conduct. […] “Apple Foundation Models” relied upon The Pile and Books 3.

If The Pile rings a bell, that’s likely because it surfaced in a different AI training accusation in 2024, involving YouTube videos.

At the time, however, Apple said that the dataset in question was only used for research purposes and not actually used in any models that powered Apple Intelligence or machine learning features.
