On December 22, local time, a group of writers led by two-time Pulitzer Prize winner John Carreyrou filed a class-action lawsuit in the U.S. District Court for the Northern District of California against six AI giants: OpenAI, Google, Meta, Anthropic, xAI, and Perplexity AI. The suit accuses the companies of "willful copyright infringement" for training their models on pirated books. Carreyrou previously exposed the fraud at Silicon Valley blood-testing startup Theranos and chronicled it in his book "Bad Blood."
According to the complaint, the plaintiffs' core allegation is a "dual infringement chain": the six companies downloaded millions of pirated books, including novels and nonfiction, from "illegal shadow libraries" such as LibGen and Z-Library, then used those works to train large language models and refine their products. The plaintiffs say this forms an illegal closed loop of "acquisition through piracy, model training, and commercial monetization," and stress that although authors' intellectual property underpins a "billion-dollar AI ecosystem," the authors have received no compensation whatsoever.
If a jury finds the infringement willful, statutory damages could reach as much as $150,000 per infringed work.
This is not the first time AI companies have been embroiled in copyright disputes over literary works. According to a report by the Nandu Digital Economy Governance Research Center, OpenAI is a major litigation target in the industry, facing at least 14 copyright lawsuits.
As early as the end of 2023, The New York Times sued OpenAI, Microsoft, and others for copyright infringement, claiming that millions of its published articles had been used to train chatbots such as Microsoft Copilot and ChatGPT. The Times argues that the defendants should be held liable for billions of dollars in damages for the unlawful copying and use of its uniquely valuable work. It also demands that the defendants destroy any AI models and training data that incorporate its copyrighted material.
In June of this year, OpenAI said it was appealing a demand, made by The New York Times in that lawsuit, that OpenAI retain consumer ChatGPT and API customer data indefinitely. OpenAI argued that the demand fundamentally conflicts with its privacy commitments to users and amounts to overreach.
OpenAI is not the only company in a copyright dispute with The New York Times. Last October, the Times also sent a cease-and-desist letter to generative AI startup Perplexity, demanding that it stop accessing and using the newspaper's content.
Google, for its part, received a cease-and-desist letter from Disney in early December. Meta received a cease-and-desist letter alleging that it had "copied a massive amount of copyrighted work without authorization for AI development," and it has also received multiple copyright infringement warnings from major Hollywood studios over its model training data.
Anthropic's copyright case is even more significant. The company was sued for using pirated books to train its Claude models. In June 2025, a California court ruled that "fair use does not apply to pirated data," a decision that ultimately led Anthropic to agree to a $1.5 billion settlement and to destroy the infringing data.
Although xAI and Perplexity AI are relatively young companies, the practices they are accused of closely mirror those alleged against the industry giants, exposing how heavily AI companies rely on pirated data.
The U.S. District Court for the Northern District of California, which is hearing this case, has taken on 25 AI copyright cases, more than half of all such cases nationwide. Its rulings may become a key benchmark for determining the legality of AI training data.
(Article source: Jiemian News)