Summary of Key Points
Anthropic, the company behind the Claude AI models, faced a class action from authors over its use of pirated books to train those models. The dispute ended in a $1.5 billion settlement. Attention then shifted to the legal fees: the lawyers initially requested $300 million, later reduced to $187.5 million. The judge ruled that downloading and keeping pirated books is copyright infringement, whereas training AI on lawfully acquired books falls under "fair use" (by analogy with how humans read and learn). To obtain books lawfully, Anthropic ran "Project Panama": it bought physical books, cut off their spines for scanning, destroyed the paper copies, and used the resulting text to train its AI models. The case has exposed the gray areas of copyright in AI training and the conflicts over how the money is divided.
Why Do AI Companies Resort to Pirated Books for Training?
AI needs high-quality data to generate meaningful content, and books are considered more reliable than online posts. However, acquiring authorized book licenses is a cumbersome process that involves negotiating with publishers, signing contracts, and paying royalties (which Anthropic's CEO described as a "legal and commercial hassle"). Therefore, they took a shortcut:
- They initially used the publicly available pirated dataset "Books3," which contained nearly 200,000 pirated books. When author Andrea Bartz discovered her book in the dataset, she filed a lawsuit.
- Even more notably, Anthropic co-founder Ben Mann personally took part in the piracy: in 2021 he downloaded over 190,000 books from Books3, and later downloaded another 5 million books from a second pirate site. In 2022, when a new pirate site appeared, he downloaded a further 2 million books and encouraged his colleagues to do the same, commenting, "This is really timely!"
In essence, they did it to save time and money, despite knowing the books were pirated.
The Outcome of the Lawsuit: $1.5 Billion in Damages and a Legal Loophole
In 2025, the judge ruled:
- Pirated Copies = Infringement: Anthropic's downloading and keeping of pirated books was not protected by fair use, and each download counted as an infringement. The case then settled, with Anthropic agreeing to pay $1.5 billion and destroy all pirated materials.
- Authorized Training = Legal: The judge deemed that using legally purchased books to train AI represents a "transformative creative activity," similar to how humans create after reading. Just as people do not have to pay for every quote they use from a book they have purchased, AI should be treated the same.
Anthropic's "Project Panama" relied on exactly this distinction: the company spent millions of dollars buying millions of physical books, scanned them into electronic form, and then shredded the paper, giving it lawfully acquired text for AI training. The authors were perplexed by this approach, but the judge accepted it.
The Most Troublesome Aspect: Legal Fees
Although $1.5 billion sounds substantial, the amount actually reaching authors is small: about $3,000 per infringed work, with a share of that going to other rights holders (such as publishers). The lawyers initially requested 20% of the settlement ($300 million), which they considered "reasonable." This caused dissatisfaction among all parties:
- Authors: They felt that the books they had worked hard on were used without proper compensation, and that what they received was negligible compared with the lawyers' fees.
- Judge and Anthropic: They pointed out that the lawyers had not provided detailed records of their hours and questioned why they should receive such a large portion of the settlement.
The lawyers later reduced their request to 12.5% ($187.5 million), but seven authors still objected. Nevertheless, rights holders for over 90% of the works accepted the settlement, so the final outcome is unlikely to change. A lawsuit that began as a fight over "the dignity of creativity" has turned into a dispute over how much the lawyers deserve to be paid.
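The fee figures in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the dollar amounts quoted in this article as inputs, assumes each fee is measured against the gross $1.5 billion, and treats the roughly $3,000 per work as a flat gross average. The variable and function names are hypothetical, and the implied number of works is a back-of-the-envelope estimate, not a figure from the court record.

```python
from fractions import Fraction

# Dollar figures as quoted in this article.
SETTLEMENT = 1_500_000_000   # total settlement
INITIAL_FEE = 300_000_000    # lawyers' initial fee request
REDUCED_FEE = 187_500_000    # lawyers' reduced fee request
PER_WORK = 3_000             # approximate payout per infringed work

def share_percent(fee: int, total: int) -> float:
    """Return the fee as a percentage of the total, computed exactly."""
    return float(Fraction(fee, total) * 100)

def implied_work_count(total: int, per_work: int) -> int:
    """Rough number of works implied by a flat per-work payout (assumption)."""
    return total // per_work

initial_pct = share_percent(INITIAL_FEE, SETTLEMENT)
reduced_pct = share_percent(REDUCED_FEE, SETTLEMENT)
works = implied_work_count(SETTLEMENT, PER_WORK)
fee_cut = INITIAL_FEE - REDUCED_FEE

print(f"Initial request: {initial_pct:.1f}% of the settlement")   # 20.0%
print(f"Reduced request: {reduced_pct:.1f}% of the settlement")   # 12.5%
print(f"Reduction in fees: ${fee_cut:,}")                         # $112,500,000
print(f"Implied number of works: about {works:,}")                # about 500,000
```

On these assumptions, the reduced fee alone equals the per-work payout for tens of thousands of books, which helps explain why the fee request drew objections.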
The Controversy around the Analogy of AI Training as Human Learning
The judge's comparison of AI training to human reading is at the core of the controversy:
- Supporters of tech companies: They see this as a victory for AI, as it allows for the legal use of authorized books and promotes AI development.
- Authors and ethicists: They argue that AI is not human; while humans read to understand and create, AI merely "replicates and reorganizes" text. If AI training counts as "learning," then authors' intellectual property rights are at risk, because an AI company could use any book freely as long as it buys a copy.
This controversy is unlikely to be resolved soon. AI companies need to grow, authors need to protect their rights, and the law must keep up with technological advancements.
The Lesson from This Case
The copyright issues surrounding AI training are more complex than simply determining whether piracy is allowed or not. Anthropic's case highlights several key points:
1. Piracy is clearly unacceptable, but the boundaries of fair use for authorized training are still unclear.
2. The money must be divided fairly; lawyers should not receive a disproportionate share at the expense of authors.
3. The relationship between AI and human creativity requires clearer legal guidelines.
In the future, AI companies, authors, and the legal community need to have open discussions about whether AI can truly "read" books and how royalties should be allocated accordingly. Otherwise, similar lawsuits will continue to arise.