A group of authors filed a lawsuit Monday in California federal court alleging that AI company Anthropic trained its models on pirated books. The plaintiffs accuse Anthropic of misusing hundreds of thousands of books, including their own.

The complaint states, “Anthropic has built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books… An essential component of Anthropic’s business model—and its flagship “Claude” family of large language models (or “LLMs”)—is the large scale theft of copyrighted works.” The suit was brought under the Copyright Act.

The plaintiffs accuse Anthropic of training its AI models on the open-source dataset “The Pile,” which allegedly included “Books3,” a collection of almost 200,000 pirated books. Anthropic is facing a similar lawsuit from music publishers, who allege copyright infringement over song lyrics generated by its models. Earlier this year, music companies also sued the AI firms Suno and Udio, accusing them of unfairly copying their content.

Anthropic has yet to officially address the lawsuit. Its website lists “Unusually high trust” among the company’s values, adding: “Our company is an unusually high trust environment: we assume good faith, disagree kindly, and prioritize honesty. We expect emotional maturity and intellectual openness. At its best, our trust enables us to make better decisions as an organization than any one of us could as individuals.”

In March, Amazon announced the completion of a $4 billion investment in Anthropic. The privately held company’s post-money valuation is reported to be around $15 billion. Anthropic has also received investments from Stackpoint Ventures, Alliance Global Partners, and Scenic Management.
