Entrepreneur Sues Meta for Using Pirated Books to Train Llama

Entrepreneur Media, LLC has filed a lawsuit against Meta Platforms, Inc. in the U.S. District Court for the Northern District of California, accusing the tech giant of using pirated versions of its books and magazine articles to train its Llama AI models. The complaint, filed in San Francisco on November 6, 2025, alleges direct and contributory copyright infringement and violations of the Digital Millennium Copyright Act (DMCA). Entrepreneur claims Meta built “a multibillion-dollar artificial intelligence empire on a foundation of systematic and widespread copyright theft.”

Allegations of large-scale copyright theft against Meta

The publisher says Meta copied hundreds of terabytes of copyrighted material, including professionally published books and issues of Entrepreneur magazine, without authorisation or payment. According to the filing, Meta used data scraped from illegal “shadow libraries” such as Library Genesis (LibGen), Bibliotik, and Z-Library, websites known for hosting pirated books. The lawsuit states that Meta downloaded and redistributed copyrighted works through torrenting networks like BitTorrent and LibTorrent, which automatically share files with other users. The complaint argues that Meta not only downloaded Entrepreneur’s works but also became a distributor of pirated material in the process.

The Books3 dataset and Meta’s AI training

Entrepreneur alleges that Meta’s first Llama model was trained on a dataset called “Books3”, which was part of an open dataset known as “The Pile”. The company describes Books3 as a collection of nearly 200,000 pirated books obtained from Bibliotik. Meta claimed in its LLaMA 1 paper that it trained the model “exclusively on publicly available datasets” and that the data was “compatible with open-sourcing.” However, Entrepreneur Media argues that “publicly available” does not mean “public domain”, and that Meta cannot use copyrighted materials for open-source purposes without the creator’s consent. The complaint also cites the Llama FAQ, in which Meta states that it licenses Llama models under a bespoke commercial license allowing broad commercial use.

Copyrighted titles and examples cited

The lawsuit lists several examples of Entrepreneur’s copyrighted works that allegedly appear in the LibGen database, including Start Your Own Coaching Business: Your Step-by-Step Guide to Success, Start Your Own Import/Export Business, Ultimate Guide to Pinterest for Business, Breakthrough: How to Harness the AHA! Moments that Spark Success, and Start Your Own Business, 6th Edition. It also cites magazine articles from 2010 issues of Entrepreneur, such as “The Successful Optimist,” “The Red Pen Rule for Marketing Copy,” and “The Four Keys to Raising Capital.” Entrepreneur says it owns the rights to all of these works, which were registered with the U.S. Copyright Office before Meta’s alleged use.

Removal of copyright information

According to the complaint, Meta’s data processing systems stripped copyright management information such as author names, copyright notices, and embedded EPUB/PDF metadata from these works before using them for training. Entrepreneur argues that this was an intentional act to hide infringement and make it harder to trace the source material. Public filings in related cases reportedly describe Meta engineers “filtering copyright lines” from files to create a corpus without identifying information.

Alleged financial harm to Entrepreneur Media

Entrepreneur claims Meta’s actions have caused serious financial harm. The publisher says its digital book sales have fallen by about 50% since AI models like Llama entered the market and that Llama’s outputs now directly compete with its paid content by generating similar business advice and startup guides for free. The complaint cites a federal court order from a separate case warning that “the market for certain nonfiction works—for example, books about how to take care of your garden—could be greatly diminished by the ability of LLMs to produce books on that topic.” It also quotes the U.S. Copyright Office’s 2025 draft report: “The speed and scale at which AI systems generate content pose a serious risk of diluting markets for works of the same kind as in their training data.”

Alleged business decision of Meta to bypass licensing

The lawsuit accuses Meta of deliberately choosing piracy over licensing, even as other AI developers such as OpenAI, Google, and Anthropic have paid to license training data. It claims Meta decided that infringing copyrights was a cheaper and quicker way to build competitive AI systems, treating potential lawsuits as a business cost. Entrepreneur argues that Meta’s alleged conduct undermines the licensing market for legitimate data and devalues professional publishing. It seeks damages, legal fees, and a permanent injunction barring Meta from continuing to use its works for AI training or removing copyright information.

In July this year, in a case involving Anthropic, a US district court held that using purchased copyrighted works to train AI models falls within the bounds of US copyright law. However, the court ruled against Anthropic on piracy, saying that the books the company pirated were inherently infringing.

Legal representation and next steps

Entrepreneur is represented by Hueston Hennigan LLP and Newmeyer & Dillion LLP. The company is demanding a jury trial and asking the court to declare Meta’s conduct wilful. Meta, headquartered in Menlo Park, California, has not yet filed a response to the complaint. The case will proceed through preliminary motions and discovery unless settled out of court.
