Top IP Lawyer Reveals How AI Giants Are Building Billion-Dollar Models on Stolen Creative Work

Intellectual property lawyer Eleonora Rosati, 42, a professor at Stockholm University and of counsel at Bird & Bird, warns that AI companies like OpenAI, Anthropic, and Microsoft are training models on copyrighted content without paying royalties. Multiple U.S. class-action lawsuits remain ongoing, and Anthropic settled for $1.5 billion after a court found that it had used pirated sources. Rosati notes that artists including Taylor Swift and Matthew McConaughey have registered their voices as trademarks for protection. She highlights France's proposed legislation shifting the burden of proof onto AI developers, and calls for flexible, internationally coordinated copyright law.

In-Depth:

The rise of generative artificial intelligence (AI) has put content creators on high alert. For these models to function, they must ingest vast databases that include all kinds of documents and pieces of content. Algorithms that establish patterns are then applied to this material. This is known as the "training phase."

Editors, translators, illustrators, and voice actors — among others — consider it unfair that companies such as OpenAI (the developer of ChatGPT and DALL-E), Anthropic (Claude) and Microsoft (Copilot) are profiting from their creations, without ever having paid royalties.

In the United States, several class-action lawsuits have been filed (and are still ongoing) to resolve the issue. Should the AI firms lose, they could face multibillion-dollar damages. This is a real possibility, as demonstrated by the $1.5 billion out-of-court settlement that Anthropic reached with a group of authors.

Eleonora Rosati, 42, believes that this is one of the great legal debates of our time. Born in Urbino, Italy, she is a professor of intellectual property law at Stockholm University and is of counsel with the law firm Bird & Bird. She's considered to be one of Europe's leading experts on the subject.

Rosati spoke with EL PAÍS in Madrid before giving a keynote address at the Digital Rights and Culture Congress. Organized by the Audiovisual Producers' Rights Management Entity (EGEDA), the conference was hosted in collaboration with Red.es, a public corporation operating under Spain's Ministry for Digital Transformation.

Question. Is it possible to protect intellectual property rights in the age of AI?

Answer. The answer is complex. First, one point remains unchanged with AI: intellectual property rights are preventive in nature. This means that, if you want to use a resource to train your AI, you either need permission from the rights holders, or you can rely on an exception that allows you to bypass such consent. Then, there's another major issue: transparency. [This has to do with] knowing exactly what data has been used to train AI models. In the European Union, transparency obligations are established by the [Artificial Intelligence Act]. But, of course, there are jurisdictions, such as the United Kingdom, that don't have comparable obligations. And they're currently exploring whether they should impose them.

Eleonora Rosati is a professor at Stockholm University and practices law in London and Milan. (Photo: Inma Flores)

On the other hand, in France, there's a bill that aims to introduce a presumption of use, meaning that your content is presumed to have been used for training purposes, and the burden of proof will fall on the AI developer to demonstrate otherwise. This is a significant change because, normally, when you sue someone for copyright infringement, you're the one who has to prove that your content has been used. The French are considering changing this to make enforcement easier. Some suggest that the same [legislation] should be introduced [at the EU level].

Q. You mentioned that there are some exceptions in EU legislation that allow for the use of datasets. Do you think European regulation is sufficiently protective of intellectual property in this regard?

A. It's important to note that these exceptions for text mining come with many requirements. First, they apply to specific activities (mainly research and non-profit work) and it's highly questionable to assume they would cover all phases of the AI training process. Second, they're based on a lawful access requirement: this means that you can only invoke the exception if you have legally accessed (without piracy) the content that you're using. This position has also been supported in the United States, under the fair use doctrine.

I would say that there's a lot of uncertainty about how these exceptions should be interpreted and applied, because they've been introduced quite recently.

Q. You mention the fair use argument, which is the one invoked by tech companies. Do you think that going to a library, reading lots of books and then writing a text with the knowledge obtained is the same as an AI developer building its models with documents taken from the internet?

A. That's an old argument that was raised even before the European exceptions for text and data mining were introduced. The phrase "the right to read is the right to mine" became popular back then. But it's one thing for a person to read and learn and then produce something… and quite another for a machine needing to ingest this content in order to do something similar.

Technically, [AI models] perform acts of reproduction that are reserved for copyright holders. That's why I wouldn't say it's possible to draw a perfect analogy [to reading].

There's a lot of discussion right now about the memorization [of the documents used] in these models' [training phase]. Studies show that AI memorization is much more extensive than previously believed. But, technically speaking, whether the content is memorized or not doesn't really matter, because the law is written in such a way that [a document] can disappear [into an AI system's memory] and still remain a copyrighted [item].

Q. So what really matters is the source.

A. Exactly. And whether you've performed an act of reproduction. It doesn't matter how long it lasts: what matters is whether it takes place.

Q. That was the argument made by the judge in Bartz v. Anthropic: he didn't question the right of AI systems to "read" human cultural output, but he noted that some of the documents that the AI models worked with came from websites engaged in piracy. This pushed Anthropic to settle out of court.

A. Yes. There are dozens of pending cases in the U.S. [on this subject]. However, in the few decisions handed down so far, and in a report from the Copyright Office, the prevailing position is that the fair use doctrine cannot be invoked if the source of the content is pirated. If you use shadow libraries and the like, fair use won't apply.

Q. That’s not good news for AI developers.

A. [The situation] seems worrying in general. You have an obligation to verify the source of available content because, if it comes from illegal sources, it can end up poisoning the entire training process. That's why we must be very careful.

Q. Do you think we'll see more out-of-court settlements, like Anthropic's, in the U.S.?

A. Maybe, for several reasons, starting with the fact that [the American] litigation system is more expensive than ours. As for Europe, it's possible, but not very likely, that there will be some [settlements].

The litigation that will continue to emerge will revolve around the interpretation of these exceptions for text mining, but also around issues such as liability for the generation of results. [For example], there’s a case that’s been referred from Hungary to the EU’s Court of Justice: a decision will probably be reached next year. The outcome will set the tone for the various member states.

Q. What are your thoughts on intellectual property other than texts? For example, there are some class-action lawsuits in the U.S. concerning artists, voices, and music. Do you think they're at the same point in the debate?

A. Yes. And that’s also a concern in Europe. Some celebrities and artists are considering how to protect themselves against this: they’ve been registering trademarks for the sound of their voice, as Taylor Swift and Matthew McConaughey have done. The fact is, we have a variety of legal regimes that can come into play: personality rights, trademarks, performers’ rights, the [EU’s] General Data Protection Regulation (GDPR) and, of course, image rights and [the law of] unfair competition.

In Denmark, a law has been proposed to introduce a new right over digital replicas. The question is whether we have good enough laws on the books, or if we need to strengthen them a bit.

Q. Do you think it's possible that the French model, which places the burden of proof on the AI developer rather than on the author of a piece of intellectual property, will be adopted in European legislation?

A. Some suggest that it should be. This could be done in the context of a possible revision of the 2004 Intellectual Property Rights Enforcement Directive. The French model could be considered for inclusion… but this would require finding common ground among [EU] member states and demonstrating that there's an internal market problem [caused by AI's use of intellectual property].

Q. How do you envision the EU's legislation on intellectual property rights and AI in 10 years?

A. The legal system will be much more developed [by then] and there will be much more case law. For me, the most dangerous thing is believing that, just because there's a new technology, there are suddenly no laws. We shouldn't adopt laws that are too technologically specific, because they'll become obsolete very quickly. They should be flexible enough to accommodate new uses.

The reason why copyright [law] has often been successful in adapting to new situations is that it has been built on truly open and ambiguous concepts. AI is a global technology. This raises the question of whether there will be real competition between legal systems (such as the USA's, the EU's, Japan's, or China's), or whether we need to look for solutions that are more homogeneous at the international level, agreeing on certain minimum safeguards and guarantees.
