For art's sake —

Artists may “poison” AI models before Copyright Office can issue guidance

Copyright Office to recommend protections for works used to train AI in 2024.

An image OpenAI created using DALL-E 3.

Artists have spent the past year fighting companies that have been training AI image generators—including popular tools like the impressively photorealistic Midjourney and the ultra-sophisticated DALL-E 3—on their original works without consent or compensation. Now, the United States has promised to finally get serious about addressing the copyright concerns raised by AI, as President Joe Biden laid out in his much-anticipated executive order on AI, which was signed this week.

The US Copyright Office had already been seeking public input on AI concerns over the past few months through a comment period ending on November 15. Biden's executive order has clarified that following this comment period, the Copyright Office will publish the results of its study. And then, within 180 days of that publication—or within 270 days of Biden's order, "whichever comes later"—the Copyright Office's director will consult with Biden to "issue recommendations to the President on potential executive actions relating to copyright and AI."

"The recommendations shall address any copyright and related issues discussed in the United States Copyright Office’s study, including the scope of protection for works produced using AI and the treatment of copyrighted works in AI training," Biden's order said.

That means that potentially within the next six to nine months (or longer), artists may have answers to some of their biggest legal questions, including a clearer understanding of how to protect their works from being used to train AI models.

Currently, artists do not have many options to stop AI image generators—which produce images based on user text prompts—from referencing their works. Even companies like OpenAI, which recently started allowing artists to opt out of having their works included in training data, apply those opt-outs only to future training. Artists can't opt out of the training data that fuels existing tools because, as OpenAI says:

After AI models have learned from their training data, they no longer have access to the data. The models only retain the concepts that they learned. When someone makes a request of a model, the model generates output based on its understanding of the concepts included in the request. It does not search for or copy content from an existing database.

According to The Atlantic, this opt-out process—which requires artists to submit a request for each individual artwork and could be too cumbersome for many artists to complete—leaves artists with only the option of protecting new works that "they create from here on out." For any work "already claimed by the machines" in 2023, it seems it's already too late, The Atlantic warned. And the issue clearly affects a lot of people: a spokesperson told The Atlantic that Stability AI alone has fielded "over 160 million opt-out requests in upcoming training."

Until federal regulators figure out what rights artists ought to retain as AI technologies rapidly advance, at least one artist—cartoonist and illustrator Sarah Andersen—is advancing a direct copyright infringement claim against Stability AI, maker of Stable Diffusion, another remarkable AI image synthesis tool.

Andersen, whose proposed class action could impact all artists, has about a month to amend her complaint. If she wants "the inferences" in her complaint "about how and how much of Andersen’s protected content remains in Stable Diffusion or is used by the AI end-products" to "be stronger," a judge recommended, she must "plausibly plead that defendants’ AI products allow users to create new works by expressly referencing Andersen’s works by name."

In other words, under current copyright laws, Andersen will likely struggle to win her legal battle if she fails to show the court which specific copyrighted images were used to train AI models and demonstrate that those models used those specific images to spit out art that looks exactly like hers. Citing specific examples will matter, one legal expert told TechCrunch, because arguing that AI tools mimic styles likely won't work—since "style has proven nearly impossible to shield with copyright."

Andersen's lawyers told Ars that her case is "complex," but they remain confident that she can win, possibly because, as other experts told The Atlantic, she might be able to show that "generative-AI programs can retain a startling amount of information about an image in their training data—sometimes enough to reproduce it almost perfectly." But she could fail if the court decides that using data to train AI models is fair use of artists' works, a legal question that remains unclear.
